

Distributional clustering of English words

Authors

Fernando Pereira, Naftali Tishby, Lillian Lee

Publication date

1994/8/22

Journal

arXiv preprint cmp-lg/9408011

Description

We describe and experimentally evaluate a method for automatically clustering words according to their distribution in particular syntactic contexts. Deterministic annealing is used to find lowest-distortion sets of clusters. As the annealing parameter increases, existing clusters become unstable and subdivide, yielding a hierarchical "soft" clustering of the data. Clusters are used as the basis for class models of word cooccurrence, and the models are evaluated with respect to held-out test data.
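The description compresses the clustering step into two sentences. The Python sketch that follows illustrates the annealing idea on a toy example. It is not the authors' code. It rests on several assumptions that reflect one reading of the method:

- each noun n is represented by its distribution over verbs p(v|n);
- distortion is the KL divergence D(p_n || p_c) between a noun and a cluster centroid;
- soft memberships take the Gibbs form p(c|n) ∝ exp(-β D(p_n || p_c));
- centroids are membership-weighted mixtures p(v|c) = Σ_n p(n|c) p(v|n).

The noun/verb counts, the function names (anneal, memberships, update_centroids, twin), the β schedule, the EPS smoothing and the twin-and-merge test for detecting a split are all hypothetical choices made for illustration. The held-out evaluation of class models is not covered.

```python
import math

# Hypothetical toy data: counts of (verb, direct-object noun) pairs.
COUNTS = {
    "wine": {"drink": 10, "pour": 6, "buy": 2},
    "beer": {"drink": 9, "pour": 5, "buy": 3},
    "water": {"drink": 8, "pour": 7, "buy": 1},
    "car": {"drive": 10, "park": 6, "buy": 4},
    "truck": {"drive": 9, "park": 5, "buy": 3},
    "bus": {"drive": 7, "park": 4, "buy": 1},
}
EPS = 1e-9  # assumed smoothing so centroids never assign zero probability


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def kl(p, q):
    """D(p || q); verbs with p_v == 0 contribute nothing."""
    return sum(pv * math.log(pv / qv) for pv, qv in zip(p, q) if pv > 0)


def memberships(p_vn, centroids, beta):
    """Soft assignments p(c|n) proportional to exp(-beta * D(p_n || p_c))."""
    result = []
    for p in p_vn:
        d = [kl(p, c) for c in centroids]
        d_min = min(d)  # shift distortions so exp() cannot underflow to all zeros
        result.append(normalize([math.exp(-beta * (x - d_min)) for x in d]))
    return result


def update_centroids(p_vn, p_n, member):
    """p(v|c) = sum_n p(n|c) p(v|n), with p(n|c) obtained by Bayes' rule."""
    centroids = []
    for c in range(len(member[0])):
        joint = [p_n[i] * member[i][c] for i in range(len(p_vn))]  # p(n, c)
        p_c = sum(joint)
        mix = [sum(j * p[v] for j, p in zip(joint, p_vn)) / p_c
               for v in range(len(p_vn[0]))]
        centroids.append(normalize([x + EPS for x in mix]))
    return centroids


def twin(centroid, delta):
    """Duplicate a centroid as two copies perturbed in opposite directions."""
    signs = [1 if v % 2 == 0 else -1 for v in range(len(centroid))]
    return [normalize([c * (1 + s * delta) for c, s in zip(centroid, signs)]),
            normalize([c * (1 - s * delta) for c, s in zip(centroid, signs)])]


def anneal(counts, betas, iters=300, delta=0.01, merge_tol=1e-6):
    nouns = sorted(counts)
    verbs = sorted({v for row in counts.values() for v in row})
    totals = [sum(counts[n].values()) for n in nouns]
    p_n = normalize(totals)
    p_vn = [[counts[n].get(v, 0) / t for v in verbs] for n, t in zip(nouns, totals)]
    # At beta = 0 every noun sits in one cluster whose centroid is the marginal p(v).
    centroids = update_centroids(p_vn, p_n, [[1.0] for _ in nouns])
    history = []
    for beta in betas:
        # Duplicate every centroid; twins drift apart only if unstable at this beta.
        centroids = [t for c in centroids for t in twin(c, delta)]
        for _ in range(iters):
            centroids = update_centroids(p_vn, p_n, memberships(p_vn, centroids, beta))
        kept = []  # merge twins that collapsed back onto the same distribution
        for c in centroids:
            if all(kl(c, k) + kl(k, c) > merge_tol for k in kept):
                kept.append(c)
        centroids = kept
        history.append((beta, len(centroids)))
    return nouns, memberships(p_vn, centroids, betas[-1]), history


if __name__ == "__main__":
    nouns, member, history = anneal(COUNTS, [0.5, 1.0, 2.0, 4.0, 8.0])
    print("clusters after each beta:", history)
    for noun, probs in zip(nouns, member):
        print(noun, [round(x, 3) for x in probs])
```

With these toy counts, the duplicated centroids should fall back together at β = 0.5 and β = 1. At β = 2 they should separate into a drink/pour/buy group and a drive/park/buy group. The members of each group have similar verb distributions, so splitting them further would take a much larger β. The twin-and-merge loop is one simple way to watch existing clusters "become unstable and subdivide". It is an illustrative choice, not necessarily the schedule used in the paper.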

Total citations

Cited by 1476

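Citations per year

1993: 8, 1994: 14, 1995: 21, 1996: 25, 1997: 38, 1998: 50, 1999: 42, 2000: 40
2001: 58, 2002: 73, 2003: 64, 2004: 78, 2005: 91, 2006: 84, 2007: 82, 2008: 80
2009: 79, 2010: 90, 2011: 54, 2012: 58, 2013: 64, 2014: 40, 2015: 59, 2016: 39
2017: 23, 2018: 25, 2019: 20, 2020: 14, 2021: 15, 2022: 14, 2023: 10, 2024: 3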


Versions

23