Authors
Fernando Pereira, Naftali Tishby, Lillian Lee
Publication date
1994/8/22
Journal
arXiv preprint cmp-lg/9408011
Description
We describe and experimentally evaluate a method for automatically clustering words according to their distribution in particular syntactic contexts. Deterministic annealing is used to find lowest-distortion sets of clusters. As the annealing parameter increases, existing clusters become unstable and subdivide, yielding a hierarchical "soft" clustering of the data. Clusters are used as the basis for class models of word co-occurrence, and the models are evaluated with respect to held-out test data.
Total citations
[Bar chart: citations per year, 1993 to 2024]
Scholar articles
F Pereira, N Tishby, L Lee - arXiv preprint cmp-lg/9408011, 1994