A universal part-of-speech tagset

Authors

Slav Petrov, Dipanjan Das, Ryan McDonald

Publication date

2011/4/11

Journal

arXiv preprint arXiv:1104.2086

Description

To facilitate future research in unsupervised induction of syntactic structure and to standardize best practices, we propose a tagset that consists of twelve universal part-of-speech categories. In addition to the tagset, we develop a mapping from 25 different treebank tagsets to this universal set. As a result, when combined with the original treebank data, this universal tagset and mapping produce a dataset consisting of common parts-of-speech for 22 different languages. We highlight the use of this resource via two experiments, including one that reports competitive accuracies for unsupervised grammar induction without gold-standard part-of-speech tags.
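The mapping idea in the description can be illustrated with a minimal Python sketch. It assumes the twelve category names usually associated with this tagset (NOUN, VERB, ADJ, ADV, PRON, DET, ADP, NUM, CONJ, PRT, "." for punctuation, and X as a catch-all) and a small, hand-picked fragment of Penn Treebank tags. The names `UNIVERSAL_TAGS`, `PTB_TO_UNIVERSAL`, and `to_universal` are hypothetical and used only for illustration; this is not the authors' released mapping, which covers far more tags and treebanks.

```python
# Illustrative sketch only: a partial, hand-written Penn Treebank -> universal
# tag mapping. The dictionary below is an assumption for demonstration and is
# NOT the complete mapping distributed with the paper.

# The twelve universal categories (assumed names).
UNIVERSAL_TAGS = frozenset(
    {"NOUN", "VERB", "ADJ", "ADV", "PRON", "DET",
     "ADP", "NUM", "CONJ", "PRT", ".", "X"}
)

# Hypothetical fragment of a treebank-specific mapping.
PTB_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBZ": "VERB", "MD": "VERB",
    "JJ": "ADJ",
    "RB": "ADV",
    "PRP": "PRON",
    "DT": "DET",
    "IN": "ADP",
    "CD": "NUM",
    "CC": "CONJ",
    "RP": "PRT",
    ".": ".", ",": ".",
    "FW": "X",
}


def to_universal(tagged_tokens, mapping=PTB_TO_UNIVERSAL, fallback="X"):
    """Convert (word, fine_tag) pairs to (word, universal_tag) pairs.

    Tags missing from the mapping fall back to the catch-all category.
    """
    return [(word, mapping.get(tag, fallback)) for word, tag in tagged_tokens]


sentence = [("The", "DT"), ("dog", "NN"), ("barked", "VBD"),
            ("loudly", "RB"), (".", ".")]
print(to_universal(sentence))
```

Because each treebank gets its own dictionary while the target categories stay fixed, data from different languages can be compared on a shared set of tags, which is the purpose the description states.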

Total citations

Cited by 1299

Citations per year: 2011: 10, 2012: 35, 2013: 51, 2014: 103, 2015: 119, 2016: 168, 2017: 129, 2018: 142, 2019: 113, 2020: 113, 2021: 112, 2022: 90, 2023: 82, 2024: 25
