Authors
Rebecca Hwa, Miles Osborne, Anoop Sarkar, Mark Steedman
Publication date
2003/8/21
Journal
Working Notes of the ICML’03 Workshop on the Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining
Description
Corrected co-training (Pierce & Cardie, 2001) and the closely related co-testing (Muslea et al., 2000) are active learning methods that exploit redundant views to reduce the cost of manually creating labeled training data. We extend these methods to statistical parsing algorithms for natural language. Because creating complex parse structures by hand is significantly more time-consuming than selecting labels from a small set, it may be easier for the human to correct the learner's partially accurate output than to generate the complex label from scratch. The goal of our work is to minimize the number of corrections that the annotator must make. To reduce the human effort in correcting machine-parsed sentences, we propose a novel approach, which we call one-sided corrected co-training, and show that this method requires only a third as many manual annotation decisions as corrected co-training/co-testing to achieve the same improvement in performance.
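As a rough illustration of the kind of loop the abstract describes, the Python sketch below shows a minimal, hypothetical one-sided corrected co-training procedure: two parser views are retrained each round, the unlabeled sentences on which the views disagree most are routed to the annotator, who corrects the draft parse of only one view, and the corrected parses are added back to the shared training pool. The parser interface, the disagreement score, and the `human_correct` callback are assumptions made for the example, not the paper's implementation.

```python
"""Illustrative sketch (not the authors' code) of a one-sided corrected
co-training loop. All class and function names are hypothetical."""

from typing import Callable, List, Tuple

Sentence = str
Parse = str  # placeholder; a real system would use tree structures


def one_sided_corrected_cotraining(
    view_a,                      # hypothetical parser with .train() / .parse()
    view_b,
    labeled: List[Tuple[Sentence, Parse]],
    unlabeled: List[Sentence],
    disagreement: Callable[[Parse, Parse], float],
    human_correct: Callable[[Sentence, Parse], Parse],
    rounds: int = 10,
    batch_size: int = 20,
) -> None:
    """Run the active-learning loop; mutates the parsers and `labeled`."""
    pool = list(unlabeled)
    for _ in range(rounds):
        view_a.train(labeled)
        view_b.train(labeled)

        # Score each unlabeled sentence by how much the two views disagree.
        scored = []
        for sent in pool:
            pa, pb = view_a.parse(sent), view_b.parse(sent)
            scored.append((disagreement(pa, pb), sent, pa))
        scored.sort(key=lambda item: item[0], reverse=True)

        # One-sided correction: the annotator edits only view A's draft parse,
        # rather than producing a parse from scratch or reviewing both views.
        for _, sent, draft_parse in scored[:batch_size]:
            corrected = human_correct(sent, draft_parse)
            labeled.append((sent, corrected))
            pool.remove(sent)
        if not pool:
            break
```

The selection-by-disagreement heuristic and the fixed batch size are placeholders; the point of the sketch is only that the human touches one view's output per selected sentence, which is where the reported reduction in annotation decisions comes from.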