Authors
Rebecca Hwa, Miles Osborne, Anoop Sarkar, Mark Steedman
Publication date
2003/8/21
Journal
Working Notes of the ICML’03 Workshop on the Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining
Description
Corrected co-training (Pierce & Cardie, 2001) and the closely related co-testing (Muslea et al., 2000) are active learning methods that exploit redundant views to reduce the cost of manually creating labeled training data. We extend these methods to statistical parsing algorithms for natural language. Because creating complex parse structures by hand is significantly more time-consuming than selecting labels from a small set, it may be easier for the human to correct the learner's partially accurate output than to generate the complex label from scratch. The goal of our work is to minimize the number of corrections that the annotator must make. To reduce the human effort in correcting machine-parsed sentences, we propose a novel approach, which we call one-sided corrected co-training, and show that this method requires only a third as many manual annotation decisions as corrected co-training/co-testing to achieve the same improvement in performance.
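As a rough illustration of the kind of loop the abstract describes, the Python sketch below shows a minimal, hypothetical one-sided corrected co-training procedure: two parser views are retrained each round, the unlabeled sentences on which the views disagree most are routed to the annotator, who corrects the draft parse of only one view, and the corrected parses are added back to the shared training pool. The parser interface, the disagreement score, and the `human_correct` callback are assumptions made for the example, not the paper's implementation.

```python
"""Illustrative sketch (not the authors' code) of a one-sided corrected
co-training loop. All class and function names are hypothetical."""

from typing import Callable, List, Tuple

Sentence = str
Parse = str  # placeholder; a real system would use tree structures


def one_sided_corrected_cotraining(
    view_a,                      # hypothetical parser with .train() / .parse()
    view_b,
    labeled: List[Tuple[Sentence, Parse]],
    unlabeled: List[Sentence],
    disagreement: Callable[[Parse, Parse], float],
    human_correct: Callable[[Sentence, Parse], Parse],
    rounds: int = 10,
    batch_size: int = 20,
) -> None:
    """Run the active-learning loop; mutates the parsers and `labeled`."""
    pool = list(unlabeled)
    for _ in range(rounds):
        view_a.train(labeled)
        view_b.train(labeled)

        # Score each unlabeled sentence by how much the two views disagree.
        scored = []
        for sent in pool:
            pa, pb = view_a.parse(sent), view_b.parse(sent)
            scored.append((disagreement(pa, pb), sent, pa))
        scored.sort(key=lambda item: item[0], reverse=True)

        # One-sided correction: the annotator edits only view A's draft parse,
        # rather than producing a parse from scratch or reviewing both views.
        for _, sent, draft_parse in scored[:batch_size]:
            corrected = human_correct(sent, draft_parse)
            labeled.append((sent, corrected))
            pool.remove(sent)
        if not pool:
            break
```

The selection-by-disagreement heuristic and the fixed batch size are placeholders; the point of the sketch is only that the human touches one view's output per selected sentence, which is where the reported reduction in annotation decisions comes from.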