Authors
Amanda Stent, Matthew Marge, Mohit Singhai
Publication date
2005/2/13
Book
International Conference on Intelligent Text Processing and Computational Linguistics
Pages
341-351
Publisher
Springer Berlin Heidelberg
Description
Recent years have seen increasing interest in automatic metrics for the evaluation of generation systems. When a system can generate syntactic variation, automatic evaluation becomes more difficult. In this paper, we compare the performance of several automatic evaluation metrics using a corpus of automatically generated paraphrases. We show that these evaluation metrics can at least partially measure adequacy (similarity in meaning), but are not good measures of fluency (syntactic correctness). We make several proposals for improving the evaluation of generation systems that produce variation.
Total citations
Citations per year, 2005-2024 (bar chart)