Corpus-based and knowledge-based measures of text semantic similarity

Authors

Rada Mihalcea, Courtney Corley, Carlo Strapparava

Publication date

2006/7/16

Conference

AAAI

Volume

Issue

2006

Pages

775-780

Description

This paper presents a method for measuring the semantic similarity of texts, using corpus-based and knowledge-based measures of similarity. Previous work on this problem has focused mainly on either large documents (e.g., text classification, information retrieval) or individual words (e.g., synonymy tests). Given that a large fraction of the information available today, on the Web and elsewhere, consists of short text snippets (e.g., abstracts of scientific documents, image captions, product descriptions), in this paper we focus on measuring the semantic similarity of short texts. Through experiments performed on a paraphrase data set, we show that the semantic similarity method outperforms methods based on simple lexical matching, resulting in up to 13% error rate reduction with respect to the traditional vector-based similarity metric.

Total citations

Cited by 1802

Citations per year: 2006: 5, 2007: 21, 2008: 28, 2009: 37, 2010: 49, 2011: 54, 2012: 90, 2013: 127, 2014: 151, 2015: 152, 2016: 147, 2017: 155, 2018: 155, 2019: 180, 2020: 155, 2021: 111, 2022: 82, 2023: 63, 2024: 18
