JOURNAL OF SOFTWARE (JSW)
ISSN : 1796-217X
Volume : 2    Issue : 4    Date : October 2007

New Functions for Unsupervised Asymmetrical Paraphrase Detection
Cordeiro João, Dias Gaël, and Brazdil Pavel
Page(s): 12-23


Abstract
Monolingual text-to-text generation is an emerging research area in Natural Language Processing.
One reason for the interest in such generation systems is the possibility of automatically learning
text-to-text generation strategies from aligned monolingual corpora. In this context, paraphrase
detection can be seen as the task of aligning sentences that convey the same information but
are written in different forms, thereby building a training set of rewriting examples. In this paper, we
propose a new type of mathematical function for unsupervised paraphrase detection and test
it over a set of standard paraphrase corpora. The results are promising: the proposed functions
outperform state-of-the-art functions developed for similar tasks. We consider two types of
paraphrases, symmetrical and asymmetrical entailed, and show that although our proposed functions
were conceived for and oriented toward asymmetrical detection, they also perform rather well at
identifying symmetrical sentence pairs.

Index Terms
Paraphrasing, Paraphrase Identification, Sentence Compression, Text Summarization, Text
Generation, Textual Entailment, Text Mining.