Paraop

The Paraop corpus and models are available on GitHub.

What is it?

Paraop is a framework for annotating paraphrase operations: the specific changes made to a sentence when a paraphrase is produced.

To illustrate, consider the following pair of sentences:

John gave Mary a book

John gave a book to Mary

Then, the Paraop annotations for this sentence pair would be:

John	gave	Mary	a	book
0	0	3	3	3

John	gave	a	book	to	Mary
0	0	3	3	1	3

The number below each word in the example above denotes the Paraop operation that applies to that word. For instance, 1 marks the addition of a function word, 3 marks a change of order, and 0 means no change.
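As a minimal sketch of how such an annotation could be read programmatically, the Python snippet below pairs each token with its label. It assumes the two-line, tab-separated layout shown above; the actual file format of the corpus may differ. The names `parse_annotation` and `KNOWN_LABELS` are hypothetical and are not part of the Paraop repository. Only the three label meanings given above are included.

```python
from typing import List, Tuple

# Label meanings taken from the description above. Other operation codes
# may exist in the corpus but are not described here, so they are omitted.
KNOWN_LABELS = {
    0: "no change",
    1: "addition of a function word",
    3: "change of order",
}


def parse_annotation(token_line: str, label_line: str) -> List[Tuple[str, int]]:
    """Pair each tab-separated token with its integer operation label.

    Assumes one line of tokens and one line of labels, as in the example
    above (an assumption about the layout, not a documented format).
    """
    tokens = token_line.rstrip("\n").split("\t")
    labels = [int(value) for value in label_line.rstrip("\n").split("\t")]
    if len(tokens) != len(labels):
        raise ValueError(f"{len(tokens)} tokens but {len(labels)} labels")
    return list(zip(tokens, labels))


first = parse_annotation("John\tgave\tMary\ta\tbook", "0\t0\t3\t3\t3")
second = parse_annotation("John\tgave\ta\tbook\tto\tMary", "0\t0\t3\t3\t1\t3")

# Print each token of the second sentence with its label and description.
for token, label in second:
    print(f"{token}\t{label}\t{KNOWN_LABELS.get(label, 'not described here')}")
```

In the second sentence, "to" is the only token labeled 1 (an added function word), while "a", "book" and "Mary" carry label 3 because their order changed.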

Why annotate paraphrase operations?

Applications of paraphrase operation detection include:

Data augmentation
Machine translation
Textual entailment detection
Text summarization and simplification
Plagiarism detection

What resources are available?

The Paraop repository on GitHub contains:

The Paraop corpus
Automatic Paraop classifiers

The Paraop corpus is based on the Extended Typology Paraphrase Corpus (ETPC), which, in turn, is based on the Microsoft Research Paraphrase Corpus (MRPC).

The automatic Paraop classifiers are BERT models fine-tuned on the Paraop corpus.
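The source does not describe how the classifiers are trained. If the task is framed as token classification (an assumption, suggested only by the per-word labels above), one common preprocessing step is to align word-level labels with BERT's subword pieces. The sketch below illustrates that step in plain Python. The function `align_labels` is hypothetical, and the `word_ids` list imitates what Hugging Face fast tokenizers return from `BatchEncoding.word_ids()`. The subword split shown is made up for illustration.

```python
from typing import List, Optional

# -100 is the default ignore_index of PyTorch's cross-entropy loss, so
# positions carrying this value do not contribute to the training loss.
IGNORE_INDEX = -100


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Expand word-level labels to subword level.

    The first piece of each word keeps the word's label. Special tokens
    (word id None) and continuation pieces receive IGNORE_INDEX.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(IGNORE_INDEX)           # special token such as [CLS]
        elif word_id != previous:
            aligned.append(word_labels[word_id])   # first piece of a word
        else:
            aligned.append(IGNORE_INDEX)           # continuation piece
        previous = word_id
    return aligned


# Hypothetical split: [CLS] John gave a book to Mar ##y [SEP]
word_ids = [None, 0, 1, 2, 3, 4, 5, 5, None]
word_labels = [0, 0, 3, 3, 1, 3]
aligned = align_labels(word_ids, word_labels)
print(aligned)  # [-100, 0, 0, 3, 3, 1, 3, -100, -100]
```

Labeling only the first piece of each word is one convention among several. Whether the Paraop classifiers use it is not stated in the source.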