Train-O-Matic

We present Train-O-Matic, a language-independent method for generating millions of sense-annotated training instances for virtually all meanings of words in a language’s vocabulary.

Abstract

Annotating large numbers of sentences with senses is the heaviest requirement of current supervised Word Sense Disambiguation. We present Train-O-Matic, a language-independent method for generating millions of sense-annotated training instances for virtually all meanings of words in a language’s vocabulary. The approach is fully automatic: no human intervention is required, and the only type of human knowledge used is a WordNet-like resource. Train-O-Matic consistently achieves state-of-the-art performance across gold-standard datasets and languages, while at the same time removing the burden of manual annotation.
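The abstract does not spell out how senses can be assigned to raw sentences using only a WordNet-like resource. As a purely illustrative sketch, assuming a toy semantic network in which each sense node links to related senses, the code profiles every sense with Personalized PageRank (one standard way to measure relatedness in such a graph), scores each sentence against the candidate senses of a target word, and keeps the best-scoring sentences per sense as training examples. The graph, the sense identifiers such as `bank#finance`, the function names and the scoring rule are all hypothetical simplifications, not the authors' implementation.

```python
from collections import defaultdict

# Hypothetical toy semantic network: each sense lists its related senses.
GRAPH = {
    "bank#finance": ["money#1", "loan#1"],
    "bank#river": ["river#1", "water#1"],
    "money#1": ["bank#finance", "loan#1"],
    "loan#1": ["bank#finance", "money#1"],
    "river#1": ["bank#river", "water#1"],
    "water#1": ["bank#river", "river#1"],
}
# Hypothetical lexicalization: one lemma per sense.
LEMMA_OF = {
    "bank#finance": "bank", "bank#river": "bank", "money#1": "money",
    "loan#1": "loan", "river#1": "river", "water#1": "water",
}


def personalized_pagerank(graph, source, alpha=0.85, iters=50):
    """Power iteration for PageRank that always restarts at `source`."""
    rank = {n: 0.0 for n in graph}
    rank[source] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in graph}
        nxt[source] += 1.0 - alpha  # teleport mass returns to the source
        for node, neighbours in graph.items():
            if neighbours:
                share = alpha * rank[node] / len(neighbours)
                for m in neighbours:
                    nxt[m] += share
            else:
                nxt[source] += alpha * rank[node]  # dangling node: restart
        rank = nxt
    return rank


def lexical_profile(graph, lemma_of, sense):
    """Relevance of each lemma to `sense`: PPR mass summed over its senses."""
    profile = defaultdict(float)
    for node, score in personalized_pagerank(graph, sense).items():
        profile[lemma_of[node]] += score
    return profile


def build_training_set(sentences, target, graph, lemma_of, top_k=100):
    """Label each sentence containing `target` with its best-scoring sense."""
    candidates = [s for s in graph if lemma_of[s] == target]
    profiles = {s: lexical_profile(graph, lemma_of, s) for s in candidates}
    scored = defaultdict(list)
    for sentence in sentences:
        tokens = sentence.lower().split()
        if target not in tokens:
            continue
        context = [t for t in tokens if t != target]
        scores = {s: sum(profiles[s].get(t, 0.0) for t in context)
                  for s in candidates}
        best = max(scores, key=scores.get)
        if scores[best] > 0.0:  # skip sentences with no evidence for any sense
            scored[best].append((scores[best], sentence))
    # Keep the top_k highest-scoring sentences per sense as training data.
    return {
        sense: [s for _, s in sorted(pairs, key=lambda p: -p[0])[:top_k]]
        for sense, pairs in scored.items()
    }


if __name__ == "__main__":
    corpus = [
        "the bank approved the loan",
        "the bank lent money as a loan",
        "we sat on the bank of the river",
        "water flowed past the bank",
    ]
    for sense, examples in build_training_set(corpus, "bank", GRAPH, LEMMA_OF).items():
        print(sense, examples)
```

In this toy setting, a sentence mentioning both "money" and "loan" ranks above one mentioning only "loan" for `bank#finance`, and sentences whose context carries no evidence for any sense are discarded rather than guessed.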

References

Train-O-Matic: Supervised Word Sense Disambiguation with No (Manual) Effort
Tommaso Pasini and Roberto Navigli
Artificial Intelligence Journal, 2019.

Train-O-Matic: Large-Scale Supervised Word Sense Disambiguation in Multiple Languages without Manual Training Data
Tommaso Pasini and Roberto Navigli
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP), Copenhagen, Denmark, 7-11 September 2017.

Huge Automatically Extracted Training Sets for Multilingual Word Sense Disambiguation
Tommaso Pasini, Francesco Maria Elia and Roberto Navigli
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, 2018.

Authors

Tommaso Pasini

PhD Student @ Sapienza

pasini [at] di.uniroma1.it

Roberto Navigli

Full Professor @ Sapienza

navigli [at] di.uniroma1.it

Download Data in 3 Languages

English, Italian and Spanish (450 MB, compressed tar.gz).

Download Data in 6 Languages

English, Italian, Spanish, German, French and Chinese, with a richer vocabulary (835 MB, compressed tar.gz).
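Both datasets are distributed as compressed tar.gz archives. A minimal sketch for inspecting and unpacking one with Python's standard `tarfile` module follows; the helper names are hypothetical, and since the internal layout of the archives is not described here, the code makes no assumption about it.

```python
import tarfile
from pathlib import Path


def list_archive(path, limit=None):
    """Return the names of the regular files stored in a .tar.gz archive."""
    with tarfile.open(path, mode="r:gz") as archive:
        names = [m.name for m in archive.getmembers() if m.isfile()]
    return names if limit is None else names[:limit]


def extract_archive(path, dest):
    """Extract a .tar.gz archive into `dest` and return the destination path."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(path, mode="r:gz") as archive:
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters can reject unsafe
            # members such as absolute paths or ".." components.
            archive.extractall(dest, filter="data")
        else:
            archive.extractall(dest)
    return dest
```

For example, `list_archive("train-o-matic-data.tar.gz", limit=20)` would print the first 20 file names before committing disk space to `extract_archive`; the file name is a placeholder for whatever the downloaded archive is called.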