Extending thesauri using word embeddings and the intersection method

Jörg Landthaler, Bernhard Waltl, Dominik Huth, Daniel Braun, Florian Matthes, Christoph Stocker, Thomas Geiger

Research output: Contribution to journal › Conference article › peer-review

1 Scopus citation

Abstract

In many legal domains, the amount of available and relevant literature is continuously growing. Legal content providers face the challenge of providing their customers with relevant and comprehensive content for search queries on large corpora. However, documents written in natural language contain many synonyms and semantically related concepts. Legal content providers usually maintain thesauri so that their search engines can discover more relevant documents. Maintaining a high-quality thesaurus is an expensive, difficult, and manual task. Word embeddings have recently gained a lot of attention as a technique for building thesauri from large corpora. We report our experience with the feasibility of extending thesauri based on a large corpus of German tax law, with a focus on synonym relations. Using a simple yet powerful new approach, called the intersection method, we can significantly improve and facilitate the extension of thesauri.

Original language: English
Journal: CEUR Workshop Proceedings
Volume: 2143
State: Published - 2017
Event: 2nd Workshop on Automated Semantic Analysis of Information in Legal Texts, ASAIL 2017 - London, United Kingdom
Duration: 16 Jun 2017 → …

Keywords

  • Intersection method
  • Parameter study
  • Synsets
  • Tax law
  • Thesaurus
  • Word embeddings
  • Word2vec
