Automated extraction of semantic information from German legal documents

Bernhard Waltl, Jörg Landthaler, Elena Scepankova, Florian Matthes, Thomas Geiger, Christoph Stocker, Christian Schneider

Research output: Contribution to journal › Article › peer-review

2 Scopus citations


Based on a collaborative data science environment and a large document corpus (< 130'000 documents from German tax law), we demonstrate the extraction of semantic information. This paper shows the potential of rule-based text analysis to automatically extract semantic information, such as the year of dispute in cases. Additionally, it demonstrates the extraction of legal definitions in laws and the usage of terms in a defining context. In an iterative and interdisciplinary process, legal experts, software engineers, and data scientists evaluate and continuously refine the model used for the computer-supported extraction.
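To illustrate what a rule for the year of dispute might look like, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not name the rule language or tooling, so the regular expression, the keyword "Streitjahr" as the trigger, and the function name `extract_dispute_years` are all assumptions made for illustration.

```python
import re
from typing import List

# Assumed rule: the keyword "Streitjahr" (or an inflected form such as
# "Streitjahre", "Streitjahren", "Streitjahres") followed by one or more
# four-digit years joined by ",", "und", or "bis".
DISPUTE_YEAR_RULE = re.compile(
    r"Streitjahr(?:e|en|es)?\s+"
    r"((?:19|20)\d{2}(?:\s*(?:,|und|bis)\s*(?:19|20)\d{2})*)"
)
YEAR = re.compile(r"(?:19|20)\d{2}")


def extract_dispute_years(text: str) -> List[int]:
    """Return the sorted, de-duplicated years named after the keyword.

    Ranges written with "bis" are not expanded; only the years that
    literally appear in the text are returned.
    """
    years = set()
    for match in DISPUTE_YEAR_RULE.finditer(text):
        years.update(int(y) for y in YEAR.findall(match.group(1)))
    return sorted(years)


if __name__ == "__main__":
    sample = "Der Kläger erzielte in den Streitjahren 2005 und 2006 Einkünfte."
    print(extract_dispute_years(sample))
```

A rule of this kind is easy for legal experts to read and correct, which fits the iterative refinement the abstract describes, but it only finds years that follow the assumed keyword and would miss other phrasings.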

Original language: English
Journal: Jusletter IT
Issue number: February
State: Published - 23 Feb 2017


  • Legal data science
  • Semantic analysis
  • Structured information
  • Text mining

