TY - GEN
T1 - Summarization of German Court Rulings
AU - Glaser, Ingo
AU - Moser, Sebastian
AU - Matthes, Florian
N1 - Publisher Copyright: © 2021 Association for Computational Linguistics.
PY - 2021
Y1 - 2021
N2 - Historically speaking, the German legal language is widely neglected in NLP research, especially in summarization systems, as most of them are based on English newspaper articles. In this paper, we propose the task of automatic summarization of German court rulings. Due to their complexity and length, it is of critical importance that legal practitioners can quickly identify the content of a verdict and thus be able to decide on the relevance for a given legal case. To tackle this problem, we introduce a new dataset consisting of 100k German judgments with short summaries. Our dataset has the highest compression ratio among the most common summarization datasets. German court rulings contain much structural information, so we create a pre-processing pipeline tailored explicitly to the German legal domain. Additionally, we implement multiple extractive as well as abstractive summarization systems and build a wide variety of baseline models. Our best model achieves a ROUGE-1 score of 30.50. Therefore with this work, we are laying the crucial groundwork for further research on German summarization systems.
UR - http://www.scopus.com/inward/record.url?scp=85138399562&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85138399562
T3 - Natural Legal Language Processing, NLLP 2021 - Proceedings of the 2021 Workshop
SP - 180
EP - 189
BT - Natural Legal Language Processing, NLLP 2021 - Proceedings of the 2021 Workshop
A2 - Aletras, Nikolaos
A2 - Androutsopoulos, Ion
A2 - Barrett, Leslie
A2 - Goanta, Catalina
A2 - Preotiuc-Pietro, Daniel
PB - Association for Computational Linguistics (ACL)
T2 - 3rd Natural Legal Language Processing, NLLP 2021
Y2 - 10 November 2021
ER -