One Law, Many Languages: Benchmarking Multilingual Legal Reasoning for Judicial Support

2023-06-15

Ronja Stern, Vishvaksenan Rasiah, Veton Matoshi, Srinanda Brügger Bose, Matthias Stürmer, Ilias Chalkidis, Daniel E. Ho, Joel Niklaus

arXiv PDF

Code

github.com/vr18ub/court_view_generation
Official · In paper · Framework: PyTorch
github.com/stern5497/doc2docbeirir
Official · In paper · Framework: none

Abstract

Recent strides in Large Language Models (LLMs) have saturated many Natural Language Processing (NLP) benchmarks, emphasizing the need for more challenging ones to properly assess LLM capabilities. However, domain-specific and multilingual benchmarks are rare because they require in-depth expertise to develop. Moreover, most public models are trained predominantly on English corpora, while other languages remain understudied, particularly for practical domain-specific NLP tasks. In this work, we introduce a novel NLP benchmark for the legal domain that challenges LLMs in five key dimensions: processing long documents (up to 50K tokens), using domain-specific knowledge (embodied in legal texts), multilingual understanding (covering five languages), multitasking (comprising legal document-to-document Information Retrieval, Court View Generation, Leading Decision Summarization, Citation Extraction, and eight challenging Text Classification tasks), and reasoning (especially Court View Generation, but also the Text Classification tasks). Our benchmark contains diverse datasets from the Swiss legal system, allowing for a comprehensive study of this non-English, inherently multilingual legal system. Despite the large size of our datasets (some with hundreds of thousands of examples), existing publicly available multilingual models struggle with most tasks, even after extensive in-domain pre-training and fine-tuning. We publish all resources (benchmark suite, pre-trained models, code) under permissive open CC BY-SA licenses.

Tasks

Benchmarking, Information Retrieval, Language Modelling, Legal Reasoning, Text Classification