Can Large Language Models Predict the Outcome of Judicial Decisions?

2025-01-15

Mohamed Bayan Kmainasi, Ali Ezzat Shahroor, Amani Al-Ghraibah


Code

github.com/MohamedBayan/Arabic-Legal-Judgment-Prediction
Official · PyTorch · ★ 2

Abstract

Large Language Models (LLMs) have shown exceptional capabilities in Natural Language Processing (NLP) across diverse domains. However, their application in specialized tasks such as Legal Judgment Prediction (LJP) for low-resource languages like Arabic remains underexplored. In this work, we address this gap by developing an Arabic LJP dataset, collected and preprocessed from Saudi commercial court judgments. We benchmark state-of-the-art open-source LLMs, including LLaMA-3.2-3B and LLaMA-3.1-8B, under varying configurations such as zero-shot, one-shot, and fine-tuning using QLoRA. Additionally, we use a comprehensive evaluation framework combining quantitative metrics (BLEU and ROUGE) and qualitative assessments (coherence, legal language, and clarity). Our results demonstrate that fine-tuned smaller models achieve performance comparable to that of larger models in task-specific contexts while offering significant resource efficiency. Furthermore, we investigate the effects of prompt engineering and fine-tuning on model outputs, providing insights into performance variability and instruction sensitivity. By making the dataset, implementation code, and models publicly available, we establish a robust foundation for future research in Arabic legal NLP.
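To make the zero-shot and one-shot configurations mentioned in the abstract concrete, the following is a minimal sketch of how the two prompt styles might differ. The instruction text, the `build_prompt` helper, and the sample case texts are all hypothetical illustrations; they are not taken from the paper or its repository, and the real prompts would be in Arabic.

```python
# Hypothetical instruction; the paper's actual prompt wording is not shown here.
INSTRUCTION = "Read the facts of the case and write the court's likely judgment."


def build_prompt(case_facts, example=None):
    """Build a zero-shot prompt, or a one-shot prompt when `example` is given.

    `example` is an assumed (facts, judgment) pair used as the single
    demonstration that precedes the case to be predicted.
    """
    parts = [INSTRUCTION]
    if example is not None:
        ex_facts, ex_judgment = example
        # One-shot: show one solved case before the target case.
        parts.append(f"Facts: {ex_facts}\nJudgment: {ex_judgment}")
    # The target case ends with an open "Judgment:" slot for the model to fill.
    parts.append(f"Facts: {case_facts}\nJudgment:")
    return "\n\n".join(parts)


# Made-up case texts, for illustration only.
target = "The claimant seeks payment for delivered goods."
demo = ("The buyer did not pay an invoice.",
        "The defendant must pay the invoice amount.")

zero_shot = build_prompt(target)
one_shot = build_prompt(target, example=demo)
```

The only difference between the two settings in this sketch is the single solved demonstration. Fine-tuning with QLoRA, by contrast, changes the model weights rather than the prompt.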

Tasks

Prompt Engineering
