Giulia Grundler, Piera Santin, Alessia Fidelangeli, Rachele Mignone, Federico Galli, Andrea Galassi, Giuseppe Contissa, Luigi Di Caro and Paolo Torroni, Automated Extraction of Judicial Interpretative Formulas in EU Case Law on VAT, pre-print of the article to be published after the JURIX Conference 2025
Rachele Mignone, Alessia Fidelangeli, Piera Santin, Luigi Di Caro, Building Blocks of Judicial Reasoning: Defining and Extracting Judicial Interpretative Formulas with LLMs, pre-print of the article to be published after the JURIX Conference 2025
The paper addresses the extraction of Judicial Interpretative Formulas (JIFs) from decisions of the Court of Justice of the European Union (CJEU) on Value Added Tax (VAT). European case law includes a significant number of JIFs on this subject, which are crucial for the interpretation of VAT. However, extracting such JIFs manually is laborious, and automatic extraction has not yet been investigated in the VAT domain. Our work proposes the first pipeline method for doing so. We start by defining a set of guidelines for annotating legal texts, based on a principled definition of JIF. By following these guidelines, we obtain a corpus of 21 expert-labeled CJEU decisions, which we reserve for validation and testing. For training, we machine-annotate 80 additional decisions using LLMs. Our experiments show that BERT-based architectures trained on such data perform comparably to LLMs.
The Court of Justice of the European Union (CJEU) frequently employs specific interpretative statements from previous rulings as building blocks to construct legal arguments. We define these recurring formulaic statements as “Judicial Interpretative Formulas” (JIFs) and present a novel computational approach for their automated extraction. Our methodology combines legal domain expertise with Large Language Models (LLMs), using an iterative process in which expert error analysis refines both the formal definition of JIFs and extraction performance. We evaluated three LLMs (Claude 3.7 Sonnet, DeepSeek-R1, and Gemini 1.5 Pro) on 106 CJEU Value Added Tax judgments from 2006 to 2024. Claude 3.7 Sonnet with few-shot prompting achieved the best performance, with an F1-score of 0.932, precision of 0.919, and recall of 0.944. This approach serves a dual purpose: it provides empirical evidence of the widespread nature of JIFs and facilitates the creation of an EU case law dataset. This resource is expected to support legal scholars in analysing evolving trends in judicial reasoning across Europe, while also offering judges a practical reference for ensuring the consistent application of EU law.
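To make the reported metrics concrete, the following is a minimal sketch of how precision, recall, and F1 could be computed for extracted formulas against expert annotations. The abstract does not state the matching criterion, so exact match after whitespace and case normalisation is an assumption here, and the names `normalize`, `score_extraction`, and `f1_from` are hypothetical.

```python
def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivial differences do not count as errors."""
    return " ".join(text.split()).lower()


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def score_extraction(predicted: list[str], gold: list[str]) -> tuple[float, float, float]:
    """Score predicted formulas against gold ones.

    Assumption: a prediction is correct only if it exactly matches a gold
    formula after normalisation. The papers may use a different criterion.
    """
    pred_set = {normalize(p) for p in predicted}
    gold_set = {normalize(g) for g in gold}
    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    return precision, recall, f1_from(precision, recall)


if __name__ == "__main__":
    # Toy strings for illustration only; they are not taken from the dataset.
    gold = ["Formula one text", "formula two text"]
    predicted = ["formula  one text", "an unrelated sentence"]
    print(score_extraction(predicted, gold))
```

Applying the harmonic mean to the rounded precision (0.919) and recall (0.944) gives roughly 0.931, close to the reported 0.932. A difference of that size is consistent with rounding of the inputs, though it could also reflect how scores were averaged across judgments.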