Despite some improvements in compliance metrics after the implementation of the European General Data Protection Regulation (GDPR), privacy policies have become longer and more ambiguous. They often fail to fully meet GDPR requirements, thus leaving users without a reliable way to understand how their data is processed. We present a novel corpus composed of 30 privacy policies of online platforms, together with a new set of annotation guidelines for assessing the comprehensiveness of the information provided. We focus on the categories of data processed, classifying each clause as either fully informative or insufficiently informative. In our experimental evaluation, we perform six different classification and detection tasks, comparing BERT-based models and generative Large Language Models.
Privacy policies often fall short of providing a comprehensive account of how personal data is used, thus failing to comply with GDPR requirements. In doing so, they hamper the users’ ability to make informed decisions about using services while ensuring that their data is used properly and fairly. This calls for automatic tools that can effectively identify potentially unlawful policies. Here we present a new corpus of Italian privacy policies, with clauses labelled by experts in data protection law, to indicate the level of comprehensiveness of information. We focus on the categories of data processed, classifying each clause as either sufficiently or insufficiently informative (“vague”). We perform six different classification and detection tasks, comparing the performance of BERT-based models and generative Large Language Models. Addressing multilingualism is crucial in the EU, whose 24 spoken languages are an integral part of its cultural heritage. Consequently, we also perform cross-language experiments to evaluate whether a pre-existing English corpus or classifiers can be leveraged for Italian and, vice versa, whether our corpus is informative enough to generalize to other languages.
In a world of human-only readers, a trade-off persists between comprehensiveness and comprehensibility: only privacy policies too long to be humanly readable can precisely describe the intended data processing. We argue that this trade-off no longer exists where LLMs are able to extract tailored information from clearly drafted, fully comprehensive privacy policies. To substantiate this claim, we provide a methodology for drafting comprehensive, non-ambiguous privacy policies and for querying them using LLM prompts. Our methodology is tested with an experiment aimed at determining to what extent GPT-4 and Llama 2 are able to answer questions regarding the content of privacy policies designed in the format we propose. We further support this claim by analyzing real privacy policies in selected market sectors through two experiments (one with legal experts, and another using LLMs). Based on the success of our experiments, we submit that data protection law should change: it must require controllers to provide clearly drafted, fully comprehensive privacy policies from which data subjects and other actors can extract the needed information, with the help of LLMs.
Most of the existing natural language processing systems for legal texts are developed for the English language. Nevertheless, there are several application domains where multiple versions of the same documents are provided in different languages, especially inside the European Union. One notable example is given by Terms of Service (ToS). In this paper, we compare different approaches to the task of detecting potential unfair clauses in ToS across multiple languages. In particular, after developing an annotated corpus and a machine learning classifier for English, we consider and compare several strategies to extend the system to other languages: building a novel corpus and training a novel machine learning system for each language, from scratch; projecting annotations across documents in different languages, to avoid the creation of novel corpora; translating training documents while keeping the original annotations; translating queries at prediction time and relying on the English system only. An extended experimental evaluation conducted on a large, original dataset indicates that the time-consuming task of re-building a novel annotated corpus for each language can often be avoided with no significant degradation in terms of performance.
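The last strategy above (translating queries at prediction time and relying on the English system only) can be illustrated with a minimal sketch. All names here (`detect_unfair`, the stand-in translator and classifier) are illustrative placeholders, not the components actually used in the paper:

```python
# Hedged sketch of the "translate at prediction time" strategy:
# a ToS clause in any language is machine-translated into English,
# then classified by an English-only unfairness detector.

def detect_unfair(clause, translate_to_english, english_clf):
    """Classify a non-English ToS clause with an English-only model."""
    english_clause = translate_to_english(clause)
    return english_clf(english_clause)

# Toy stand-ins so the pipeline is runnable end to end; a real system
# would plug in an MT service and a trained classifier here.
fake_translate = lambda s: "the provider may suspend the service at any time"
fake_clf = lambda s: "potentially unfair" if "at any time" in s else "fair"

label = detect_unfair(
    "il fornitore può sospendere il servizio in qualsiasi momento",
    fake_translate, fake_clf,
)
```

The appeal of this strategy, as the abstract notes, is that no new annotated corpus or retraining is needed per language; only the translation step is language-specific.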
Further publications:
Markus Reuter, Tobias Lingenberg, Ruta Liepina, Francesca Lagioia, Marco Lippi, Giovanni Sartor, Andrea Passerini, and Burcu Sayin. 2025. Towards Reliable Retrieval in RAG Systems for Large Legal Datasets. In Proceedings of the Natural Legal Language Processing Workshop 2025, pages 17–30, Suzhou, China. Association for Computational Linguistics. [conference proceedings]. Brief description: The paper addressed the reliability of Retrieval-Augmented Generation (RAG) systems in the legal domain, highlighting the challenges associated with the retrieval step in large and structurally similar document collections. It identified and quantified a critical failure mode, termed Document-Level Retrieval Mismatch (DRM), in which information was retrieved from incorrect source documents. To mitigate this issue, the authors proposed a technique called Summary-Augmented Chunking (SAC), which enriched text chunks with document-level summaries to preserve global context. The experimental evaluation demonstrated that this approach significantly reduced retrieval errors and improved precision and recall, thereby enhancing the overall reliability of RAG systems for legal applications.
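The core idea of Summary-Augmented Chunking, as described above, is to enrich each text chunk with a document-level summary before indexing, so that the embedded chunk retains global context. A minimal sketch of that idea follows; the function name and header format are assumptions for illustration, not the authors' implementation:

```python
# Minimal sketch of Summary-Augmented Chunking (SAC): prefix every
# chunk with its document's title and summary so that, once embedded,
# the chunk still carries document-level context and is less likely to
# be retrieved for the wrong source document.

def summary_augmented_chunks(doc_title, doc_summary, chunks):
    """Return chunks prefixed with a document-level context header."""
    header = f"[Document: {doc_title}] {doc_summary}\n"
    return [header + chunk for chunk in chunks]

# Example: two chunks from the same (hypothetical) privacy policy
chunks = [
    "We collect your email address for account management.",
    "Data may be shared with third-party processors.",
]
augmented = summary_augmented_chunks(
    "ACME Privacy Policy",
    "Policy describing ACME's collection and sharing of user data.",
    chunks,
)
```

In a full RAG pipeline, the augmented strings (rather than the raw chunks) would be embedded and indexed, which is what mitigates the Document-Level Retrieval Mismatch the paper identifies.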
Marco Panarelli, Andrea Galassi, Francesca Lagioia, Rūta Liepiņa, Marco Lippi, Przemysław Pałka, and Giovanni Sartor. 2026. Is It Worth Using LLMs for Unfair Clause Detection in Terms of Service? In Proceedings of the Twentieth International Conference on Artificial Intelligence and Law (ICAIL '25). Association for Computing Machinery, New York, NY, USA, 139–149. [conference proceedings]. Brief description: The paper examined the use of Large Language Models for the detection of unfair clauses in Terms of Service, a task of significant relevance for consumer protection. It compared different prompting strategies for LLMs with traditional fine-tuned BERT-based models. Through an extensive experimental evaluation, the study investigated whether LLMs provided advantages over established approaches in this domain-specific task. The results contributed to assessing the effectiveness and practical value of LLMs for automated unfair clause detection.
Liepiņa, R., Lagioia, F., Lippi, M., Pałka, P., Micklitz, H. W., & Sartor, G. (2025). Automating legal tasks: LLMs, legal documents, and the AI Act. Brief description: The chapter examined the impact of the shift from predictive to generative AI on the legal domain, with particular focus on the use of Large Language Models. It analyzed the opportunities and challenges associated with integrating LLMs into legal research and practice, with specific reference to the CLAUDETTE system and its implications for consumer empowerment and privacy protection. The study also explored emerging legal issues in light of the AI Act and related regulatory frameworks. It emphasized the importance of understanding the capabilities and limitations of LLMs in comparison with traditional approaches for legal applications.
Pałka, P., Lippi, M., Lagioia, F., Liepiņa, R., & Sartor, G. (2023). No more trade-offs. GPT and fully informative privacy policies. arXiv preprint arXiv:2402.00013: The paper reports the results of an experiment aimed at testing to what extent GPT-3.5 and GPT-4 are able to answer questions regarding privacy policies designed in the new format that we propose. In a world of human-only interpreters, there was a trade-off between comprehensiveness and comprehensibility of privacy policies, leading to actual policies that do not contain enough information for users to learn anything meaningful. Having shown that GPT performs relatively well with the new format, we provide experimental evidence supporting our policy suggestion, namely that the law should require fully comprehensive privacy policies, even if this means they become less concise.
G. Resta, Health Data in Europe. At the crossroads of Data Protection and Data Sharing, in Anders-Catanzariti-Incardona-Resta (eds.), Data privacy, data property and data sharing, Routledge, 2025, 211-224: an analysis of the Common European Health Data Space Regulation, also with regard to wellness apps and secondary uses of health data sharing;
G. Resta, “Health data in Europe: from data protection to data sharing” (Paper presented at the conference “AI and health: regulatory challenges in Europe and Brazil”, Roma Tre University, 13 February 2025)
G. Resta, Commentary on Art. 2(4), in A. Mantelero-G. Resta-G. Riccio, Intelligenza Artificiale. Commentario, Wolters-Kluwer, 2025: an analysis of the territorial and material scope of application of the AI Act;
G. Resta – G. Riccio, Il disegno di legge Italiano sull’intelligenza artificiale, in A. Mantelero-G. Resta-G. Riccio, Intelligenza Artificiale. Commentario, Wolters-Kluwer, 2025: an analysis of the Italian bill on artificial intelligence;
G. Resta, “The Human Person and Data: Beyond the Individualistic Approach”, in H.-W. Micklitz – G. Vettori (eds.), London: Bloomsbury, 2025, 319-334;
G. Resta, “Autonomous Intelligent Systems: From Illusion of Control to Inescapable Delusion” (with R. Torlone and S. Grumbach), in arXiv, 2024;
G. Resta, “L’ambito territoriale di applicazione del Regolamento Europeo sull’Intelligenza Artificiale: note critiche”, in Dir. Inf., 2024, 731-744.