Project Overview

PRIMA (PRivacy Infringements Machine-Advice) studied the law and practice of privacy policies (PPs) and developed AI-based methods for the trustworthy automated assessment of privacy policy clauses with regard to their compliance with European data protection law.

This project provided several contributions to the advancement of knowledge:

a) It provided a systematic specification of the legal requirements governing PPs and analysed the links between such policies and consumer contracts.

b) It refined and adapted Legal Analytics (LA) technologies and methods, taking into account the peculiarities of data protection law and the language typically employed in PPs.

c) It contributed to the development of socially beneficial applications of Legal Analytics, providing a solid evidence-based assessment of the opportunities and limitations of AI-based approaches. This work included a review of the capabilities and limitations of current machine learning methods in the analysis of legal documents, and the development of legal guidelines for the use of AI methods in detecting unlawful elements in legal texts.

d) It produced empirical studies in the field of data protection, based on the results obtained through the application of LA to PPs. In particular, it examined the linguistic and conceptual structures underlying privacy policies and the relationship between their textual formulation and data processing practices, highlighting major critical issues related to compliance with data protection law.

In this context, the PRIMA project delivered (i) a specification of legal requirements and standards for privacy policies; (ii) new applications of AI methods and tools for the automatic assessment of legal clauses; (iii) socio-legal studies conducted on empirical data.

PRIMA pursued three main objectives—normative (doctrinal), legal-informatics, and empirical (socio-legal)—which together aimed to advance the understanding, analysis, and assessment of PPs and their compliance with data protection law.

The normative (doctrinal) objective aimed to provide a deeper understanding of the legal requirements governing PPs under the GDPR. Within this objective, the PRIMA project identified best practices and critical issues in the drafting and implementation of PPs and produced a comprehensive report on the requirements for lawful and fair PPs (WP2 “Techno-legal framework and best practices”). This report established the legal benchmark for the socio-legal assessment of privacy practices carried out in the project. Building on this legal analysis, the project defined criteria for assessing the quality of PPs, particularly with regard to the completeness and clarity of the information provided to data subjects (the “gold standard”). These criteria served as the basis for the development of the annotation framework used to construct the PRIMA dataset (WP3 “Focus on online platforms” and WP4 “Focus on e-health products”). The corresponding annotation guidelines defined the rules for identifying and classifying PP clauses, assessing their compliance with the GDPR, and detecting clauses that were unlawful or potentially unlawful. For each category of legal deficiency, the guidelines established criteria for assigning scores and a method for aggregating these scores into an overall evaluation of the lawfulness and fairness of a PP. In addition, the guidelines specified the key legal and linguistic features required for the development of the computational methods implemented in the project.

The legal-informatics objective (WP3, WP4 and WP5 “Computational methods and prototype development”) aimed to develop a novel approach to the automated analysis, assessment, and improvement of legal documents by testing methods and techniques from natural language processing and machine learning in the domain of data protection policies. In pursuit of this objective, the project created a multilingual tagged corpus of PPs collected across different market sectors.

The corpus was used to experiment with and evaluate several computational approaches, including natural language processing techniques, computational linguistics methods, and supervised and unsupervised machine learning models. The experimental phase enabled the identification of the most suitable methods for detecting, classifying, and evaluating privacy policy clauses, as reported in the experimental results.

The empirical (socio-legal) objective (WP3 and WP4) aimed to expand existing knowledge of the structure, logic, and dynamics of PP practices. Through the analysis of the annotated corpus, the project detected trends in the content and use of PPs across sectors and identified and classified both legal shortcomings and good practices. In particular, the project conducted a quantitative and qualitative analysis of the legal deficiencies of the PPs contained in the tagged corpus. Categories of clauses, types of defective clauses, and omissions were systematically counted, aggregated, and evaluated according to different criteria, including the degree of deviation from the gold standard identified in the doctrinal analysis. This quantitative assessment made it possible to identify recurring defective practices and structural weaknesses in privacy policies. Part of this analysis was integrated into the online observatory available on the project website. Based on these findings, the project developed recommendations for improving PPs, including examples of alternative lawful clauses and indications of additional information required to meet legal standards.
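The counting and aggregation step described above can be illustrated with a minimal Python sketch. It is not the project's actual pipeline: the record fields, the sector and category labels, and the function names are all hypothetical, chosen only to show how defective clauses and omissions might be tallied from an annotated corpus.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


# Hypothetical annotation record. The field names and label values are
# assumptions for illustration, not the PRIMA annotation scheme.
@dataclass(frozen=True)
class Clause:
    policy_id: str
    sector: str
    category: str    # invented label, e.g. "legal_basis"
    defective: bool  # True if the annotator flagged the clause


def count_defects(clauses):
    """Count defective clauses per (sector, category) pair."""
    counts = Counter()
    for c in clauses:
        if c.defective:
            counts[(c.sector, c.category)] += 1
    return counts


def defect_rate_by_sector(clauses):
    """Share of defective clauses in each sector, as a float in [0, 1]."""
    total = Counter()
    defective = Counter()
    for c in clauses:
        total[c.sector] += 1
        if c.defective:
            defective[c.sector] += 1
    return {sector: defective[sector] / total[sector] for sector in total}


def missing_categories(clauses, required):
    """Omissions: required categories that a policy never addresses."""
    seen = defaultdict(set)
    for c in clauses:
        seen[c.policy_id].add(c.category)
    return {pid: sorted(set(required) - cats) for pid, cats in seen.items()}


# Invented sample data, used only to exercise the functions above.
SAMPLE = [
    Clause("p1", "platform", "legal_basis", True),
    Clause("p1", "platform", "retention", False),
    Clause("p2", "e-health", "legal_basis", True),
    Clause("p2", "e-health", "legal_basis", True),
]

defects = count_defects(SAMPLE)
rates = defect_rate_by_sector(SAMPLE)
omissions = missing_categories(SAMPLE, ["legal_basis", "retention"])
```

In this sketch the list of required categories stands in for the gold standard, so a policy's omissions are simply the required categories for which no clause was annotated. A weighted deviation score would need the scoring criteria from the annotation guidelines, which are not reproduced here.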

Finally, the project units carried out extensive dissemination and outreach activities (WP1 “Coordination, quality assurance and dissemination”). Researchers presented papers based on PRIMA results at several academic conferences and workshops. The project also organised a final conference dedicated to presenting the main outcomes of the research and discussing the potential of AI-based tools to empower data subjects in understanding and evaluating PPs.

A five-month extension to the project was requested and granted for the period from 28 September 2025 to 28 February 2026. This extension allowed the project teams to refine the PRIMA dataset, expand the empirical analyses, and further evaluate the performance of advanced language models for the automated identification and classification of privacy policy clauses. All the regional units involved in the project used this additional period to complete the planned research activities and consolidate the results achieved within their respective work packages.

Figure: PERT chart of the PRIMA Project, giving an overview of the work packages. Top level: Work package 1, coordination, quality assurance and dissemination (responsible partner: UNIBO). Bottom level: Work package 2, techno-legal framework and best practices (responsible partner: POLITO); Work package 3, focus on online platforms' service (responsible partner: UNIBO); Work package 4, focus on e-health products (responsible partner: ROMA TRE). Work package 5: computational methods and prototype development (responsible partner: UNIBO).