Project Topics

Suggestions and research areas for thesis and project activities


Below you can find a detailed list of available research topics, along with our proposed projects.

Each project uses the following template:

*Project Title*

  • Description: brief project description

  • Supervised by: supervisor #1, supervisor #2, etc...

  • Type:

    • "TO BE DEFINED (TBD)": the project describes a very generic problem without any additional detail, thus leaving ample room for individual proposals.

    • "PARTIALLY DEFINED": the project provides some details about a problem, such as method and available data. However, additional details have to be defined to set up a specific task.

    • "SET UP": the project is almost or entirely defined.

  • References: resource #1, resource #2, etc...

If you are interested

If one or multiple research projects meet your interest, don't hesitate to get in touch with the corresponding supervisors.

All projects are supervised by Prof. Paolo Torroni.

Argumentation Mining

One of our main research interests is Argument/Argumentation Mining (AM). It can be informally described as the problem of automatically detecting and extracting arguments from text. Arguments are usually represented as a combination of a premise (a fact) that supports a subjective conclusion (opinion, claim). Argumentation Mining touches a wide variety of well-known NLP tasks, ranging from sentiment analysis and stance detection to summarization and dialogue systems.

#1 - Tree-constrained Argument Model

  • Description: We are currently developing a method to extract structured features (i.e., tree fragments) from tree-like textual inputs (parse trees, dependency trees, argument maps, etc.). Our hypothesis is that arguments might have underlying structured patterns. These patterns are integrated into the model at the architecture level and can be easily inspected.

  • Supervised by: Federico Ruggeri, Marco Lippi

  • Type: SET UP

#2 - Argument Correction/Completion

  • Description: Given an argument, or its main claim, we would like to generate an improved version of the argument itself. The argument can be generated either from scratch or from an initial, partial argument. Improvement can be expressed in terms of quality (argument ranking), in terms of model behavior (we use an existing argument classification model and evaluate its prediction confidence), or in terms of task-specific conditioning factors (e.g., sentiment, stance). Additionally, the generated text can be constrained so that it does not differ too much from the original input claim.

  • Supervised by: Andrea Galassi, Federico Ruggeri
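The interplay between classifier confidence and the similarity constraint described above can be sketched as a simple candidate-selection step. This is only an illustration under stated assumptions: the candidate rewrites are taken as given, `confidence` is a hypothetical stand-in for the prediction confidence of an argument classifier, and word-level Jaccard overlap is just one possible way to measure how far a rewrite drifts from the original claim.

```python
def jaccard(a, b):
    """Word-level Jaccard similarity between two texts (assumed distance proxy)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def select_rewrite(original, candidates, confidence, min_similarity=0.5):
    """Return the candidate with the highest confidence score among those
    that stay close enough to the original; fall back to the original."""
    allowed = [c for c in candidates if jaccard(original, c) >= min_similarity]
    return max(allowed, key=confidence, default=original)


# Hypothetical scorer: counts the word "because" as a crude proxy for
# argumentativeness. A real setup would call a trained classifier instead.
def toy_confidence(text):
    return text.lower().split().count("because")


original = "we should ban cars"
candidates = [
    "we should ban cars because they pollute",
    "pollution is bad because science says so because",
    "we should ban cars",
]
best = select_rewrite(original, candidates, toy_confidence)
```

In this toy run the second candidate has the highest score but shares no words with the original, so the similarity constraint filters it out.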


#3 - Multimodal Argument Mining

  • Description: We would like to make use of speech information (e.g., prosody) to enhance the set of features that can be used to detect arguments. Speech can be represented either by means of ad-hoc feature extraction methods (e.g., MFCC) or via end-to-end architectures. Few existing corpora offer both argument annotation layers and speech data for a given text document.

  • Supervised by: Eleonora Mancini, Federico Ruggeri


#4 - Data augmentation for AM

  • Description: One of the main problems with AM is label imbalance in corpora. This may occur in Argumentative Sentence Detection (ASD) and Link Prediction (LP) where a lot of "negative" examples exist. We want to experiment with data augmentation techniques to create new synthetic samples that can be used to balance the datasets.

  • Supervised by: Andrea Galassi, Federico Ruggeri, Marco Lippi

  • Type: TBD
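As a minimal sketch of the balancing idea, the snippet below oversamples under-represented classes and perturbs each copy with random token dropout. This is one very simple augmentation strategy chosen for illustration, not the technique the project commits to; the function names, the dropout probability, and the toy sentences are all assumptions.

```python
import random
from collections import Counter


def token_dropout(text, rng, p=0.1):
    """Hypothetical augmentation: drop each token with probability p."""
    tokens = text.split()
    kept = [t for t in tokens if rng.random() >= p]
    return " ".join(kept) if kept else text


def balance_by_augmentation(samples, seed=0):
    """Add synthetic samples until every class is as large as the largest one.

    samples: list of (text, label) pairs.
    """
    rng = random.Random(seed)
    counts = Counter(label for _, label in samples)
    target = max(counts.values())
    balanced = list(samples)
    for label, count in counts.items():
        pool = [text for text, lab in samples if lab == label]
        for _ in range(target - count):
            balanced.append((token_dropout(rng.choice(pool), rng), label))
    return balanced


toy = [
    ("we should act because emissions are rising", "argumentative"),
    ("the meeting starts at nine", "non-argumentative"),
    ("see the attached file", "non-argumentative"),
    ("thanks for your reply", "non-argumentative"),
]
balanced = balance_by_augmentation(toy)
label_counts = Counter(label for _, label in balanced)
```

After balancing, both classes in the toy set contain the same number of samples; whether such synthetic samples actually help ASD or LP models is exactly what the project would need to evaluate.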

#5 - Argumentative Chatbots

  • Description: We are studying techniques to mix the effectiveness of NL models with the logic provided by argumentation. Several research directions can be defined, among which: chatbots that provide information after argumentative reasoning, collection of argumentative dialogues, argumentative prediction on dialogical documents.

  • Supervised by: Eleonora Mancini, Federico Ruggeri, Andrea Galassi

  • Type: TBD

#6 - Learning to Quantify Arguments

  • Description: There are contexts where the fine-grained detection and analysis of argumentative components is not required, and instead a macro-scale estimate of the number of components in the text would be sufficient. For example, a retrieval system for news or scientific literature may prioritize documents with many components. We are interested in developing a case study about the quantification of arguments in textual documents, first applying existing libraries and methods, and then developing ad-hoc solutions.

  • Supervised by: Andrea Galassi


  • References: "LeQua@CLEF2022: Learning to Quantify" (Esuli et al. 2022) for an example of quantification; "AMICA: An Argumentative Search Engine for COVID-19 Literature" (Lippi et al. IJCAI 2022) for a possible application of the approach.
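To make the notion of quantification concrete, the sketch below implements two standard baselines from the quantification literature: classify and count (CC), which simply reports the fraction of items predicted as positive, and adjusted classify and count (ACC), which corrects that fraction using the classifier's true and false positive rates measured on held-out data. The predictions and the rate values are made-up numbers for illustration; they are not results from any of the referenced works.

```python
def classify_and_count(predictions):
    """CC: fraction of items predicted as positive (1)."""
    return sum(predictions) / len(predictions)


def adjusted_classify_and_count(predictions, tpr, fpr):
    """ACC: correct the CC estimate with the classifier's true positive
    rate (tpr) and false positive rate (fpr), then clip to [0, 1]."""
    cc = classify_and_count(predictions)
    if tpr == fpr:
        # The correction is undefined; fall back to the raw estimate.
        return cc
    adjusted = (cc - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, adjusted))


# 1 = sentence predicted as an argumentative component, 0 = otherwise.
preds = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
cc_estimate = classify_and_count(preds)
acc_estimate = adjusted_classify_and_count(preds, tpr=0.8, fpr=0.1)
```

The point of the correction is that a document-level prevalence estimate can be more accurate than the per-sentence predictions it is built from.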

Neural-Symbolic Machine Learning and Neural-Symbolic NLP

Neural-Symbolic techniques aim to combine the efficiency and effectiveness of neural architectures with the advantages of symbolic or relational techniques in terms of the use of prior knowledge, explainability, compliance, and interpretability.

Despite the existence of many NeSy frameworks, few of them are suited to be applied in the NLP domain for various reasons.

We are interested in extending such frameworks to NLP tasks and in applying them to challenging problems, such as Argument Mining, or to other challenging settings.

#1 - Experimental comparison of neuro-symbolic tools

  • Description: Most neuro-symbolic approaches have been independently tested and evaluated on some data sets and benchmarks, but the comparisons between different approaches and across different tasks remain limited. Based on the activities of the TAILOR excellence network, this project has the objective of carrying out an experimental activity that aims to fill this gap.

  • Supervised by: Andrea Galassi, Marco Lippi


#2 - Ground-specific Markov Logic Networks

  • Description: Markov Logic Networks (MLNs) are a statistical relational learning paradigm that combines first-order logic and probabilistic graphical models. Ground-specific MLNs extend MLNs by combining them with (deep) neural networks. This project aims to extend an existing implementation of MLNs in order to improve their usability across different tasks.

  • Supervised by: Andrea Galassi, Marco Lippi

  • Type: SET UP

Legal Analytics

The domain of legal documents is one of those that would benefit the most from a wide development and application of NLP tools. At the same time, tasks in this context typically require a human with a high level of specialization and background knowledge, which makes them difficult to transfer to an automatic tool.

In this context, we are involved in multiple projects (see ADELE and LAILA on the Projects page), which address tasks such as: argument mining, summarization, outcome prediction, and cross-lingual knowledge transfer.

Our purpose is to research and develop tools that can have a meaningful impact on the community. We are in close contact with teams of legal experts that can provide their expertise, and we have access to reserved datasets that can be used to develop automatic tools.

#1 - Judgement Prediction

  • Description: Judgement prediction is the task of predicting the judge's decision concerning a legal instance ruling. These textual documents often follow a particular organization: two parties, A and B, are involved in the instance ruling, where A presents one or multiple requests. Facts and motivations concerning each request are reported in order. Eventually, the judge reports their decision for each individual request (either "accept" or "reject"). We aim to define automatic classification models that, given a request and the corresponding set of facts and motivations, predict the judge's decision.

  • Supervised by: Andrea Galassi, Federico Ruggeri, Elena Palmieri


#2 - Neuro-Symbolic Legal Analysis

  • Description: Automatic tools for legal analytics are subject to tight requirements concerning model interpretability. More precisely, legal experts require tools with interpretable results to assess their degree of trustworthiness. We aim to define such tools via neuro-symbolic frameworks.

  • Supervised by: Andrea Galassi, Federico Ruggeri, Elena Palmieri

  • Type: TBD

#3 - Cross-lingual Text Alignment and Label Projection

  • Description: Legal documents usually come in different formats depending on the given national regulations. To avoid costly annotation procedures to label the same document type in different languages, one solution is text alignment and label propagation. In particular, given two documents, A and B, where A is written in language L1 and B is written in language L2, the problem involves finding the sentence(s) in B that correspond to sentence(s) in A and propagating labels accordingly (from A to B).

  • Supervised by: Andrea Galassi, Marco Lippi
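The alignment and propagation step can be illustrated with a nearest-neighbour sketch. It assumes that sentences from both documents have already been mapped into a shared cross-lingual vector space by some encoder, which is not shown here; the two-dimensional vectors, the labels, and the similarity threshold are toy values picked for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def project_labels(src_vecs, src_labels, tgt_vecs, threshold=0.5):
    """For each sentence of B (target), copy the label of the most similar
    sentence of A (source); leave it unlabeled (None) if no source sentence
    is similar enough."""
    projected = []
    for t in tgt_vecs:
        scores = [cosine(s, t) for s in src_vecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        projected.append(src_labels[best] if scores[best] >= threshold else None)
    return projected


# Document A (language L1): two labeled sentences.
src_vecs = [[1.0, 0.0], [0.0, 1.0]]
src_labels = ["premise", "claim"]
# Document B (language L2): three unlabeled sentences.
tgt_vecs = [[0.1, 0.9], [0.9, 0.2], [-1.0, -1.0]]
projected = project_labels(src_vecs, src_labels, tgt_vecs)
```

The third target sentence has no sufficiently similar counterpart in A, so it receives no label; handling such unaligned or many-to-many cases is part of what makes the task non-trivial.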


Development of new datasets and linguistic resources, and experiments on them

Modern machine learning techniques have proven capable of learning even very abstract and high-level concepts, but usually with a caveat: they need plenty of data! Even those techniques that allow unsupervised or semi-supervised learning still need accurate and reliable ground truth to validate and test the final models.

For these reasons, the development of corpora and datasets is a fundamental step toward the development of new models and techniques that can address complex tasks.

We are interested in creating and testing new language resources, especially for tasks that require expert knowledge/skills and/or languages other than English. We are also interested in using these resources to develop and/or test new models and techniques.

#1 - Open for proposals

Subjectivity, biases, propaganda for Fact-Checking

We are interested in developing tools that can help with the detection of non-objective content in news for the purpose of fact-checking.

#1 - Subjectivity

  • Description: Subjectivity detection is a task that is potentially useful in practical settings concerning fake news detection or document analysis. We have created a set of annotation guidelines for sentence labeling in textual documents after several discussions between computer scientists and linguistic experts. The aim of this project is to apply these annotation guidelines to create a new corpus, to extend existing annotated corpora for the task of subjectivity detection (i.e., determining if a sentence is subjective or objective with respect to its author's point of view), or to apply state-of-the-art NLP methods to existing datasets. Open problems are document classification based on sentence-level annotation and transfer learning across languages.

  • Supervised by: Andrea Galassi, Federico Ruggeri
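As a point of reference for the open problem of document classification from sentence-level annotation, a naive baseline is to label a document by the share of its sentences marked as subjective. The sketch below shows that baseline only; the label names `SUBJ` and `OBJ` and the threshold are assumptions made for the example, and the project would presumably look for something better than a fixed cut-off.

```python
def document_label(sentence_labels, threshold=0.5):
    """Label a document SUBJ when the share of subjective sentences
    is strictly above the threshold, OBJ otherwise."""
    if not sentence_labels:
        raise ValueError("document has no sentences")
    share = sum(1 for lab in sentence_labels if lab == "SUBJ") / len(sentence_labels)
    return "SUBJ" if share > threshold else "OBJ"


mostly_subjective = document_label(["SUBJ", "OBJ", "SUBJ"])
mostly_objective = document_label(["OBJ", "OBJ", "SUBJ"])
```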



#2 - Full pipeline integration

  • Description: An NLP fact-checking pipeline is composed of multiple steps, such as extracting the relevant information, verifying whether it has already been fact-checked, and assessing whether the information is worth a fact-check. Various tools have been developed to address each step. The purpose of this project is to develop an end-to-end machine learning model that integrates most of these steps and provides a complete processing of the information.

  • Supervised by: Andrea Galassi, Federico Ruggeri


Unstructured Knowledge Integration

We are interested in developing deep learning models that are capable of employing knowledge in the form of natural language. Such knowledge is easy to interpret and to define (compared to structured representations like syntactic trees, knowledge graphs and symbolic rules). Unstructured knowledge increases the interpretability of models and goes in the direction of defining a realistic type of artificial intelligence. However, properly integrating this type of information is particularly challenging due to its inherent ambiguity and variability.

#1 - Scalable Input-Conditioned Knowledge Sampling

  • Description: Correctly using unstructured knowledge (text) is a challenging task. When knowledge increases in size, there's also the problem of efficiently using it while maintaining satisfying performance. As a solution, we look for a way to learn a map between each input example and the corresponding priority distribution over the knowledge: a sampling weight is associated with each element in the knowledge set. Learning this mapping and merging multiple instances together is not trivial.

  • Supervised by: Federico Ruggeri
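The idea of a priority distribution over the knowledge set can be sketched as follows. In the project the mapping from input to sampling weights would be learned; here, purely as a stand-in, the weights come from a softmax over dot products between a given input vector and given knowledge vectors. All vectors, the temperature parameter, and the function names are assumptions made for the example.

```python
import math
import random


def priority_distribution(input_vec, knowledge_vecs, temperature=1.0):
    """Softmax over input-knowledge dot products: one sampling weight
    per knowledge element, summing to 1."""
    scores = [
        sum(a * b for a, b in zip(input_vec, k)) / temperature
        for k in knowledge_vecs
    ]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_knowledge(weights, k, seed=0):
    """Draw k knowledge indices (with replacement) according to the weights."""
    rng = random.Random(seed)
    return rng.choices(range(len(weights)), weights=weights, k=k)


knowledge_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = priority_distribution([2.0, 0.0], knowledge_vecs)
sampled = sample_knowledge(weights, k=2)
```

Sampling a small subset per input keeps the cost independent of the full knowledge size, which is the scalability concern raised in the description.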


#2 - Text Classification and Clustering with Annotation Guidelines only

  • Description: Unstructured knowledge comes in the form of annotation guidelines. The model is trained to imitate the same annotation process of a human annotator by attributing an annotation rule to each text input (e.g., sentence). Advanced similarity and rule comparison methods might be devised to enforce high performance, interpretability, and robustness. The process can be further enhanced by considering dynamic modification of annotation rules to match the data-driven learning process: modify or generate new rules that better fit given data.

  • Supervised by: Federico Ruggeri

  • Type: TBD

#3 - Conditioned Text Generation via Textual Knowledge

  • Description: The idea is to generate text conditioned on a given set of unstructured knowledge, i.e., natural language texts. Informally, textual knowledge defines the set of properties that the generation system must adhere to. The generation process must make use of such knowledge in order to achieve a desired outcome. Several case studies can be devised under this general perspective. One good example is data augmentation, where generated data must be similar to the existing data (statistical measures can be employed to compare synthetic and original data). Complex black-box methods can be employed as well to satisfy strict requirements.

  • Supervised by: Federico Ruggeri

  • Type: TBD

#4 - Dynamic Textual Knowledge

  • Description: In the context of unstructured knowledge integration, the available knowledge set might not be in the best format possible, i.e., the one that best suits given data. To this end, we would like to investigate the task of knowledge update. A possible formulation is the one where a dedicated model iteratively re-writes available knowledge based on the performance of another model that makes use of such knowledge. The main constraint is that updated knowledge must not be altered too much.

  • Supervised by: Federico Ruggeri

  • Type: TBD

#5 - Extracting "Logic" Concepts from Text

  • Description: A logic representation of a given problem is a consistent, verifiable, and interpretable format for text encoding. We would like to learn such representation from the text as a structured learning process. The objective is to preserve the content of the input text while being able to reconstruct it.

  • Supervised by: Federico Ruggeri

  • Type: TBD

Dialogue Systems and Chatbots

Dialogue Systems are a pervasive technology that is becoming more and more popular in every aspect of our lives.

We are especially interested in analyzing aspects that are usually not addressed by mainstream companies.

Open Projects

NLP on Scientific Papers and Grant Proposals: Automatic Review, Summarization, and more

We are interested in developing new tools that can support researchers in the analysis of large datasets of scientific publications.

This includes two connected branches. On the one hand, we want to automatically extract all the information that we consider relevant from a scientific publication. On the other hand, we want to perform a selection of publications based on their quality, their content, and their relevance with respect to a user query.

#1 - Paper Selection and Information Extraction in Healthcare

  • Description: Given an annotated dataset regarding the use of pediatric drugs, the task concerns excluding the papers that are not relevant and extracting the important information from the remaining ones.

  • Supervised by: Andrea Galassi

  • Type: TBD

#2 - Paper Comparison

  • Description: Given a dataset with relevant and irrelevant information on a given topic, develop techniques to compare and evaluate the similarity between two scientific papers.

  • Supervised by: Andrea Galassi

  • Type: TBD
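A simple lexical baseline for comparing two papers is TF-IDF weighting followed by cosine similarity. The sketch below is a self-contained version of that baseline, shown only as a starting point; the smoothed IDF formula, the whitespace tokenizer, and the three toy "papers" are assumptions, and the project itself is open to more advanced techniques.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)  # smoothed IDF
            for term, count in tf.items()
        })
    return vectors


def cosine_sparse(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


papers = [
    "dosage of pediatric drugs in clinical trials",
    "clinical trials on pediatric drugs and dosage safety",
    "transformer models for machine translation",
]
vecs = tfidf_vectors(papers)
sim_related = cosine_sparse(vecs[0], vecs[1])
sim_unrelated = cosine_sparse(vecs[0], vecs[2])
```

The first two toy texts share several terms and obtain a positive similarity, while the third shares none with the first and scores zero; purely lexical measures like this one miss paraphrases, which motivates semantic comparison methods.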