Research Topics

Research areas for theses and project activities to be carried out with our group

Index

  • Argumentation Mining
  • Legal Analytics
  • Knowledge Graphs and LLMs
  • Unstructured Knowledge Integration
  • Interpretability
  • Hybrid Systems and Neuro-Symbolic Approaches

Argumentation Mining

One of our main research interests is Argument/Argumentation Mining (AM). It can be informally described as the problem of automatically detecting and extracting arguments from text. Arguments are usually represented as a combination of a premise (a fact) that supports a subjective conclusion (an opinion or claim). Argumentation Mining touches a wide variety of well-known NLP tasks, ranging from sentiment analysis and stance detection to summarization and dialogue systems.

 

Multimodal Argument Mining

  • Description: Make use of speech information (e.g. prosody) to enrich the set of features available for detecting arguments. Speech can be represented either by means of ad-hoc feature extraction methods (e.g. MFCCs) or via end-to-end architectures. Few existing corpora offer both argument annotation layers and speech data for the same documents. A minimal feature-extraction sketch follows the references below.
  • Contact: Eleonora Mancini (e.mancini@unibo.it), Federico Ruggeri (federico.ruggeri6@unibo.it)
  • References
    - Eleonora Mancini, Federico Ruggeri, Stefano Colamonaco, Andrea Zecca, Samuele Marro, Paolo Torroni. 2024. MAMKit: A Comprehensive Multimodal Argument Mining Toolkit. In Proceedings of the 11th Workshop on Argument Mining (ArgMining 2024), pages 69–82, Bangkok, Thailand. Association for Computational Linguistics.
    - Eleonora Mancini, Federico Ruggeri, Paolo Torroni. 2024. Multimodal Fallacy Classification in Political Debates. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers).
    - Eleonora Mancini, Federico Ruggeri, Andrea Galassi, and Paolo Torroni. 2022. Multimodal Argument Mining: A Case Study in Political Debates. In Proceedings of the 9th Workshop on Argument Mining, pages 158–170, Online and in Gyeongju, Republic of Korea. International Conference on Computational Linguistics.
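
As an example of the ad-hoc feature extraction route, here is a minimal sketch that computes utterance-level MFCC features with librosa; the synthetic waveform, the file name in the comment, and the pooling choice are illustrative placeholders, not the MAMKit pipeline.

    import librosa
    import numpy as np

    # Stand-in waveform; in practice, load the audio aligned with the transcript,
    # e.g. y, sr = librosa.load("debate_segment.wav", sr=16000)  (hypothetical file)
    sr = 16000
    y = np.random.randn(5 * sr).astype(np.float32)

    # Frame-level MFCCs: shape (n_mfcc, n_frames)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

    # Utterance-level representation via mean/std pooling over time,
    # ready to be concatenated with textual features of the same segment
    speech_features = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
    print(speech_features.shape)  # (26,)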

 

Neuro-Symbolic Argumentative Relation

 

Hate Speech Detection with Argumentative Reasoning

Legal Analytics

The domain of legal documents is one of those that would benefit the most from the widespread development and application of NLP tools. At the same time, tasks in this domain typically require humans with a high level of specialization and background knowledge, which is difficult to transfer to an automatic tool.

In this context, we are involved in multiple projects (see CLAUDETTE, ADELE, LAILA, POLINE, PRIMA on the Projects page), which address tasks such as: argument mining, summarization, outcome prediction, detection of unfair clauses, information extraction, and cross-lingual knowledge transfer.

Our purpose is to research and develop tools that can meaningfully impact the community. We are in close contact with teams of legal experts who can provide their expertise, and we have access to reserved datasets that can be used to develop automatic tools.

Different approaches to multilingual legal tools using multilingual representation

  • Description: We recently extended an existing tool for the English language to other languages, such as Italian, German, and Polish. That study covered several alternatives, such as re-training a new tool from scratch, label projection, and automatic translation. The purpose of this project is to explore the use of multilingual embeddings for this task, comparing different types of embeddings across several scenarios and languages (a minimal cross-lingual sketch follows the references below).
  • Contact: Andrea Galassi (a.galassi@unibo.it), Marco Lippi (marco.lippi@unifi.it)
  • References:
    - Andrea Galassi, Francesca Lagioia, Agnieszka Jabłonowska, Marco Lippi. 2024. Unfair clause detection in terms of service across multiple languages. Artificial Intelligence and Law.
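
A minimal sketch of the cross-lingual setup, assuming frozen multilingual sentence embeddings and a linear classifier; the model name, example clauses, and labels are illustrative placeholders, not our experimental configuration.

    from sentence_transformers import SentenceTransformer
    from sklearn.linear_model import LogisticRegression

    # Placeholder training data: English clauses with unfairness labels (1 = potentially unfair)
    en_clauses = [
        "The provider may terminate the agreement at any time without notice.",
        "Users may cancel their subscription at any time.",
    ]
    en_labels = [1, 0]

    # Placeholder evaluation data in another language (Italian)
    it_clauses = ["Il fornitore può recedere dal contratto in qualsiasi momento senza preavviso."]

    encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
    clf = LogisticRegression(max_iter=1000).fit(encoder.encode(en_clauses), en_labels)

    # Zero-shot transfer: the classifier trained on English embeddings labels Italian clauses
    print(clf.predict(encoder.encode(it_clauses)))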

Transformers and LLMs for detection and classification of unfair clauses

  • Description: A few years ago we developed a tool for the automatic detection of unfair clauses in Terms of Service and Privacy Policy documents in English (CLAUDETTE). That tool was built with technologies that have since been surpassed. This project aims to develop a new version of the same tool using Transformer-based architectures, LLMs, multiple languages, and more recent datasets (see the fine-tuning sketch after the references).
  • Contact: Andrea Galassi (a.galassi@unibo.it), Marco Lippi (marco.lippi@unifi.it)
  • References
    - Agnieszka Jablonowska, Francesca Lagioia, Marco Lippi, Hans-Wolfgang Micklitz, Giovanni Sartor, Giacomo Tagiuri. 2021. Assessing the Cross-Market Generalization Capability of the CLAUDETTE System. Frontiers in Artificial Intelligence and Applications.
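
A minimal fine-tuning sketch with the Hugging Face transformers library; the checkpoint, hyperparameters, and in-memory example clauses are placeholders, not the CLAUDETTE setup.

    from datasets import Dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    # Placeholder corpus: clause text with binary unfairness labels
    data = Dataset.from_dict({
        "text": ["The provider may change these terms at any time without notice.",
                 "You may delete your account whenever you wish."],
        "labels": [1, 0],
    })

    checkpoint = "bert-base-multilingual-cased"
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

    # Tokenize the clauses; the Trainer consumes input_ids, attention_mask, and labels
    data = data.map(lambda batch: tokenizer(batch["text"], truncation=True,
                                            padding="max_length", max_length=128),
                    batched=True)

    args = TrainingArguments(output_dir="unfair-clause-sketch", num_train_epochs=1,
                             per_device_train_batch_size=2, logging_steps=1)
    Trainer(model=model, args=args, train_dataset=data).train()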

Knowledge Graphs and LLMs

A Knowledge Graph is a graph structure used to represent the knowledge contained in a Knowledge Base. In this representation, real-world entities (e.g. objects, facts, events) are represented as nodes and their relationships as edges.
Knowledge Graphs provide a compact, usable, and human-readable representation of the world; however, they are discrete in nature, which makes them hard to combine with deep learning. Moreover, KGs are subject to a number of challenges (e.g. entity alignment, ontology mismatches) that make them hard to work with, especially during evaluation.
Investigating methods to integrate KGs and LLMs, especially in NLP and from a computational linguistics point of view, could enhance LLM capabilities in areas where they currently fall short, such as reasoning and maintaining consistency.

Knowledge Extraction

  • Description: Given a text in natural language, extract a Knowledge Graph using Language Models. The key point of this project is to extract relevant information from the text and produce a valid (and useful) knowledge base. Open problems: integration with ontologies, new concepts, unknown concepts. A minimal extraction sketch follows below.
  • Contact: Gianmarco Pappacoda (gianmarco.pappacoda@unibo.it)
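
A minimal sketch of the pipeline, where the extract_triples stub stands in for the Language Model call (its hard-coded output is purely illustrative) and networkx stores the resulting graph.

    import networkx as nx

    def extract_triples(text):
        # Hypothetical stand-in for an LLM call whose output is parsed into
        # (subject, relation, object) triples; hard-coded here for illustration.
        return [("CLAUDETTE", "detects", "unfair clauses"),
                ("unfair clauses", "appear in", "terms of service")]

    def build_kg(text):
        kg = nx.MultiDiGraph()
        for subj, rel, obj in extract_triples(text):
            kg.add_edge(subj, obj, relation=rel)  # nodes are created on demand
        return kg

    kg = build_kg("CLAUDETTE detects unfair clauses in terms of service.")
    print(list(kg.edges(data="relation")))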

 

Knowledge Injection

  • Description: Given a Knowledge Graph and a Language Model, explore methods for enhancing the Language Model's responses with the factual knowledge contained in the Knowledge Graph. Possible applications: question answering and information retrieval systems. A minimal prompt-injection sketch follows below.
  • Contact: Gianmarco Pappacoda (gianmarco.pappacoda@unibo.it)
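
A minimal sketch of one possible injection strategy: retrieve the triples mentioning entities found in the question and prepend them to the prompt. The toy graph, the naive entity matching, and the omitted LLM call are illustrative placeholders.

    import networkx as nx

    # Toy KG; in practice this would come from a knowledge extraction step
    kg = nx.MultiDiGraph()
    kg.add_edge("CLAUDETTE", "unfair clauses", relation="detects")
    kg.add_edge("unfair clauses", "terms of service", relation="appear in")

    def retrieve_facts(question, graph):
        # Naive retrieval: keep triples whose subject or object occurs in the question
        return [f"{subj} {rel} {obj}."
                for subj, obj, rel in graph.edges(data="relation")
                if subj.lower() in question.lower() or obj.lower() in question.lower()]

    question = "What does CLAUDETTE detect?"
    prompt = ("Answer using the facts below.\n"
              + "\n".join(retrieve_facts(question, kg))
              + f"\nQuestion: {question}")
    print(prompt)  # this augmented prompt would then be sent to the LLM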

 

Ontology learning

  • Description: Given a text and a Language Model, learn the corresponding ontology describing entities and relationships.
  • Contact: Gianmarco Pappacoda (gianmarco.pappacoda@unibo.it)

Unstructured Knowledge Integration

We are interested in developing deep learning models that are capable of employing knowledge expressed in natural language. Such knowledge is easy to interpret and to define (compared to structured representations like syntactic trees, knowledge graphs, and symbolic rules). Unstructured knowledge increases the interpretability of models and is a step toward a more realistic form of artificial intelligence. However, properly integrating this type of information is particularly challenging due to its inherent ambiguity and variability.

Text Classification with Guidelines Only

  • Description: The standard approach for training a machine learning model on a task is to provide an annotated dataset (X, Y). The dataset is built by providing unlabeled data X to a group of annotators previously trained on a set of annotation guidelines G. Annotators label data X via a given class set C.
    The main issue of this approach is that annotators define the mapping from data X to the class set C via the guidelines G, while machine learning models are trained to learn the same mapping without the guidelines G. Consequently, these models can learn whatever mapping from X to C best fits the given data. Our idea is to directly provide the guidelines G to models, without any access to class labels during training (a minimal prompting sketch follows the references below).
  • Contact: Federico Ruggeri (federico.ruggeri6@unibo.it)
  • References:
    Federico Ruggeri, Eleonora Misino, Arianna Muti, Katerina Korre, Paolo Torroni, Alberto Barrón-Cedeño. 2024. Let Guidelines Guide You: A Prescriptive Guideline-Centered Data Annotation Methodology. ArXiv Pre-print.
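
A minimal sketch of the guideline-centered setup, where the guidelines G replace labeled examples in the prompt of an instruction-following model; the guideline text, label set, and the classify stub standing in for the LLM call are illustrative.

    GUIDELINES = """Label a sentence as CLAIM if it expresses a debatable stance,
    as PREMISE if it provides evidence for a stance, and as OTHER otherwise."""

    def build_prompt(guidelines, text):
        return (f"Annotation guidelines:\n{guidelines}\n\n"
                f"Sentence: {text}\n"
                "Answer with exactly one label (CLAIM, PREMISE, OTHER):")

    def classify(prompt):
        # Hypothetical stand-in for an instruction-tuned LLM call;
        # no gold labels are used anywhere in this pipeline.
        return "CLAIM"

    sentence = "Social networks should be liable for the content they host."
    print(classify(build_prompt(GUIDELINES, sentence)))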

Multi-cultural Abusive and Hate Speech Detection

  • Description: What counts as abusive or hate speech depends on the socio-cultural context. The same text might be deemed offensive by one culture, acceptable by another, and, in the most extreme case, legally prosecutable by a third one. Our aim is to evaluate how machine learning models are affected by different definitions of abusive and hate speech, to promote awareness in developing accurate abusive speech detection systems.
  • Contact: Federico Ruggeri (federico.ruggeri6@unibo.it), Katerina Korre (aikaterini.korre2@unibo.it), Arianna Muti (arianna.muti2@unibo.it)
  • References:
    TBA Pre-print

Interpretability

We are interested in developing interpretable models. An interpretable model exposes means for identifying the process that leads from an input to a prediction. We are mainly focused on interpretability by design in text classification.

Current topics of interest:

- Selective Rationalization: The process of learning by providing highlights as explanations is denoted as selective rationalization. Highlights are a subset of the input text meant to be interpretable by a user and to faithfully describe the inference process of a classification model. A popular architecture for selective rationalization is the Select-then-Predict Pipeline (SPP): a generator selects the rationale to be fed to a predictor. It has been shown that SPP suffers from local minima caused by a sub-optimal interplay between the generator and the predictor, a phenomenon known as interlocking. A minimal SPP sketch follows this list.

- Knowledge Extraction: The process of extracting interpretable knowledge from data-driven processes. Our aim is to distill common knowledge from several examples when addressing a downstream task.
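
A minimal Select-then-Predict sketch in PyTorch, in the spirit of generator/predictor rationalization models; the architecture sizes, the straight-through binarization, and the sparsity penalty are illustrative choices, not our reference implementation.

    import torch
    import torch.nn as nn

    class Generator(nn.Module):
        """Scores each token and emits a hard 0/1 rationale mask (straight-through)."""
        def __init__(self, vocab_size, emb_dim=64, hidden=64):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, emb_dim)
            self.rnn = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)
            self.scorer = nn.Linear(2 * hidden, 1)

        def forward(self, tokens):
            states, _ = self.rnn(self.emb(tokens))                   # (B, T, 2H)
            probs = torch.sigmoid(self.scorer(states)).squeeze(-1)   # (B, T)
            hard = (probs > 0.5).float()
            return hard + probs - probs.detach()  # hard mask, gradients flow through probs

    class Predictor(nn.Module):
        """Classifies the input using only the tokens selected by the generator."""
        def __init__(self, vocab_size, emb_dim=64, hidden=64, n_classes=2):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, emb_dim)
            self.rnn = nn.GRU(emb_dim, hidden, batch_first=True)
            self.out = nn.Linear(hidden, n_classes)

        def forward(self, tokens, mask):
            _, h = self.rnn(self.emb(tokens) * mask.unsqueeze(-1))
            return self.out(h[-1])

    # Toy end-to-end step on random data
    vocab = 100
    generator, predictor = Generator(vocab), Predictor(vocab)
    tokens = torch.randint(0, vocab, (4, 20))
    labels = torch.randint(0, 2, (4,))
    mask = generator(tokens)
    loss = nn.functional.cross_entropy(predictor(tokens, mask), labels) + 0.01 * mask.mean()
    loss.backward()  # both modules receive gradients despite the hard selection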

Genetic-based Rationalization

  • Description: We aim to apply recent techniques in genetic-based search to define an interlocking-free SPP model. Genetic search breaks the interplay between the generator and the predictor: we search for a generator parameter configuration via genetic algorithms and evaluate the resulting SPP model with the generator frozen. A toy search loop is sketched below.
  • Contact: Federico Ruggeri (federico.ruggeri6@unibo.it)
  • References:
    - TBA pre-print.
    - E Herrewijnen, D Nguyen, F Bex, K van Deemter. 2024. Human-annotated rationales and explainable text classification: a survey. Frontiers in Artificial Intelligence.
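
A toy sketch of the genetic search loop, in which the generator is reduced to a vector of feature-selection weights and the frozen predictor is replaced by a trivial thresholding rule; both are placeholders for the real SPP components.

    import numpy as np

    rng = np.random.default_rng(0)

    def fitness(weights, X, y):
        # Placeholder SPP evaluation: the "generator" keeps features with positive weight,
        # the frozen "predictor" thresholds the mean of the selected features.
        mask = weights > 0
        scores = (X * mask).mean(axis=1)
        preds = (scores > np.median(scores)).astype(int)
        return (preds == y).mean()

    def evolve(X, y, dim, pop_size=30, generations=50, sigma=0.1):
        population = rng.normal(size=(pop_size, dim))
        for _ in range(generations):
            scores = np.array([fitness(w, X, y) for w in population])
            parents = population[np.argsort(scores)[-pop_size // 2:]]         # selection
            children = parents + rng.normal(scale=sigma, size=parents.shape)  # mutation
            population = np.vstack([parents, children])
        return max(population, key=lambda w: fitness(w, X, y))

    X, y = rng.normal(size=(64, 16)), rng.integers(0, 2, 64)  # toy data
    best_generator = evolve(X, y, dim=16)
    print(fitness(best_generator, X, y))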

Mixture of Experts for Rationalization

  • Description: Mixture of Experts (MoE) is a technique whereby several models are trained on the same data, each specializing in a certain subset. MoE models have been shown to be successful in a variety of applications, and their original formulation dates back to the early 1990s. The idea is to understand whether we can develop an MoE model for selective rationalization to address interlocking (a minimal gating layer is sketched below).
  • Contact: Federico Ruggeri (federico.ruggeri6@unibo.it)
  • References:
    Weilin Cai, Juyong Jiang, Fan Wang, Jing Tang, Sunghun Kim, Jiayi Huang. 2024. A Survey on Mixture of Experts. ArXiv Pre-print.
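
A minimal dense mixture-of-experts layer in PyTorch; the expert count, the linear experts, and the dense gating are illustrative (sparse top-k routing is the more common modern variant).

    import torch
    import torch.nn as nn

    class MoE(nn.Module):
        """A gate produces a distribution over experts; the output is the
        gate-weighted sum of the expert outputs."""
        def __init__(self, in_dim, out_dim, n_experts=4):
            super().__init__()
            self.experts = nn.ModuleList([nn.Linear(in_dim, out_dim) for _ in range(n_experts)])
            self.gate = nn.Linear(in_dim, n_experts)

        def forward(self, x):                                       # x: (B, in_dim)
            weights = torch.softmax(self.gate(x), dim=-1)           # (B, E)
            outputs = torch.stack([e(x) for e in self.experts], 1)  # (B, E, out_dim)
            return (weights.unsqueeze(-1) * outputs).sum(dim=1)     # (B, out_dim)

    layer = MoE(32, 2)
    print(layer(torch.randn(8, 32)).shape)  # torch.Size([8, 2])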

Rationalization via LLMs

Structured Rationalization via Tree kernel methods

  • Description: There are several techniques for transforming text into abstract structured content (AMR graphs, parse trees, etc.). We are interested in applying rationalization in these contexts while also enforcing structural constraints that depend on the given application scenario. The constraints describe which types of structures the rationalization system is allowed to extract; in the case of tree kernels, these structures are different types of trees. A classic tree kernel is sketched below.
  • Contact: Federico Ruggeri (federico.ruggeri6@unibo.it)
  • References:
    - Federico Ruggeri, Marco Lippi, Paolo Torroni. 2021. Tree-constrained graph neural networks for argument mining. ArXiv Pre-print.
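
As background, a minimal implementation of the classic subset-tree kernel of Collins and Duffy, which counts the tree fragments shared by two parse trees; the toy trees and the decay factor are illustrative, and this is not our constrained rationalization system.

    from nltk import Tree

    def _child_label(child):
        return child.label() if isinstance(child, Tree) else child

    def _same_production(n1, n2):
        return (n1.label() == n2.label() and len(n1) == len(n2)
                and [_child_label(c) for c in n1] == [_child_label(c) for c in n2])

    def _delta(n1, n2, lam):
        # Number of common fragments rooted at n1 and n2 (decayed by lam)
        if not _same_production(n1, n2):
            return 0.0
        score = lam
        for c1, c2 in zip(n1, n2):
            if isinstance(c1, Tree) and isinstance(c2, Tree):
                score *= 1.0 + _delta(c1, c2, lam)
        return score

    def tree_kernel(t1, t2, lam=0.5):
        return sum(_delta(a, b, lam) for a in t1.subtrees() for b in t2.subtrees())

    t1 = Tree.fromstring("(S (NP (D the) (N claim)) (VP (V supports) (NP (D the) (N premise))))")
    t2 = Tree.fromstring("(S (NP (D the) (N premise)) (VP (V supports) (NP (D the) (N claim))))")
    print(tree_kernel(t1, t2))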

Knowledge Extraction from Rationalization

  • Description: Rationalization is a type of example-specific explanation. However, samples belonging to the same class might share similar rationales. The idea is to define ways to go from a local explanation (i.e., a rationale) to a global explanation (i.e., a knowledge base) by aggregating and summarizing the extracted rationales. This can be done with LLMs (e.g., prompting techniques) or other solutions; a naive frequency-based aggregation is sketched below.
  • Contact: Federico Ruggeri (federico.ruggeri6@unibo.it)
  • References:
    Shiyu Chang, Yang Zhang, Mo Yu, Tommi S. Jaakkola. 2019. A Game Theoretic Approach to Class-wise Selective Rationalization. 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada. 
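
A naive frequency-based sketch of the local-to-global step; the rationales are made-up examples, and an LLM-based summarization step could replace the simple counting.

    from collections import Counter, defaultdict

    # Toy rationales extracted by a rationalization model: (class label, highlighted tokens)
    rationales = [
        ("unfair", ["unilaterally", "terminate", "without", "notice"]),
        ("unfair", ["terminate", "at", "any", "time", "without", "notice"]),
        ("fair", ["thirty", "days", "written", "notice"]),
    ]

    by_class = defaultdict(Counter)
    for label, tokens in rationales:
        by_class[label].update(tokens)

    # A crude "global explanation": the most frequent rationale terms per class
    for label, counts in by_class.items():
        print(label, counts.most_common(3))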

 

Hybrid Systems and Neuro-Symbolic Approaches

We are interested in studying methods and architectures that combine symbolic and sub-symbolic approaches, in particular when they involve NLP systems or are applied to the NLP domain.

In-context learning for LLMs to play a board game

  • Description: We aim to study whether, and to what extent, an LLM can learn to play a board game (in terms of rules and strategies) when paired with a symbolic or neural "teacher" model. Several scenarios can be explored, among which: using the LLM to explain the decisions of the teacher, training the LLM to evaluate the decisions of its teacher, or even training the LLM to simulate the decisions of the teacher. The thesis may focus on any of the following aspects: exploring different LLMs, different "teacher" models, training paradigms, etc. A possible board game is Chef's Hat, in which case the thesis may be carried out in co-supervision with researchers related to that project. A minimal prompting sketch for the "explain the teacher" scenario follows the references.
  • Contact: Andrea Galassi (a.galassi@unibo.it)
  • References:
    - Wu Q. et al., AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversations, COLM 2024
    - P. Barros, A. Tanevska and A. Sciutti. 2021. Learning from Learners: Adapting Reinforcement Learning Agents to be Competitive in a Card Game. 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, pp. 2716-2723.
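
A minimal prompting sketch for the "explain the teacher's decision" scenario; the rules excerpt, game state, teacher move, and the query_llm stub are all hypothetical placeholders.

    RULES = "Players must discard cards of equal or lower rank than the last move, or pass."

    def build_prompt(rules, state, teacher_move):
        return (f"Game rules: {rules}\n"
                f"Current state: {state}\n"
                f"The teacher played: {teacher_move}\n"
                "Explain, step by step, why this move is consistent with the rules "
                "and whether a better move was available.")

    def query_llm(prompt):
        # Hypothetical stand-in for a call to the chosen LLM
        return "The move is legal because..."

    state = {"hand": [3, 3, 7], "last move": "two cards of rank 5"}
    print(query_llm(build_prompt(RULES, state, "two cards of rank 3")))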