Resources

List of resources (websites, conferences, challenges, etc.) that can be helpful for choosing curricular activities

International contests, benchmarks, and challenges

In the context of NLP, many international challenges are held every year.
Developing models and techniques to tackle these challenges (present or past editions) is a good starting point for a curricular activity proposal.

Obviously, we do not expect results that are competitive with the state of the art, but we do expect an amount of work appropriate to the activity's credits.

  • Validity and Novelty of Arguments

    Two tasks of ArgMining2022:
    A) A binary classification task along the dimensions of novelty and validity: given a textual premise, classify a conclusion as being valid/novel or not (a minimal modeling sketch is given after this list).
    B) A comparison of two conclusions in terms of validity/novelty.

  • Key Point Analysis

    Given an input corpus, consisting of a collection of relatively short, opinionated texts focused on a topic of interest, the goal of KPA is to produce a succinct list of the most prominent key-points in the input corpus, along with their relative prevalence.

  • EVALITA

    A suite of tasks for the Italian language. Examples: hate speech detection, sentiment analysis, identification of memes, the "la ghigliottina" game.

  • CLEF Labs

    The CLEF conference includes many multilingual and multimodal lab proposals.
    Examples: math question answering, prediction of mental health issues, text simplification of scientific topics, argument retrieval, and fact checking.

  • Competition on Legal Information Extraction and Entailment (COLIEE)

    A set of tasks and challenges regarding NLP and legal documents.
    The intention is to build a community of practice around legal information processing and textual entailment, encouraging the adoption and adaptation of general methods from a variety of fields and the sharing of approaches, problems, and results among participants.
    The tasks change every year.

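As mentioned in the Validity and Novelty of Arguments item above, here is a minimal sketch of how its first subtask (classifying a conclusion as valid, or novel, given a textual premise) could be framed as sequence-pair classification with a generic pretrained encoder. This is not an official baseline: the checkpoint name, the label convention, and the example sentences are placeholder assumptions.

    # A minimal sketch (not a competitive system): Task A as sequence-pair
    # classification. Checkpoint, labels, and example text are placeholders.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    MODEL_NAME = "bert-base-uncased"  # any pretrained encoder checkpoint could be used
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

    premise = "Public transport reduces traffic congestion in large cities."
    conclusion = "Cities should invest more in public transport."

    # Encode premise and conclusion as a single sentence pair.
    inputs = tokenizer(premise, conclusion, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Before fine-tuning on the shared-task data this prediction is meaningless;
    # it only shows where a validity (or novelty) label would come from.
    predicted_label = logits.argmax(dim=-1).item()  # by convention: 0 = not valid, 1 = valid
    print(predicted_label)

A real submission would fine-tune such a model on the shared-task training data, possibly treating validity and novelty as separate or jointly learned labels.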

NLP resources and venues

A list of NLP-related journals and conferences with useful material. It can be used as a reference for literature on a chosen topic, or as a source of inspiration if you are looking for current challenges.

  • CLEF conference

    CLEF is an event that includes a conference on a broad range of issues in the fields of multilingual and multimodal information access evaluation, and a set of labs and workshops designed to test different aspects of mono- and cross-language information retrieval systems.

  • ACL

    The Association for Computational Linguistics is probably the biggest scientific association related to the NLP field. From its portal it is possible to access the Anthology, which collects the papers published at ACL venues over the years, the portals of the main conferences in this field (ACL, NAACL, EMNLP, ...), and the portals of two prominent journals (TACL and Computational Linguistics).

  • List of Workshops of the ACL community

    A list of all the workshops and similar events in the ACL community in 2022. There are many sub-communities and tasks in the NLP research field, so this list can give a broad perspective of the field and is a valuable source of ideas.

  • AILC

    The Italian association for computational linguistics. From its portal, it is possible to access the website and the proceedings of its conference (CLiC-IT).

  • IEEE Transactions on Neural Networks and Learning Systems

    A prominent journal in the domain of neural networks and machine learning in general.

Neural-Symbolic NLP

Neural-Symbolic (NeSy) techniques aim to combine the efficiency and effectiveness of neural architectures with the advantages of symbolic or relational techniques in terms of use of prior knowledge, explainability, compliance, and interpretability.

Despite the existence of many NeSy frameworks, for various reasons few of them are well suited to the NLP domain.
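
As a toy illustration of one common neural-symbolic recipe, the sketch below turns a symbolic rule into a differentiable penalty added to the usual supervised loss. The rule, the tiny network, and the random data are purely illustrative assumptions, not part of any of the existing NeSy frameworks.

    # Toy sketch: a logic rule used as a soft constraint on neural predictions.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    encoder = nn.Linear(16, 2)               # stand-in for a sentence encoder
    head = nn.Sigmoid()                      # outputs: p(claim), p(argumentative)

    x = torch.randn(8, 16)                   # a batch of 8 "sentence embeddings"
    y = torch.randint(0, 2, (8, 2)).float()  # fake gold labels for the two predicates

    probs = head(encoder(x))
    bce = nn.functional.binary_cross_entropy(probs, y)

    # Prior knowledge, e.g. "claim(x) -> argumentative(x)", encoded as a penalty
    # that is high when p(claim) is high but p(argumentative) is low.
    p_claim, p_arg = probs[:, 0], probs[:, 1]
    rule_penalty = (p_claim * (1.0 - p_arg)).mean()

    loss = bce + 0.5 * rule_penalty          # 0.5 is an arbitrary trade-off weight
    loss.backward()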

Development of new datasets and linguistic resources, and experiments on them

Modern machine learning techniques have proven capable of learning even very abstract and high-level concepts, but usually with a caveat: they need plenty of data! Even techniques that allow unsupervised or semi-supervised learning still need accurate and reliable ground truth to validate and test the final models.

For these reasons, the development of corpora and datasets is a fundamental step towards the development of new models and techniques that can address complex tasks.

We are interested in creating and testing new language resources, especially for tasks that require expert knowledge/skills and/or for languages other than English.
We are also interested in using these resources to develop and/or test new models and techniques.

Legal Analytics

The domain of legal documents is one of those that would benefit the most from the wide development and application of NLP tools. At the same time, performing tasks in this context typically requires from a human a high level of specialization and background knowledge, which is difficult to transfer to an automatic tool.

In this context, we are involved in multiple projects (see ADELE and LAILA on the Projects page), which address tasks such as argument mining, summarization, outcome prediction, and cross-lingual transfer of knowledge.

Our purpose is to research and develop tools that can have a meaningful impact on the community.
We are in close contact with teams of domain experts, and we have access to reserved datasets that can be used to develop automatic tools.

Cross-lingual transfer of knowledge

We are interested in investigating methods to extend knowledge that can be gathered in a resource-rich language setting (such as English) to a resource-poor language setting (such as Italian, but not only!).

Keywords for this topic include knowledge transfer, annotation/tag/label projection, knowledge injection, domain adaptation, and so on.

We have ongoing research, in the context of legal texts, on the projection of annotations between parallel asymmetrical documents, which we want to expand and improve. But we are, of course, open to exploring other domains.
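
As a minimal, library-free sketch of the annotation-projection idea, the snippet below copies token-level labels from an English sentence onto its Italian translation through a word alignment. The sentences, labels, and alignment are invented; real parallel (and asymmetrical) documents additionally require sentence alignment and handling of many-to-many links.

    # Toy label projection via a (source_index, target_index) word alignment.
    en_tokens = ["The", "contract", "is", "unfair"]
    en_labels = ["O", "B-CLAUSE", "O", "B-UNFAIR"]

    it_tokens = ["Il", "contratto", "è", "iniquo"]
    alignment = [(0, 0), (1, 1), (2, 2), (3, 3)]  # e.g. produced by an external word aligner

    it_labels = ["O"] * len(it_tokens)
    for src, tgt in alignment:
        if en_labels[src] != "O":
            it_labels[tgt] = en_labels[src]

    print(list(zip(it_tokens, it_labels)))
    # [('Il', 'O'), ('contratto', 'B-CLAUSE'), ('è', 'O'), ('iniquo', 'B-UNFAIR')]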

Unstructured Knowledge Integration

We are interested in developing deep learning models that are capable of employing knowledge expressed in natural language. Such knowledge is easy to interpret and to define (compared to structured representations like syntactic trees, knowledge graphs, and symbolic rules). Unstructured knowledge increases the interpretability of models and is a step towards a more realistic type of artificial intelligence. However, properly integrating this type of information is particularly challenging due to its inherent ambiguity and variability (a toy sketch of one such integration mechanism is given after the papers below).

  • Detecting and explaining unfairness in consumer contracts through memory networks

    Recent work has demonstrated how data-driven AI methods can leverage consumer protection by supporting the automated analysis of legal documents. However, a shortcoming of data-driven approaches is poor explainability. We posit that in this domain useful explanations of classifier outcomes can be provided by resorting to legal rationales. We thus consider several configurations of memory-augmented neural networks where rationales are given a special role in the modeling of context knowledge. Our results show that rationales not only contribute to improve the classification accuracy, but are also able to offer meaningful, natural language explanations of otherwise opaque classifier outcomes.

  • Deep Learning for Detecting and Explaining Unfairness in Consumer Contracts

    Consumer contracts often contain unfair clauses, in apparent violation of the relevant legislation. In this paper we present a new methodology for evaluating such clauses in online Terms of Service. We expand a set of tagged documents (terms of service) with a structured corpus where unfair clauses are linked to a knowledge base of rationales for unfairness, and experiment with machine learning methods on this expanded training set. Our experimental study is based on deep neural networks that aim to combine learning and reasoning tasks, one major example being Memory Networks. Preliminary results show that this approach may not only provide reasons and explanations to the user, but also enhance the automated detection of unfair clauses.

  • MemBERT: Injecting Unstructured Knowledge into BERT

    Transformers changed modern NLP in many ways. However, they can hardly exploit domain knowledge, and like other black-box models, they lack interpretability. Unfortunately, structured knowledge injection, in the long run, risks suffering from a knowledge acquisition bottleneck. We thus propose a memory enhancement of transformer models that makes use of unstructured domain knowledge expressed in plain natural language. An experimental evaluation conducted on two challenging NLP tasks demonstrates that our approach yields better performance and model interpretability than baseline transformer-based architectures.
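
As a rough, self-contained illustration of the memory-augmentation idea described above (not the actual MemBERT implementation), the sketch below lets an input representation attend over encoded knowledge sentences and uses the retrieved summary for classification. All vectors and dimensions are placeholders; in practice both the input and the memory slots would be produced by a pretrained transformer.

    # Toy memory attention over "rationale" embeddings, followed by classification.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    d = 32
    text_vec = torch.randn(1, d)   # stand-in for the embedding of the input clause
    memory = torch.randn(5, d)     # stand-ins for embeddings of 5 rationale sentences

    # Dot-product attention of the input over the memory slots.
    scores = text_vec @ memory.T                # (1, 5)
    weights = torch.softmax(scores, dim=-1)     # relevance of each rationale
    retrieved = weights @ memory                # (1, d) weighted summary of the knowledge

    classifier = nn.Linear(2 * d, 2)            # e.g. fair vs. unfair clause
    logits = classifier(torch.cat([text_vec, retrieved], dim=-1))
    print(weights, logits)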

Argumentation Mining

One of our main research interests is Argument/Argumentation Mining (AM). It can be informally described as the problem of automatically detecting and extracting arguments from text. Arguments are usually represented as a combination of a premise (a fact) that supports a subjective conclusion (an opinion or claim).
Argumentation Mining touches a wide variety of well-known NLP tasks, spanning from sentiment analysis and stance detection to summarization and dialogue systems.
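
As a small, hypothetical illustration of this structure, the sketch below represents an argument as a set of components (premises and claims) plus directed links between them; typical AM subtasks (component detection and classification, link prediction) map directly onto these two fields. The class names and the example argument are invented for illustration.

    # Toy representation of an argument graph for AM experiments.
    from dataclasses import dataclass, field

    @dataclass
    class Component:
        text: str
        kind: str            # "premise" or "claim"

    @dataclass
    class ArgumentGraph:
        components: list = field(default_factory=list)
        links: list = field(default_factory=list)   # (source_idx, target_idx, relation)

    graph = ArgumentGraph()
    graph.components.append(Component("Tuition fees have doubled in ten years.", "premise"))
    graph.components.append(Component("University should be free.", "claim"))
    graph.links.append((0, 1, "support"))            # premise 0 supports claim 1
    print(graph)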

  • Tree-constrained Graph Neural Networks for Argument Mining

    We propose a novel architecture for Graph Neural Networks that is inspired by the idea behind Tree Kernels of measuring similarity between trees by taking into account their common substructures, named fragments. By imposing a series of regularization constraints to the learning problem, we exploit a pooling mechanism that incorporates such notion of fragments within the node soft assignment function that produces the embeddings. We present an extensive experimental evaluation on a collection of sentence classification tasks conducted on several argument mining corpora, showing that the proposed approach performs well with respect to state-of-the-art techniques.

  • Argumentative Link Prediction using Residual Networks and Multi-Objective Learning

    Feature-agnostic method to jointly perform different argument mining tasks, with an emphasis on link prediction.

  • MARGOT: A web server for argumentation mining

    Online tool for argument mining.

  • Argument Mining from Speech: Detecting Claims in Political Debates

    The automatic extraction of arguments from text, also known as argument mining, has recently become a hot topic in artificial intelligence. Current research has only focused on linguistic analysis. However, in many domains where communication may be also vocal or visual, paralinguistic features too may contribute to the transmission of the message that arguments intend to convey. For example, in political debates a crucial role is played by speech. The research question we address in this work is whether in such domains one can improve claim detection for argument mining, by employing features from text and speech in combination. To explore this hypothesis, we develop a machine learning classifier and train it on an original dataset based on the 2015 UK political elections debate.

  • Argument Mining Workshop

    Website of the 2021 workshop on argument mining.