List of resources (websites, conferences, etc.) that can be helpful for choosing curricular activities
In the context of NLP, many international challenges are held every year.
Developing models and techniques to tackle these challenges (present or past editions) is a good proposal for a curricular activity.
Obviously, we do not expect results that are competitive with the state-of-the-art, but we expect an amount of work appropriate to the activity's credits.
Two tasks of ArgMining2022:
A) A binary classification task along the dimensions of novelty and validity, classifying a conclusion as being valid/novel or not given a textual premise.
B) A comparison of two conclusions in terms of validity/novelty.
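To make the input/output shape of Task A concrete, here is a deliberately naive sketch: a word-overlap heuristic that labels a conclusion "valid" when it shares enough content words with its premise. The function names, stopword list, and threshold are illustrative assumptions, not the official task baseline.

```python
# Toy heuristic for Task A (validity): a conclusion counts as "valid"
# if enough of its content words also appear in the premise.
# Illustrative only -- real systems use learned models.

STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "that"}

def content_words(text: str) -> set:
    """Lowercase, split on whitespace, drop punctuation and stopwords."""
    tokens = (t.strip(".,;:!?").lower() for t in text.split())
    return {t for t in tokens if t and t not in STOPWORDS}

def predict_validity(premise: str, conclusion: str, threshold: float = 0.3) -> int:
    """Return 1 (valid) if the conclusion's word-overlap ratio with the premise
    meets the threshold, 0 otherwise."""
    p, c = content_words(premise), content_words(conclusion)
    if not c:
        return 0
    return int(len(p & c) / len(c) >= threshold)

print(predict_validity(
    "School uniforms reduce bullying because students dress alike.",
    "Uniforms help reduce bullying among students.",
))  # → 1
```

A real submission would replace the heuristic with a trained classifier over premise-conclusion pairs, but the interface (pair in, binary label out) stays the same.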
Given an input corpus, consisting of a collection of relatively short, opinionated texts focused on a topic of interest, the goal of KPA is to produce a succinct list of the most prominent key-points in the input corpus, along with their relative prevalence.
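The KPA output format can be sketched as follows: match each comment to its most similar candidate key point and report each key point's prevalence as the fraction of matched comments. Jaccard word overlap is a deliberate simplification (real KPA systems use learned matching models), and all names and the similarity threshold are illustrative assumptions.

```python
# Minimal sketch of Key Point Analysis output: assign each comment to
# the closest key point (naive Jaccard overlap) and report prevalence.
from collections import Counter

def words(text):
    return {t.strip(".,!?").lower() for t in text.split()}

def jaccard(a, b):
    wa, wb = words(a), words(b)
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def key_point_prevalence(comments, key_points, min_sim=0.2):
    counts = Counter()
    for comment in comments:
        best = max(key_points, key=lambda kp: jaccard(comment, kp))
        if jaccard(comment, best) >= min_sim:
            counts[best] += 1
    # prevalence = fraction of input comments matched to each key point
    total = len(comments)
    return {kp: counts[kp] / total for kp in key_points}

comments = [
    "Remote work saves commuting time",
    "Working remotely saves time on the commute",
    "Remote work hurts team communication",
]
key_points = ["Remote work saves time", "Remote work harms communication"]
print(key_point_prevalence(comments, key_points))
```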
A suite of tasks in the Italian language. Examples: hate speech detection, sentiment analysis, identification of memes, the "la ghigliottina" game.
The CLEF conference includes many multi-lingual and multi-modal activity proposals.
Examples: math question answering, prediction of mental health issues, text simplification of scientific topics, retrieval of arguments, fact checking.
A set of tasks and challenges regarding NLP and legal documents.
The intention is to build a community of practice around legal information processing and textual entailment, encouraging the adoption and adaptation of general methods from a variety of fields, and letting participants share their approaches, problems, and results.
The tasks change every year.
We divide the task of Validity-Novelty-Prediction into two subtasks.
Task A: The first task consists of a binary classification task along the dimensions of novelty and validity, classifying a conclusion as being valid/novel or not given a textual premise.
Task B: The second subtask will consist of a comparison of two conclusions in terms of validity/novelty.
A list of NLP-related journals and conferences with useful material. It can be used as a reference for literature on a chosen topic, or as a source of inspiration if you are looking for current challenges.
CLEF is an event that includes a conference on a broad range of issues in the fields of multilingual and multimodal information access evaluation, and a set of labs and workshops designed to test different aspects of mono- and cross-language information retrieval systems.
The Association for Computational Linguistics is probably the biggest scientific association related to the NLP field. From its portal it is possible to access the Anthology, which collects the papers published in the field over the years, the portals of the main conferences in this field (ACL, NAACL, EMNLP, ...), and the portals of two prominent journals (TACL and Computational Linguistics).
List of all the workshops and similar events in the ACL community in 2022. There are many sub-communities and tasks in the NLP research field; this list may be useful for gaining a broad perspective of the field and as a valuable source of ideas.
Italian association for computational linguistics. From its portal, it is possible to access the website and the proceedings of its conference (CLiC-IT).
Prominent journal in the domain of neural networks and machine learning in general
Neural-Symbolic techniques aim to combine the efficiency and effectiveness of neural architectures with the advantages of symbolic or relational techniques in terms of the use of prior knowledge, explainability, compliance, and interpretability.
Despite the existence of many NeSy frameworks, few of them are suited to be applied in the NLP domain, for various reasons.
A declarative framework designed to support NLP tasks
A study on the possible applications of Neural-Symbolic techniques to Argument Mining
Survey about the many existing frameworks and their characteristics
Modern machine learning techniques have proven capable of learning even very abstract and high-level concepts, but usually with a caveat: they need plenty of data! Even those techniques that allow unsupervised or semi-supervised learning still need accurate and reliable ground truth to validate and test the final models.
For these reasons, the development of corpora and datasets is a fundamental step towards the development of new models and techniques that can address complex tasks.
We are interested in creating and testing new language resources, especially for tasks that require expert knowledge/skills and/or for languages other than English.
We are also interested in using these resources to develop and/or test new models and techniques.
A corpus created by students of the NLP course that has been published at the CLEF conference in September 2021.
We describe the methodology that can be followed to properly develop a corpus.
A web-based tool for annotating documents
The domain of legal documents is one of those that would benefit the most from a wide development and application of NLP tools. At the same time, performing tasks in this context typically requires a high level of specialization and background knowledge, which are difficult to transfer to an automatic tool.
In this context, we are involved in multiple projects (see ADELE and LAILA on the Projects page), which address tasks such as argument mining, summarization, outcome prediction, and cross-lingual transfer of knowledge.
Our purpose is to research and develop tools that can have a meaningful impact on the community.
We are in close contact with teams of experts who can provide domain knowledge, and we have access to reserved datasets that can be used to develop automatic tools.
A tool we have developed for the automatic recognition of unfair clauses in online Terms of Service contracts.
Demo at http://claudette.eui.eu/demo
Prominent journal on this topic
Website of the ICAIL association, where it is possible to find news and material related to conferences and initiatives.
The Natural Legal Language Processing community website offers plenty of resources, including workshops, video talks, and datasets.
We are interested in investigating methods to extend knowledge that can be gathered in a resource-rich language setting (such as English) to a resource-poor language setting (such as Italian, but not only!).
Among the keywords for this topic are knowledge transfer, annotation/tag/label projection, injection of knowledge, domain adaptation, and so on.
We have ongoing research in the context of legal texts on the projection of annotations between parallel, asymmetrical documents, which we want to expand and improve. But we are obviously open to exploring other domains.
A study regarding how to project annotation between parallel documents written in different languages.
A new version of the corpus, covering four different languages.
A student's thesis on the topic of annotation projection
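The core idea of annotation projection can be sketched in a few lines: given word alignments between a source-language sentence and its target-language translation (hard-coded below; in practice they would come from an automatic word aligner), token-level labels are transferred along the alignment links. All names, labels, and the example sentences are illustrative assumptions.

```python
# Toy illustration of annotation projection between parallel documents:
# transfer token-level labels from a source sentence to its translation
# along given (source_index, target_index) word-alignment pairs.

def project_labels(src_labels, alignment, tgt_len, default="O"):
    """src_labels: one label per source token.
    alignment: iterable of (src_i, tgt_j) index pairs.
    Unaligned target tokens keep the default label."""
    tgt_labels = [default] * tgt_len
    for src_i, tgt_j in alignment:
        tgt_labels[tgt_j] = src_labels[src_i]
    return tgt_labels

# English: "The clause is unfair"  ->  Italian: "La clausola è vessatoria"
src_labels = ["O", "B-CLAIM", "I-CLAIM", "I-CLAIM"]
alignment = [(0, 0), (1, 1), (2, 2), (3, 3)]
print(project_labels(src_labels, alignment, tgt_len=4))
# → ['O', 'B-CLAIM', 'I-CLAIM', 'I-CLAIM']
```

Real parallel legal texts are rarely aligned one-to-one (hence "asymmetrical" documents), which is exactly what makes the research problem non-trivial.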
We are interested in developing deep learning models that are capable of employing knowledge in the form of natural language. Such knowledge is easy to interpret and to define (compared to structured representations like syntactic trees, knowledge graphs and symbolic rules). Unstructured knowledge increases the interpretability of models and goes in the direction of defining a realistic type of artificial intelligence. However, properly integrating this type of information is particularly challenging due to its inherent ambiguity and variability.
Recent work has demonstrated how data-driven AI methods can leverage consumer protection by supporting the automated analysis of legal documents. However, a shortcoming of data-driven approaches is poor explainability. We posit that in this domain useful explanations of classifier outcomes can be provided by resorting to legal rationales. We thus consider several configurations of memory-augmented neural networks where rationales are given a special role in the modeling of context knowledge. Our results show that rationales not only contribute to improve the classification accuracy, but are also able to offer meaningful, natural language explanations of otherwise opaque classifier outcomes.
Consumer contracts often contain unfair clauses, in apparent violation of the relevant legislation. In this paper we present a new methodology for evaluating such clauses in online Terms of Services. We expand a set of tagged documents (terms of service) with a structured corpus where unfair clauses are linked to a knowledge base of rationales for unfairness, and experiment with machine learning methods on this expanded training set. Our experimental study is based on deep neural networks that aim to combine learning and reasoning tasks, one major example being Memory Networks. Preliminary results show that this approach may not only provide reasons and explanations to the user, but also enhance the automated detection of unfair clauses.
Many NLP applications require models to be interpretable. However, many successful neural architectures, including transformers, still lack effective interpretation methods. A possible solution could rely on building explanations from domain knowledge, which is often available as plain, natural language text. We thus propose an extension to transformer models that makes use of external memories to store natural language explanations and use them to explain classification outputs. We conduct an experimental evaluation on two domains, legal text analysis and argument mining, to show that our approach can produce relevant explanations while retaining or even improving classification performance.
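The "rationales in an external memory" idea from the works above can be caricatured in a few lines: store legal rationales as plain text and, after classifying a clause, return the most relevant rationale as a natural-language explanation. Relevance here is naive word overlap; the actual models learn an attention mechanism over the memory. The rationale texts and names are invented for illustration.

```python
# Hedged sketch of rationale retrieval from an external memory of
# natural-language explanations (toy word-overlap relevance, not the
# learned memory attention used in the cited works).

RATIONALES = [
    "Clauses allowing the provider to change terms unilaterally without notice are unfair.",
    "Clauses limiting liability for gross negligence are unfair.",
]

def tokens(text: str) -> set:
    return {t.strip(".,;") for t in text.lower().split()}

def explain(clause: str) -> str:
    """Return the stored rationale with the largest word overlap with the clause."""
    return max(RATIONALES, key=lambda r: len(tokens(clause) & tokens(r)))

print(explain("The provider may change these terms without notice."))
```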
One of our main research interests is Argument/Argumentation Mining (AM). It can be informally described as the problem of automatically detecting and extracting arguments from text. Arguments are usually represented as a combination of a premise (a fact) that supports a subjective conclusion (an opinion or claim).
Argumentation Mining touches a wide variety of well-known NLP tasks, spanning from sentiment analysis and stance detection to summarization and dialogue systems.
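The premise-supports-conclusion representation described above can be written down as a minimal data structure (field names are illustrative, not a standard AM schema):

```python
# Minimal data structure for the informal argument representation above:
# a premise linked to the claim it supports.
from dataclasses import dataclass

@dataclass
class Argument:
    premise: str           # factual statement offered as evidence
    claim: str             # subjective conclusion the premise supports
    relation: str = "support"

arg = Argument(
    premise="The defendant signed the contract without reading it.",
    claim="The unfair clause should still be voided.",
)
print(arg.relation)  # → support
```

Tasks such as component detection (finding premises and claims in text) and link prediction (deciding which premise supports which claim) then amount to recovering instances of this structure from raw documents.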
We propose a novel architecture for Graph Neural Networks that is inspired by the idea behind Tree Kernels of measuring similarity between trees by taking into account their common substructures, named fragments. By imposing a series of regularization constraints to the learning problem, we exploit a pooling mechanism that incorporates such notion of fragments within the node soft assignment function that produces the embeddings. We present an extensive experimental evaluation on a collection of sentence classification tasks conducted on several argument mining corpora, showing that the proposed approach performs well with respect to state-of-the-art techniques.
Feature-agnostic method to jointly perform different argument mining tasks, with an emphasis on link prediction.
Online tool for argument mining.
The automatic extraction of arguments from text, also known as argument mining, has recently become a hot topic in artificial intelligence. Current research has only focused on linguistic analysis. However, in many domains where communication may be also vocal or visual, paralinguistic features too may contribute to the transmission of the message that arguments intend to convey. For example, in political debates a crucial role is played by speech. The research question we address in this work is whether in such domains one can improve claim detection for argument mining, by employing features from text and speech in combination. To explore this hypothesis, we develop a machine learning classifier and train it on an original dataset based on the 2015 UK political elections debate.
Website of the 2021 workshop on argument mining