Reference:
GRANDI, Nicola, BALLARÈ, Silvia, CHIUSAROLI, Francesca, GALLINA, Francesca, PASCOLI, Matteo, PISTOLESI, Elena; Corpus Univers-ITA. 2023, DOI: https://doi.org/10.60760/unibo/univers-ita
Data for the corpus were collected during the 2020/2021 academic year.
First, a sample was designed to be representative of the Italian university population, using two parameters: the geographical location of the university and the academic discipline of the degree program.
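One common way to build a sample that mirrors a population along two parameters is proportional allocation across strata. The sketch below is only an illustration of that general technique, not a description of the procedure actually used for the corpus; the function name and the stratum labels and counts in the usage note are hypothetical.

```python
def proportional_allocation(population: dict, sample_size: int) -> dict:
    """Split sample_size across strata in proportion to their population.

    `population` maps a stratum key, e.g. (geographical_area, discipline),
    to the number of students in that stratum. Fractional quotas are
    rounded with the largest-remainder method, so the allocated counts
    always add up to sample_size.
    """
    total = sum(population.values())
    exact = {k: sample_size * n / total for k, n in population.items()}
    # Start from the integer part of each exact quota.
    alloc = {k: int(q) for k, q in exact.items()}
    leftover = sample_size - sum(alloc.values())
    # Hand the remaining units to the strata with the largest remainders.
    by_remainder = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc
```

For example, with hypothetical strata `{("North", "Humanities"): 500, ("North", "Sciences"): 300, ("South", "Humanities"): 200}` and a sample size of 10, the function assigns 5, 3 and 2 students respectively.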
A total of 2,160 second-year students from the following academic areas and universities were involved in the data collection:
To maintain control over the sample during the data collection phase, every student received a username and password for accessing the website created specifically for the data collection.
The data collection was structured in two parts:
1. Text Writing
In the first phase of the collection, students were asked to write a short text. Below are the instructions provided to the respondents.
On the first page, general instructions for writing the text were provided:
"You must write a text of medium length: between 250 and 500 words. You should try to use a formal style: therefore, write as correctly as possible, as if you were writing to a professor. Since the survey is completely anonymous, it will be impossible to associate the text with your identity, and by participating in the survey, you waive your intellectual property rights to it. The text will never be published in full and will only be used for research purposes. To accept, click continue and discover the prompt (the topic on which you will write the text)."
On the second page, the writing prompt was provided, along with a time counter (maximum time: 60 minutes) and a word counter (minimum word count: 250; maximum word count: 500):
"Imagine that your degree program has opened a survey aimed at all students, with the goal of gathering opinions on the functioning of remote learning during the months of the health emergency. Write a text in which you present, in a non-schematic way, the advantages and disadvantages of remote learning, according to your point of view."
Remaining time: 60:00
Words written: 0
Words remaining: 500
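The counters shown to respondents can be approximated with a few lines of code. This is a minimal sketch, assuming that a "word" is any whitespace-separated token; the actual counting rule of the collection website is not documented here, and the function names are hypothetical.

```python
MIN_WORDS = 250          # minimum word count stated in the instructions
MAX_WORDS = 500          # maximum word count stated in the instructions
TIME_LIMIT_S = 60 * 60   # maximum time: 60 minutes, in seconds


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of 'word')."""
    return len(text.split())


def counter_state(text: str, elapsed_s: int) -> dict:
    """Return the values of the three counters plus a length check."""
    written = count_words(text)
    remaining_s = max(TIME_LIMIT_S - elapsed_s, 0)
    return {
        "remaining_time": f"{remaining_s // 60:02d}:{remaining_s % 60:02d}",
        "words_written": written,
        "words_remaining": max(MAX_WORDS - written, 0),
        "within_limits": MIN_WORDS <= written <= MAX_WORDS,
    }
```

Calling `counter_state("", 0)` reproduces the initial state displayed above: 60:00 remaining, 0 words written, 500 words remaining.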
2. Socio-biographical Questionnaire
The socio-biographical questionnaire consisted of over 50 questions, divided into 4 sections:
The responses collected through the questionnaire made it possible to build a very detailed socio-biographical profile of each respondent. These metadata can then be used during the analysis phase to check for potential correlations between extra-linguistic characteristics and linguistic traits found in the texts.
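As an illustration of this kind of analysis, the sketch below groups texts by one metadata field and averages a simple linguistic measure per group. It is only an example under stated assumptions: the record layout (`"text"` and `"metadata"` keys), the field name `"area"` and the choice of type-token ratio as the measure are hypothetical and do not reflect the corpus's actual export format or the analyses carried out on it.

```python
from collections import defaultdict
from statistics import mean


def type_token_ratio(text: str) -> float:
    """Distinct lowercase tokens divided by total tokens (0.0 for empty text)."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def mean_by_group(records, field, measure):
    """Average `measure(text)` over records sharing the same metadata value.

    Each record is assumed to look like
    {"text": "...", "metadata": {field: value, ...}}.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec["metadata"][field]].append(measure(rec["text"]))
    return {value: mean(scores) for value, scores in groups.items()}
```

A call such as `mean_by_group(records, "area", type_token_ratio)` returns one average per value of the chosen field, which can then be compared across groups.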
The corpus currently consists of a total of 810,715 tokens. The texts in the corpus are accompanied by a wide range of metadata (obtained through the questionnaire). The corpus can also be queried using various search filters, as described in the guide for consultation.
The corpus is accessible at this link.