The IntUne (EU) Project. The Media Linguists Working Group included several members of the SiBol Group.

 IntUne: “Integrated and United? A Quest for Citizenship in an Ever Closer Europe”

The IntUne project was an Integrated Project on the theme of Citizenship financed by the European Union within the scope of the 6th Framework Programme. It was a four-year project which officially started on 1 September 2005 and was coordinated by the University of Siena. It involved 29 European institutions and over 100 scholars across Eastern and Western Europe.
Its geographical and disciplinary integrating capacity as well as the joint effort of many scholars and practitioners specialised in different fields (political science, sociology, public policy, media, linguistics and socio-psychology) represented a step forward in the strengthening of the European Research Area in the social sciences and humanities in general.

The following is an edited version of the Final Report of the IntUne Media corpus linguists Working Group (coordinator: John Morley).

Media Working Group

Final Report


1. Introduction

The Media Working Group (MWG) consisted of corpus linguists and researchers expert in discourse analysis, working with approaches which are both hermeneutic and empirical. The function of the MWG within IntUne as a whole was to monitor the attention the print and television media paid to problems of citizenship in the four countries with which the group was concerned – France, Italy, Poland and the U.K. – and to analyse the ways in which evidence of this attention was presented to the media-consuming public.

2. Specific Objectives

The specific objectives of the MWG corresponded to the deliverables of (1) data collection and coding of printed and electronic media, (2) discourse analysis of media texts, (3) training for the discourse analysis of media texts, and (4) dissemination of results through a stakeholders’ meeting.


Preparation for the training meetings began at the Kick Off Meeting (KOM) in Siena. During the KOM, the dates for the collection of the Pilot Corpora were agreed (21 to 26 November 2005) as well as the media titles to be collected: for the printed press, two national newspapers, one left-wing and one right-wing, two local newspapers and a financial paper; for television, two evening news programmes each day from the main public and commercial stations and a number of political debate programmes. The method of collection was also decided: it was to be by whole titles rather than by keyword or article search, i.e., the data would include entire newspapers and television news programmes and not only those items concerned specifically with European issues or news.

At the first Training Meeting in Bertinoro (Italy), the Pilot Corpora were presented and the problems encountered in their compilation were described. In addition, two technical pre-Training Meetings for smaller numbers of researchers and collaborators took place in Lorient and Cardiff, feeding into the second Training Meeting in Lorient in July. The pre-Training Meetings enabled sub-groups to work through technical problems and produce solutions which were communicated to the rest of the group both by email and at the Training Meeting in Lorient. These meetings involved the technicians dealing with the mark-up of newspaper data (in Lorient) and the mark-up of television data (in Siena and Cardiff). During these meetings, hands-on workshops were conducted with Xaira (see below) and with Transcriber, a software tool for transcribing spoken text.

Between ten and twenty young researchers, including postdoctoral researchers, PhD students, technicians and MA students in all four countries, played an important role in the research. Many of the solutions to our technical problems came from them.

Data collecting

The MWG collected the two waves of corpus data in three months of 2007 and in three months of 2009.

Two national newspapers of different political persuasions and two local newspapers from different geographical locations or cultural positions were collected.

Italy: Corriere della Sera, Repubblica, Il Giornale di Brescia, La Gazzetta del Sud.

France: Le Figaro, Libération, Le Télégramme, Sud-Ouest.

Poland: Gazeta Wyborcza, Super Express, Dziennik Łódzki, Głos Wielkopolski.

United Kingdom: Guardian, Telegraph, Scotsman, Western Mail.

The TV news programmes were collected over only eight weeks of the same periods in 2007 and 2009. The main evening national state and commercial TV news programmes were to be recorded.

Italy: RAI Uno, TG5.

France: TF1, Fr3.

Poland: TV1, Polsat.

United Kingdom: BBC1, ITN.

Data analysis

Even before the final mark-up of the two waves of the corpus, around 30 presentations using IntUne data were made, ranging from in-house documents to formal papers at international conferences. The main effort of the MWG, however, has been the production of volume six of the Oxford University Press series dedicated to IntUne (Bayley and Williams (eds.), forthcoming).

3. Methodological considerations

 Corpus building

In order to integrate their work into the activities of the other three IntUne Working Groups, the MWG decided to compile the corpora at the same time as the other groups conducted their Europe-wide surveys concerning attitudes to citizenship among elite groups, experts and ordinary citizens. While it is hoped that there will be connections between the concerns of these groups as revealed by the surveys and the key arguments found in the media, it remains the conviction of the MWG that their research can, in the last analysis, be treated as independently valuable and will cast light upon the problems of European citizenship which could come from no other source.

A set of conventions was established for mark-up, i.e., the assigning of textual, semantic or grammatical indicators to sections of the corpus. It was decided to mark up the corpus using Extensible Markup Language (XML) conformant with the Text Encoding Initiative (TEI), an international standard being developed mainly at the University of Oxford (UK). Although the mark-up is labour-intensive, the advantages in terms of delicacy of analysis justify its use. Since very little work has been done on TEI for newspapers and none at all for television news, it was necessary to develop an innovative system designed to deal with this type of data. The protocols for marking up newspapers and TV news were informed by previous pioneering work in this area by researchers now involved in the IntUne project, mainly on the CorDis Project (Cirillo et al. 2009).
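As a rough illustration of why structural mark-up supports finer-grained analysis, the sketch below parses a small TEI-style fragment and retrieves only the divisions of a given text type. It is not the IntUne mark-up scheme, which is not reproduced in this report: the `type` and `n` attribute values, the sample wording and the helper `texts_by_type` are all invented for illustration, and the TEI namespace and header are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Invented TEI-style fragment: one newspaper issue with two text types.
# Element names (text, body, div, head, byline, p) exist in TEI, but the
# attribute values here are hypothetical, not the IntUne conventions.
SAMPLE = """<text>
  <body>
    <div type="article" n="front-page">
      <head>Summit ends without agreement</head>
      <byline>Staff reporter</byline>
      <p>Leaders left the meeting without a deal.</p>
      <p>Talks resume next month.</p>
    </div>
    <div type="editorial" n="comment">
      <head>What citizenship means</head>
      <p>Citizenship is more than a passport.</p>
    </div>
  </body>
</text>"""


def texts_by_type(xml_string, div_type):
    """Return (headline, paragraphs) pairs for every div of the given type."""
    root = ET.fromstring(xml_string)
    results = []
    for div in root.iter("div"):
        if div.get("type") == div_type:
            head = div.findtext("head", default="")
            paragraphs = [p.text or "" for p in div.findall("p")]
            results.append((head, paragraphs))
    return results


# Restrict a search to news articles, leaving editorials aside.
articles = texts_by_type(SAMPLE, "article")
for head, paragraphs in articles:
    print(head, len(paragraphs))
```

Because the text type is recorded in the mark-up rather than inferred from the wording, the same query can be repeated for editorials, headlines or bylines alone, which is the kind of distinction a plain-text corpus cannot make.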

The production of TEI conformant marked-up corpora makes it possible to use the new software, Xaira, being produced in Oxford for the latest version of the British National Corpus (BNC), which, since its appearance in the nineties, has become the same sort of resource for corpus linguists that the Oxford English Dictionary is for lexicographers. The IntUne Media Working Group was one of the groups of researchers worldwide involved in perfecting the Beta version of Xaira.


Bayley, Paul and Geoffrey Williams (eds.). Forthcoming. European Identity: What the Media Say. Oxford University Press.


Cirillo, Letizia, Anna Marchi and Marco Venuti. 2009. The making of the CorDis corpus: compilation and markup. In J. Morley and P. Bayley (eds.), Corpus Assisted Discourse Studies on the Iraq Conflict: Wording the War. London: Routledge.