The DiDi Project: Collecting, Annotating, and Analysing South Tyrolean Data of Computer-mediated Communication
At: ird-cmc-rennes: International Research Days: Social Media and CMC Corpora for the eHumanities; Rennes; 23.10.2015 - 24.10.2015
Following a sociolinguistic, user-based perspective on language data, the DiDi project investigated the linguistic strategies employed by South Tyrolean users on Facebook. South Tyrol is a multilingual region (Italian, German, and Ladin are official languages) where the South Tyrolean dialect of German is frequently used in different communicative contexts. Thus, regional and social codes are often also used in written communication and in computer-mediated communication. With a research focus on users with L1 German living in South Tyrol, the main research question was whether people of different ages use language in a similar way or in an age-specific manner. The project lasted two years (June 2013 - May 2015). We created a corpus of Facebook communication that can be linked to other user-based data such as age, web experience, and communication habits. We gathered socio-demographic information through an online questionnaire and collected language data covering the entire range of social interactions, i.e. publicly accessible data as well as non-public conversations (status updates and comments, private messages, and chat conversations) written and published just for friends or a limited audience. About 150 users took part in the data acquisition: they interacted with the app, granted access to their language data, and answered the questionnaire. In this talk, I will present the project, its data acquisition app, and its text annotation processes (automatic, semi-automatic, and manual), discuss their strengths and limitations, and present results from our data analyses.