Abstract
This poster presentation introduces LEONIDE, a recently created corpus of longitudinal learner data. The corpus contains 2,543 texts from 163 pupils who participated in the project “One school, many languages”, conducted in eight schools in the officially multilingual Italian province of South Tyrol / Alto Adige (Zanasi & Stopfner, 2018). The aim of the project was to document the development of the pupils' plurilingual linguistic and communicative skills by collecting oral and written language samples in Italian, German and English, in order to obtain a global view of their individual linguistic repertoires.
LEONIDE comprises all the texts written by the pupils during the course of the project; the overall size of the corpus amounts to ca. 240,000 tokens. The texts were collected over three consecutive years (2015-2018) in public middle schools (i.e. lower secondary school, grade 6 to grade 8). The pupils were 11 years old at the beginning of the data collection and 13 years old at the end. In each grade, two written texts of different genres were collected: the first text was elicited using a picture story re-telling task; the second is an opinion text on different aspects related to the pupils’ lives and public discourse. For each genre and each grade, the corpus provides texts in all three languages: German, Italian and English. In order to reflect the school system of the Province of South Tyrol / Alto Adige, about half of the texts were collected in four schools in which German is the main language of teaching and Italian is taught as L2. The other half were collected in four schools in which Italian is the main language of teaching and German is taught as L2. In all schools, English is taught as L3 (i.e. as a foreign language at school). Subdivided by language, the corpus contains 850 Italian, 849 German and 844 English texts. Furthermore, a set of relevant person-related data was collected for each learner, including age, gender, first language(s) and language assessment scores for each of the three languages.
As the corpus documents the development of the plurilingual competences of individual learners, it allows for contrastive longitudinal research on the development of young learners’ writing skills in different languages, while also taking person-related metadata into account. Moreover, the corpus is a valuable resource for language teachers seeking to create and improve their teaching material and language courses, as the large amount of authentic longitudinal data reflects the sequencing of language skills over three consecutive years in three languages. The corpus will be available for corpus queries via an ANNIS search interface and as a download for academic purposes (ACA-BY-NC-NORED 1.0) from the Eurac Research Clarin Centre by the end of 2020.