Abstract
Despite the growing use of corpus-based methods in phraseological research in recent years (cf. Heid 2005; Heid/Weller 2010; Steyer 2013), the emerging interest in phraseological aspects in learner corpus research (cf. Paquot/Granger 2012) and the steadily growing number of phraseodidactic studies (cf. Kühn 1987; 1992; Lorenz-Bourjot/Lüger 2001; González Rey 2013; 2014; Konecny et al. 2013; Sułkowska 2013), proposals for appropriate criteria for identifying, classifying and analyzing phrasemes in learner corpora still seem to be underrepresented. Such studies could be useful not only for revealing the actual use of phrasemes at the various levels of the CEFR (2001) and for detecting recurrent mistakes and their causes, but also for developing suitable didactic material that helps learners reach the targeted phraseological competence at each CEFR level. Within the LeKo project (www.leko-project.org), carried out in cooperation between the University of Innsbruck and the European Academy of Bolzano/Bozen, we aim to describe the use of phrasemes by L2 learners of Italian for didactic purposes, combining quantitative and qualitative methodological approaches. To this end, we analyze a subset of the KOLIPSI corpus, which consists of German and Italian L2 productions (written by South Tyrolean secondary school pupils) that have already been reliably assigned to CEFR levels (cf. Abel et al. 2012); the LeKo subcorpus covers the levels A2 to C1 and contains 288 Italian texts written by German L1 pupils.
Before analyzing the phrasemes in our corpus, we had to establish our conception of phrasemes and the criteria for identifying and classifying them. For this purpose, we combined deductive and inductive methods, i.e. we drew on current concepts from the pertinent literature and pre-analyzed selected KOLIPSI texts to get an idea of which phraseme types the learners actually use. As far as collocations are concerned, we found that lexical collocations stricto sensu occur only rarely (especially at lower CEFR levels). It therefore seemed useful to adopt a broader concept of collocation that also includes sequences such as andare a casa (‘to go home’) and guardare su/in internet (‘to look/search on the internet’), which appear to be freely combined but in which at least the prepositions are idiosyncratically bound to the respective verb. As to other phraseme (sub)categories, we decided to adopt a “mixed classification” in the sense of Burger (2007: 53). Besides various types of referential phrasemes (both non-idiomatic and idiomatic), we also took communicative and structural phrasemes into consideration.
In our paper, we will illustrate which phrasemes we detected at the various levels, how we assigned them to different phraseological (sub)types and how we identified possible error causes. To capture the different kinds of error causes, we introduced, among others, a separate category named “not exist” for cases in which phrasemes existing in German were translated literally into non-existent Italian expressions (e.g. *queste vacanze diventano il martello = interference from Germ. dieser Urlaub wird der Hammer).