MERLIN: An Online Trilingual Learner Corpus Empirically Grounding the European Reference Levels in Authentic Learner Data
Since its publication in 2001, the Common European Framework of Reference for Languages (CEFR) has gained a leading role as an instrument of reference for language teaching and certification. Nonetheless, there is a growing ...
An extended version of the KoKo German L1 Learner corpus
This paper describes an extended version of the KoKo corpus (version KoKo4, Dec 2015), a corpus of written German L1 learner texts from three different German-speaking regions in three different countries. The KoKo ...
Towards high-accuracy bilingual phrase acquisition from parallel corpora
We report on ongoing work to derive translations of phrases from parallel corpora. We describe an unsupervised and knowledge-free greedy-style process relying on innovative strategies for choosing and discarding candidate ...
StirWaC: compiling a diverse corpus based on texts from the web for South Tyrolean German
In this paper, we report on the creation of a web corpus for the variety of German spoken in South Tyrol. We hence provide an example for the compilation of a corpus for a language variety that has neighboring varieties ...
A Trilingual Learner Corpus illustrating European Reference Levels
Since its publication in 2001, the Common European Framework of Reference for Languages (CEFR) has gained a leading role as an instrument of reference for language teaching and certification and for the development of ...
Correcting OCR errors for German in Fraktur font
In this paper, we present ongoing experiments for correcting OCR errors on German newspapers in Fraktur font. Our approach borrows from techniques for spelling correction in context using a probabilistic edit-operation ...
DI-ÖSS: Building a digital infrastructure in South Tyrol
This paper presents the DI-ÖSS project, a local digital infrastructure initiative for South Tyrol, which aims at connecting institutions and organizations that are working with language data. It shall serve to facilitate ...
EnetCollect in Italy