Towards high-accuracy bilingual phrase acquisition from parallel corpora
We report on ongoing work to derive translations of phrases from parallel corpora. We describe an unsupervised and knowledge-free greedy-style process relying on innovative strategies for choosing and discarding candidate ...
StirWaC: compiling a diverse corpus based on texts from the web for South Tyrolean German
In this paper, we report on the creation of a web corpus for the variety of German spoken in South Tyrol. We hence provide an example for the compilation of a corpus for a language variety that has neighboring varieties ...
A Trilingual Learner Corpus illustrating European Reference Levels
Since its publication in 2001, the Common European Framework of Reference for Languages (CEFR) has gained a leading role as an instrument of reference for language teaching and certification and for the development of ...
Correcting OCR errors for German in Fraktur font
In this paper, we present ongoing experiments for correcting OCR errors on German newspapers in Fraktur font. Our approach borrows from techniques for spelling correction in context using a probabilistic edit-operation ...
Sprachenlernen und Crowdsourcing: ein innovatives Projekt [Language Learning and Crowdsourcing: An Innovative Project]
This contribution presents the European Network for Combining Language Learning with Crowdsourcing Techniques (enetCollect), which is funded as a COST Action for four years starting in 2017. First, ...
Transc&Anno: A Graphical Tool for the Transcription and On-the-Fly Annotation of Handwritten Documents
We present Transc&Anno, a web-based collaboration tool allowing the transcription of text images and their shallow on-the-fly annotation. Transc&Anno was originally developed in order to address the needs of learner ...