Now showing items 1-2 of 2
The MERLIN corpus: Learner language and the CEFR
(European Language Resources Association (ELRA), 2014)
The MERLIN corpus is a written learner corpus for Czech, German, and Italian that has been designed to illustrate the Common European Framework of Reference for Languages (CEFR) with authentic learner data. The corpus ...
Correcting OCR errors for German in Fraktur font
In this paper, we present ongoing experiments for correcting OCR errors on German newspapers in Fraktur font. Our approach borrows from techniques for spelling correction in context using a probabilistic edit-operation ...