Open Corpus Interface for Italian Language Learning
In this article, we present the multi-faceted interface to the open PAISà corpus of Italian. Created within the project PAISà (Piattaforma per l’Apprendimento dell’Italiano Su corpora Annotati), the corpus is designed ...
A Generic Data Workflow for Building Annotated Text Corpora
(Peter Lang, 2015)
We present an abstract and generic workflow, and detail how it has been implemented to build and annotate learner corpora. This workflow has been developed through an interdisciplinary collaboration between linguists, who ...
High-Accuracy Phrase Translation Acquisition Through Battle-Royale Selection
(RANLP 2011 Organising Committee / ACL, 2013)
In this paper, we report on an unsupervised greedy-style process for acquiring phrase translations from sentence-aligned parallel corpora. Thanks to innovative selection strategies, this process can acquire multiple ...
Towards high-accuracy bilingual phrase acquisition from parallel corpora
We report on ongoing work to derive translations of phrases from parallel corpora. We describe an unsupervised and knowledge-free greedy-style process relying on innovative strategies for choosing and discarding candidate ...
Correcting OCR errors for German in Fraktur font
In this paper, we present ongoing experiments for correcting OCR errors on German newspapers in Fraktur font. Our approach borrows from techniques for spelling correction in context using a probabilistic edit-operation ...