(RANLP 2011 Organising Committee / ACL, 2013) In this paper, we report on an unsupervised greedy-style process for acquiring phrase translations from sentence-aligned parallel corpora. Thanks to innovative selection strategies, this process can acquire multiple ...
(2014) In this paper, we present ongoing experiments for correcting OCR errors on German newspapers in Fraktur font. Our approach borrows from techniques for spelling correction in context using a probabilistic edit-operation ...
(libreriauniversitaria.it, 2013) In this article, we present the multi-faceted interface to the open PAISà corpus of Italian. Created within the project PAISà (Piattaforma per l’Apprendimento dell’Italiano Su corpora Annotati), the corpus is designed ...
(Association for Computational Linguistics, 2014) PAISÀ is a Creative Commons licensed, large web corpus of contemporary Italian. We describe the design, harvesting, and processing steps involved in its creation.