Abstract
This paper reports on the creation, annotation and curation of the MT@BZ corpus, a bilingual (Italian–South Tyrolean German) corpus of machine-translated legal texts from the officially multilingual Province of Bolzano, Italy. It is the first human error-annotated corpus (with an adapted SCATE taxonomy) of machine-translated legal texts in this language combination that includes a lesser-used standard variety. Project data are available on GitHub and CLARIN. The output of the customized engine achieved notably better BLEU, TER and chrF2 scores than the baseline. Over 50% of the segments needed no human revision. The most frequent error categories were mistranslations and bilingual (legal) terminology errors. Our contribution brings fine-grained insights to Machine Translation Evaluation research, as it concerns a less common language combination, a lesser-used language variety and a societally relevant specialized domain. Such results are necessary to implement and inform the use of MT in institutional contexts of smaller language communities.