The MT@BZ corpus: machine translation & legal language

Flavia De Camillis; Egon Waldemar Stemle; Elena Chiocchetti; Francesco Fernicola

Conference proceeding

Open access

Peer reviewed

Proceedings of the 24th Annual Conference of the European Association for Machine Translation, 12–15 June 2023, Tampere, Finland, pp. 171–180

EAMT 2023 (The 24th Annual Conference of the European Association for Machine Translation) (Tampere, 12/06/2023 - 15/06/2023)

2023

Handle: https://hdl.handle.net/10863/36635

Keywords: minority languages; legal language; corpus; error annotation; Machine Translation

Abstract

The paper reports on the creation, annotation and curation of the MT@BZ corpus, a bilingual (Italian–South Tyrolean German) corpus of machine-translated legal texts from the officially multilingual Province of Bolzano, Italy. It is the first human error-annotated corpus (with an adapted SCATE taxonomy) of machine-translated legal texts in this language combination that includes a lesser-used standard variety. Project data are available on GitHub and CLARIN. The output of the customized engine achieved notably better BLEU, TER and chrF2 scores than the baseline. Over 50% of the segments needed no human revision. The most frequent error categories were mistranslations and bilingual (legal) terminology errors. Our contribution brings fine-grained insights to Machine Translation Evaluation research, as it concerns a less common language combination, a lesser-used language variety and a societally relevant specialized domain. Such results are necessary to implement and inform the use of MT in institutional contexts of smaller language communities.

Files and links (3)

pdf: DeCamillis-et-al2023 (491.12 kB), CC BY-NC-ND V4.0, Open Access

url: https://events.tuni.fi/eamt23/

url: https://events.tuni.fi/uploads/2023/06/11678752-proceedings-eamt2023.pdf

Details

Title: The MT@BZ corpus: machine translation & legal language
Creators: Flavia De Camillis; Egon Waldemar Stemle; Elena Chiocchetti; Francesco Fernicola
Publication Details: Proceedings of the 24th Annual Conference of the European Association for Machine Translation, 12–15 June 2023, Tampere, Finland, pp. 171–180
Conference: EAMT 2023 (The 24th Annual Conference of the European Association for Machine Translation) (Tampere, 12/06/2023 - 15/06/2023)
Identifiers: (EURAC)27116397; 991006613796901241
Copyright: CC-BY-NC-ND
Academic Unit: Institute for Applied Linguistics
Language: English
Resource Type: Conference proceeding
Description coverage: international
Description audience: Scientific
Local Fields: Scientific
Author Names String: De Camillis F, Stemle E, Chiocchetti E, Fernicola F
