Training an NMT system for legal texts of a low-resource language variety (South Tyrolean German – Italian)

A Oliver; S Álvarez; Egon Waldemar Stemle; Elena Chiocchetti

Conference proceeding

Open access

Peer reviewed

Proceedings of the 25th Annual Conference of the European Association for Machine Translation. Volume 1: Research And Implementations & Case Studies, pp.573-579

EAMT2024 (The 25th Annual Conference of The European Association for Machine Translation) (Sheffield, 24/06/2024 - 27/06/2024)

2024

Handle: https://hdl.handle.net/10863/43174

Abstract

This paper illustrates the process of training and evaluating NMT systems for a language pair that includes a low-resource language variety. A parallel corpus of legal texts for Italian and South Tyrolean German has been compiled, with South Tyrolean German being the low-resource language variety. As the compiled corpus is too small for training on its own, we have combined it with several other parallel corpora using data weighting at sentence level. We then evaluated each combination, as well as two popular commercial systems.
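
As a rough illustration of what sentence-level data weighting can look like, the sketch below concatenates an in-domain corpus with a larger general corpus and attaches one weight to every sentence pair, so that a training loss can count in-domain pairs more heavily. It is not taken from the paper: the function names, the example sentences, and the weights 1.0 and 0.25 are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

SentencePair = Tuple[str, str]  # (source sentence, target sentence)


def combine_with_weights(
    corpora: Iterable[Tuple[List[SentencePair], float]],
) -> Tuple[List[SentencePair], List[float]]:
    """Concatenate corpora and emit one weight per sentence pair.

    Hypothetical helper: every pair inherits the weight of its corpus.
    """
    pairs: List[SentencePair] = []
    weights: List[float] = []
    for corpus, weight in corpora:
        if weight <= 0:
            raise ValueError("corpus weights must be positive")
        for pair in corpus:
            pairs.append(pair)
            weights.append(weight)
    return pairs, weights


def weighted_mean_loss(losses: List[float], weights: List[float]) -> float:
    """Weighted average of per-sentence losses (the weighting idea in one line)."""
    if len(losses) != len(weights):
        raise ValueError("need exactly one weight per sentence loss")
    return sum(l * w for l, w in zip(losses, weights)) / sum(weights)


# Illustrative data only; the weights are assumed, not reported values.
in_domain = [("Das Gesetz tritt in Kraft.", "La legge entra in vigore.")]
general = [("Guten Morgen.", "Buongiorno."), ("Danke.", "Grazie.")]

pairs, weights = combine_with_weights([(in_domain, 1.0), (general, 0.25)])
loss = weighted_mean_loss([2.0, 4.0, 4.0], weights)
```

In this sketch the single in-domain pair contributes as much to the averaged loss as four general-domain pairs would, which is the intuition behind giving a small specialised corpus a higher weight than the larger corpora it is mixed with.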

Files and links (3)

pdf

Oliveretal.-2024-TraininganNMTsystemforlegaltextsofalow-resourcelanguagevariety(SouthTyroleanGerman–I, 242.87 kB

CC BY-NC-ND V4.0, Open Access

url

https://eamt2024.sheffield.ac.uk/

url

https://eamt2024.github.io/proceedings/vol1.pdf

Details

Title: Training an NMT system for legal texts of a low-resource language variety (South Tyrolean German – Italian)
Creators: A Oliver; S Álvarez; Egon Waldemar Stemle; Elena Chiocchetti
Publication Details: Proceedings of the 25th Annual Conference of the European Association for Machine Translation. Volume 1: Research And Implementations & Case Studies, pp.573-579
Conference: EAMT2024 (The 25th Annual Conference of The European Association for Machine Translation) (Sheffield, 24/06/2024 - 27/06/2024)
Number of pages: 8
Identifiers: (EURAC)28527304; 991006856597901241
Copyright: The papers published in this proceedings are —unless indicated otherwise— covered by the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC-BY-NC-ND 4.0). You may copy, distribute, and transmit the work, provided that you attribute it (authorship, proceedings, publisher) in the manner specified by the author(s) or licensor(s), and that you do not use it for commercial purposes. The full text of the licence may be found at https://creativecommons.org/licenses/by-nc-nd/4.0/deed.en ©2024 The authors
Academic Unit: Institute for Applied Linguistics
Language: English
Resource Type: Conference proceeding
Description coverage: international
Description audience: Scientific
Local Fields: Scientific
Author Names String: Oliver A, Álvarez S, Stemle EW, Chiocchetti E
