Abstract
Rapid advances in large language models (LLMs) are transforming specialized translation and terminology work. These tools promise significant benefits, such as increased efficiency and accessibility, but they also pose challenges, particularly in handling the complexity of language-variety-specific terminology. This issue is especially pronounced in fields such as higher education, where terminology reflects institutional, regional and cultural differences. In German-speaking countries, the terminology of higher education varies considerably across Austrian, German and South Tyrolean varieties of the German language, among others. Such variation complicates the work of translators and terminologists and presents particular challenges for LLMs such as ChatGPT.
Currently, ChatGPT and similar LLMs often fail to adequately account for terminological nuances across language varieties. This is particularly evident in the field of specialized translation, where precision and context are paramount. To address these limitations, this research investigates how ChatGPT handles German-language higher education terminology from Austrian, German and South Tyrolean contexts. It explores whether prompt engineering techniques can improve ChatGPT’s ability to accommodate terminological variation and deliver high-quality translations from German into English and vice versa. The ultimate goal is to contribute to a broader understanding of how LLMs handle language-variety-specific terminology.
The UniTermGPT project involves the compilation of a comprehensive corpus of higher education texts from Austria, Germany and South Tyrol. Pre-processing includes the identification and tagging of languages and language varieties, text segmentation, the removal of duplicate documents and the alignment of bilingual ones. The tool and method for annotating the corpus will be selected together with the prospective annotators. These texts from different university domains serve as the foundation for extracting terminology specific to each language variety. The extracted terms are then compared with existing terminological resources to establish a benchmark for evaluating ChatGPT’s performance. Selected texts from this corpus are translated by ChatGPT, using prompts engineered to optimize the LLM output. In general, the translation quality of LLMs depends on several interrelated factors, including the specific model, the languages and varieties involved (particularly in the case of low-resource languages), the prompt and the potential integration of retrieval-augmented generation to enhance output quality.
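To make two of these steps concrete, the following is a minimal sketch, not the project's actual pipeline, of how exact duplicate documents might be removed and how a variety-aware translation prompt might be assembled. The `Document` structure, the variety tags (`de-AT`, `de-DE`, `de-IT`), the prompt wording and the glossary format are all assumptions made for illustration; the sample term pair is illustrative only and is not taken from the UniTermGPT corpus.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    variety: str  # hypothetical tags: "de-AT", "de-DE", "de-IT" (South Tyrol)
    text: str


# Hypothetical mapping from variety tag to the wording used in the prompt.
VARIETY_NAMES = {
    "de-AT": "Austrian German",
    "de-DE": "German as used in Germany",
    "de-IT": "South Tyrolean German",
}


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so layout differences do not hide duplicates."""
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(docs: list[Document]) -> list[Document]:
    """Keep the first occurrence of each document whose normalized text is identical."""
    seen: set[str] = set()
    unique: list[Document] = []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc.text).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


def build_prompt(doc: Document, glossary: dict[str, str]) -> str:
    """Assemble a German-to-English prompt that names the source variety explicitly."""
    variety = VARIETY_NAMES.get(doc.variety, "German")
    lines = [
        f"Translate the following higher education text from {variety} into English.",
        "Preserve institution-specific terminology.",
    ]
    if glossary:
        lines.append("Use these term equivalents where they apply:")
        lines.extend(f"- {src} = {tgt}" for src, tgt in sorted(glossary.items()))
    lines.append("")
    lines.append(doc.text)
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative sample data only.
    corpus = [
        Document("a", "de-AT", "Die Inskription erfolgt online."),
        Document("b", "de-AT", "Die  Inskription erfolgt online."),  # same text, extra space
    ]
    for doc in deduplicate(corpus):
        print(build_prompt(doc, {"Inskription": "enrolment"}))
```

Hashing the normalized text only catches exact duplicates; near-duplicates (for example, the same regulation with an updated date) would need a similarity-based method, which this sketch does not attempt.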
The UniTermGPT study is limited in scope by focusing primarily on German-English translations within specific varieties and within the university domain. Biases in corpus compilation and annotation might stem from reliance on easily accessible data and self-selected participants, leading to text types that reflect institutional or corporate language rather than accurately representing the broader higher education system or language variety.
The quality of ChatGPT’s translations is assessed through expert annotations, focusing on how well the LLM captures language-variety-specific terminology. These evaluations include translations from German into English and vice versa, reflecting the real-world demands of academic and institutional communication.
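As an illustration of how such expert judgements could be summarized, the sketch below computes the share of terms judged correct per language variety and translation direction. The annotation schema (`variety`, `direction`, `term`, `correct`) is a hypothetical simplification; the project's actual annotation tool and method are still to be selected, so this should be read as one possible aggregation rather than the study's evaluation procedure.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    variety: str    # hypothetical tag, e.g. "de-AT"
    direction: str  # "de-en" or "en-de"
    term: str       # the term the expert assessed
    correct: bool   # expert judgement: was the variety-specific term rendered correctly?


def accuracy_by_group(annotations: list[Annotation]) -> dict[tuple[str, str], float]:
    """Return the proportion of terms judged correct for each (variety, direction) pair."""
    totals: dict[tuple[str, str], int] = defaultdict(int)
    hits: dict[tuple[str, str], int] = defaultdict(int)
    for ann in annotations:
        key = (ann.variety, ann.direction)
        totals[key] += 1
        hits[key] += int(ann.correct)
    return {key: hits[key] / totals[key] for key in totals}
```

Reporting the proportions separately for each variety and direction keeps differences between, say, Austrian and South Tyrolean terminology visible instead of averaging them away.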
A key innovation of UniTermGPT is its emphasis on open research practices. The project integrates the needs of translators and terminologists by making its datasets, methodologies and findings openly accessible. This transparency not only fosters collaboration but also ensures that the project’s outcomes can inform broader applications in specialized translation. While UniTermGPT focuses on higher education terminology, its findings have broader implications for other fields characterized by language-variety-specific terminology, such as law or healthcare.
The project also addresses critical gaps in the existing literature on LLMs in translation and terminology. While previous studies have explored the general capabilities of LLMs for translation and terminology tasks, few have examined their ability to handle language-variety-specific terminology in specialized fields. UniTermGPT contributes to this emerging area of research, emphasizing the societal relevance of terminology management in an AI-driven age. By highlighting the importance of precision and inclusivity in translation, the project underscores the broader implications of linguistic diversity in LLM development.
In addition to its research contributions, UniTermGPT offers practical recommendations for translators, terminologists and other stakeholders. These recommendations include strategies for effective prompt engineering, guidelines for integrating LLM tools into professional workflows, and best practices for addressing language-variety-specific terminology in translation. A policy brief further underscores the societal relevance of the project, advocating for the ethical and inclusive use of LLMs in translation and terminology work. The findings also benefit underrepresented communities, because UniTermGPT promotes linguistic inclusion in LLM development and the consideration of these communities in translation practice. It also informs more inclusive language and translation policies.
In conclusion, UniTermGPT represents a significant step forward in the application of LLMs to specialized translation and terminology work. By addressing the overlooked topic of language-variety-specific terminology, the project contributes to the broader goals of inclusivity, precision and societal relevance in LLM-supported translation. The findings and recommendations of UniTermGPT promise to inform both academic research and professional practice, ensuring that the transformative potential of LLMs is harnessed responsibly and effectively in the (digital) humanities and beyond.