Abstract
The increasing use of large language models (LLMs) in translation and terminology work raises critical ethical and infrastructural questions, particularly regarding the commodification of open language resources. From a Digital Humanism perspective, this paper presents UniTermGPT, a project that examines how ChatGPT handles university-related terminology across varieties of German, including Austrian, German and South Tyrolean, and contributes FAIR-compliant, annotated corpora and resources to CLARIN.
UniTermGPT not only supports LLM benchmarking in specialized translation but also highlights the risks of open language resources being exploited by commercial LLM providers. The project embeds CARE principles and Digital Humanism values such as transparency, inclusivity and epistemic justice into its methodology; on this basis, the paper argues for stronger safeguards and ethical standards within public infrastructures. These include mechanisms for provenance tracking, responsible licensing, transparent governance and the representation of minority language communities in decisions about their language resources.
Ultimately, UniTermGPT illustrates that openness in research data management can be balanced with responsibility, ethical reflection and attention to linguistic diversity. By demonstrating how open resources can support both technological development and broader societal benefits, the project provides a practical example of responsible openness and highlights ways in which infrastructures like CLARIN can facilitate the ethical sharing and use of language data.