Abstract
It was not until the advent of powerful computers that corpus linguistics developed into a widely applied research methodology. Indeed, corpus linguistics relies heavily on computer-powered analysis tools. Corpus linguists use them daily to retrieve examples and analyze authentic data from very large corpora. Despite their indisputable importance, it has repeatedly been remarked that corpus analysis tools have evolved little since their early days. Concordances, frequency lists and collocation extraction still constitute the core functionalities of most corpus tools.
With the aim of encouraging new functional developments, this thesis presents research on open demands in current corpus research practice and the related requirements for tool support. It builds on the assumption that more user-centered research is needed to bridge the gap between tool developers, who are mainly computationally trained, and their linguistic expert users, who bring specialized domain knowledge and often sophisticated analytical needs. The research is carried out by means of three user investigations that enquire about corpus research workflows and analysis activities as well as theoretical principles and methodological considerations in corpus linguistics research practice. In this way, a comprehensive picture of the corpus usage situation is assembled by combining insights from open-ended enquiries (interviews) with quantitative results on selected aspects of the corpus analysis scenario (questionnaire), derived from enquiries with more than 100 corpus users in total. Based on the results, a range of open demands for corpus research and tools is identified and discussed. They relate to (1) corpus resources, (2) general aspects of tools, (3) corpus analysis procedures, and (4) best practices. The results show that open demands address challenges on very different operational levels: the availability of corpus resources and reliable annotations, technical requirements related to scalability and interoperability, usability, technical and methodological skills, and functional demands in the narrower sense. The thesis discusses potential paths to address the open demands and provides pointers to recent developments in corpus linguistics and related fields, in particular computational linguistics and natural language processing as well as linguistic information visualization.
The research contribution of this thesis is twofold. On the methodological level, it elaborates on methods and challenges for user-centered research on tools for open-ended tasks and provides entry points for further user-centered research by identifying and organizing, as a reference, the basic building blocks of corpus linguistics research. On the content level, it provides initial insights into user perspectives and needs related to corpus research practice. It describes concrete demands and discusses paths to their solution. In this way, it prepares the ground for further in-depth studies and user-centered development of new corpus functionalities for specific demands.