Identifying dialect writings in written online communication: New data-driven approaches

Jennifer-Carmen Frey; Aivars Glaznieks; A Glück



Conference presentation

Open access


Corpus Linguistics Conference CL2021 (Limerick, 13/07/2021 - 16/07/2021)

2021

Handle: https://hdl.handle.net/10863/18368

Abstract

The use of vernacular language depicting local dialects is a commonly observed characteristic of written technology-mediated communication in some communities (e.g. Alshutayri & Atwell 2019; Ueberwasser & Stark 2017; Frey et al. 2015). While members (or observers) of the community can easily recognize and understand local dialect spellings, it is a methodological challenge to empirically assign non-standard spelling variants to a specific local variety and thus separate them from misspellings or supra-regional vernacular. We present a study that addresses this problem using data-driven methods for the analysis of German Facebook texts from the DiDi corpus of South Tyrolean CMC data (Frey et al. 2016). The DiDi corpus provides access to more than 23,000 mainly German status updates, comments and chat messages written in 2013 by around 120 writers (corpus size: ca. 374,000 tokens). The corpus provides person-related metadata, such as gender, age and geographic origin, which are relevant variables for language variation (Löffler 2003). By correlating frequently occurring spelling variants of the Standard German -er suffix in the DiDi corpus with geographic, social and situational variables, Glück and Glaznieks (2019) were able to relate one variant (-o) to a specific geographic area (Val Pusteria), with a distribution typical of dialect use confirmed by the variables gender (cf. also Sieburg 1992), age (cf. Vergeiner et al. 2020) and communication type. In this presentation, we extend the approach using methods from natural language processing and social network analysis to look at co-occurring features on the grapheme, word and text level. Through quantitative and subsequent qualitative analyses of the data, we could not only identify more dialect features (e.g. the substitution of by ) but also determine how consistent writers are and whether they clearly distinguish between standard and regional varieties.
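The idea of relating spelling variants of the -er suffix to a geographic variable can be illustrated with a minimal Python sketch. It is not the authors' implementation: the records, the second region name ("Bolzano") and the helper names `suffix_variant` and `variant_shares` are hypothetical, and the toy spellings are invented for illustration rather than taken from the DiDi corpus. The sketch assumes each token is already paired with its Standard German target form.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (region, observed token, Standard German target).
# Invented for illustration only; not taken from the DiDi corpus.
TOY_TOKENS = [
    ("Val Pusteria", "wasso", "wasser"),
    ("Val Pusteria", "immo", "immer"),
    ("Val Pusteria", "wasser", "wasser"),
    ("Bolzano", "wasser", "wasser"),
    ("Bolzano", "immer", "immer"),
    ("Bolzano", "wassa", "wasser"),
    ("Bolzano", "immer", "immer"),
]


def suffix_variant(token, target):
    """Label how a token spells the -er suffix of its standard target.

    Returns None when the target does not end in -er, "other" when the
    token does not share the target's stem, and "-<ending>" otherwise.
    """
    if not target.endswith("er"):
        return None
    stem = target[:-2]
    if not token.startswith(stem):
        return "other"
    return "-" + token[len(stem):]


def variant_shares(records):
    """Compute the relative frequency of each suffix variant per region."""
    counts = defaultdict(Counter)
    for region, token, target in records:
        label = suffix_variant(token, target)
        if label is not None:
            counts[region][label] += 1
    return {
        region: {label: n / sum(c.values()) for label, n in c.items()}
        for region, c in counts.items()
    }


if __name__ == "__main__":
    for region, shares in variant_shares(TOY_TOKENS).items():
        print(region, shares)
```

On real data, the same per-region table could be broken down further by gender, age and communication type, which are the variables the abstract names; a variant concentrated in one area would then be a candidate dialect feature that still needs qualitative checking.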

Files and links (1)

Paper328Jennifer-CarmenFrey_slides (pptx, 331.85 kB), Open Access

Details

Title: Identifying dialect writings in written online communication: New data-driven approaches
Creators: Jennifer-Carmen Frey; Aivars Glaznieks; A Glück
Conference: Corpus Linguistics Conference CL2021 (Limerick, 13/07/2021 - 16/07/2021)
Identifiers: (EURAC)22890907; 991006070442701241
Academic Unit: Institute for Applied Linguistics
Language: English
Resource Type: Conference presentation
Local Fields: Scientific
Author Names String: Frey JC, Glaznieks A, Glück A
