Abstract
Varieties of plurilinguistic languages have been investigated and described to differing extents, and the ‘main’ varieties usually have much better NLP coverage than their lesser-used counterparts. The toolkit described in this contribution aims to support linguists in comparing language varieties, which in most cases share many linguistic characteristics. To identify their differences, even subtle ones, text corpora will serve as the basis for semi-automatically extracting particularities at different levels of linguistic description. Existing, adapted, and new tools will be used for the extraction, and the toolkit will provide ‘candidate lists’ to reduce the effort required of experts. The system will first be evaluated on German varieties, but it is intended to be transferable to other languages.