Lost in Translation? Exploring the Potentials and Pitfalls of Text-as-Data Methods for the Cross-national Comparison of Local Spatial Development Policies

Theresia Morandell

Text-as-data methods are increasingly popular in comparative politics research, not only because they make it convenient to analyze large quantities of textual data but also because they can facilitate cross-national comparative research in multi-lingual contexts. This paper leverages the latter aspect in particular by presenting a comparative analysis of local spatial development policies across 14 European-OECD countries spanning various language contexts. The research question guiding the analysis is the following: To what extent do medium-sized European cities address the broad set of physical and functional linkages (urban-rural relations) which typically exist between urban centers and their neighboring suburban and rural municipalities? As cities continue to grow beyond their administrative boundaries and form highly connected city-regions with their surrounding municipalities, the management of urban-rural relations has emerged as a strategic objective in spatial planning policy. The paper relies on text-as-data methods, structural topic models in particular, to conduct a large-n comparative analysis exploring whether the European academic and policy discourse on urban-rural relations has translated into the contents of planning policy at the local (city) level. The aim of the analysis is to explore i) to what extent and how urban-rural relations occur as a topic in local spatial development policies adopted by medium-sized European cities, and ii) how the outcomes under point i) vary according to the characteristics of the territorial and policymaking context in which the analyzed policies were adopted. To this end, the paper relies on an original corpus of 241 policy documents adopted by a sample of 115 medium-sized cities across 14 European-OECD countries, spanning nine languages.
Medium-sized cities (also referred to as intermediate cities, or i-cities) are a theoretically interesting type of settlement to study in this regard. The relevant literature describes them as regional centers for the provision of administrative, economic, cultural, infrastructural and planning functions, servicing both urban and rural populations within their broader region. It is this assumption of strong linkages between the intermediate city and its surrounding region which makes it a particularly relevant category of settlement for this analysis. Intermediate cities have only recently begun to enter the academic and policy agenda and are still relatively under-researched compared to their bigger metropolitan counterparts. The research innovates in two respects. First, it breaks with a tradition of adopting predominantly small-n case study approaches to analyzing instances of urban-rural relations and city-regional coordination in spatial policymaking. Second, the paper contributes to exploring the promises and pitfalls of text-as-data methods for social science research in the European multi-lingual context, where language barriers pose a marked obstacle to cross-national comparative research on textual data. The paper taps into a line of social science research which leverages rapid advances in machine-translation software to overcome language barriers, by translating textual data written in a variety of source languages into English as a common reference language on which the actual analysis is performed. However, machine translation is not error-free. Systematic differences in vocabulary use between the source languages may persist in the machine-translated text corpus and bias the model estimation. We may end up modeling variations in word use that are specific to particular language contexts rather than detecting topics as semantically coherent concepts across linguistic borders.
Researchers may counter this problem by relying on the statistical properties of structural topic models, which allow information on a text document’s source language to be included directly in the model estimation, thereby controlling for systematic differences in vocabulary use across different languages. The paper tests the limits of such an approach by extending the number of source languages included in the analysis, as well as by applying it to policy documents in the highly technical policy sector of spatial planning. Spatial planning is characterized by a distinct and specialized vocabulary which may differ considerably across national planning contexts. Is there an upper limit to the linguistic variation which can be accounted for in topic modeling?
