Evaluation of statistical and machine learning based landslide susceptibility models for very large areas – coping with error prone input data
Previous research highlights that both the quality of the input data and the applied classification technique substantially affect the results and the feasibility of subsequent landslide susceptibility analyses. The interplay between error-prone input data and the applied classification algorithm (i.e. overfitting to errors inherent in the data) may be of particular relevance for large areas. Identifying the “best” of many landslide susceptibility models is a challenging task, particularly since several quality-defining aspects are qualitative in nature and therefore difficult to quantify. Testing multiple classifiers has therefore become important in order to compare them and select the best-performing model that suits the actual condition of the data. This research evaluates several landslide susceptibility maps generated for the Austrian territory (84,000 km²). For this purpose, five algorithms of differing flexibility were applied and assessed quantitatively (i.e. predictive performance, degree of overfitting) and qualitatively (i.e. geomorphological plausibility). A particular focus was placed on identifying situations in which the respective models reflected known input data errors (i.e. incomplete landslide information) rather than the expected landslide susceptibility situation. Three classifiers commonly applied in susceptibility analysis, (i) “LR”: logistic regression (low flexibility), (ii) “GAM”: generalized additive model (medium flexibility) and (iii) “SVM”: support vector machine (high flexibility), were compared with novel approaches that allow known biases inherent in the landslide inventory to be averaged out: (iv) “ME+LR”: mixed effects logistic regression and (v) “ME+GAM”: mixed effects generalized additive model.
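As an illustration of how a "degree of overfitting" could be quantified, the sketch below compares a model's discrimination on training data with its discrimination on held-out data. It is a minimal example under stated assumptions, not the procedure used in this study: the abstract does not name a performance metric, so the area under the ROC curve (AUROC) is assumed here, and the function names `auroc` and `overfitting_gap` as well as all scores are hypothetical.

```python
def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen landslide cell (label 1)
    receives a higher score than a randomly chosen non-landslide cell (label 0).
    Ties count as one half. ASSUMPTION: AUROC is the chosen performance metric."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def overfitting_gap(train_labels, train_scores, test_labels, test_scores):
    """Training AUROC minus test AUROC; a larger gap suggests stronger overfitting."""
    return auroc(train_labels, train_scores) - auroc(test_labels, test_scores)


# Hypothetical scores from one flexible classifier (illustrative values only).
train_y, train_s = [1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]
test_y, test_s = [1, 1, 0, 0], [0.7, 0.4, 0.5, 0.2]
gap = overfitting_gap(train_y, train_s, test_y, test_s)
```

With these illustrative values the training AUROC is 1.0 and the test AUROC is 0.75, giving a gap of 0.25. Computing the same gap for each of the five classifiers would be one way to rank them by flexibility-induced overfitting.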
Based on theoretical knowledge of the classifiers' capabilities and the condition of the input data, the results are expected to show that the classifiers treat the data differently and yield correspondingly different predictions. The replication of input data errors (bias) is expected to be stronger for LR, GAM and SVM, whereas the mixed effects models (ME+LR and ME+GAM) are expected to counteract such biases, increasing the quality and reliability of the outcomes. This study intends to contribute to the topic of landslide susceptibility prediction for large regions with insufficient landslide observations, using statistical and machine learning methods.
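The idea of averaging out an inventory bias with a mixed effects model can be sketched as follows. This is a hedged illustration, not the fitted models of this study: it assumes the bias is captured by a random intercept per mapping unit (an assumption, since the abstract does not specify the random effect structure), and the coefficients, unit names and the single predictor (slope angle) are hypothetical values chosen for the example.

```python
import math


def sigmoid(z):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# HYPOTHETICAL fitted values of a mixed effects logistic regression.
INTERCEPT = -2.0            # fixed intercept
SLOPE_COEF = 0.08           # fixed effect per degree of slope angle
RANDOM_INTERCEPTS = {       # ASSUMED random intercept per mapping unit
    "well_mapped": 0.9,     # inventory nearly complete
    "poorly_mapped": -1.1,  # many landslides missing from the inventory
}


def conditional_prediction(slope_deg, unit):
    """Prediction that includes the unit's random intercept and
    therefore reproduces the mapping bias of that unit."""
    z = INTERCEPT + SLOPE_COEF * slope_deg + RANDOM_INTERCEPTS[unit]
    return sigmoid(z)


def averaged_out_prediction(slope_deg):
    """Prediction with the random intercept set to zero, so identical
    terrain receives the same score regardless of mapping completeness."""
    return sigmoid(INTERCEPT + SLOPE_COEF * slope_deg)


biased_high = conditional_prediction(25.0, "well_mapped")
biased_low = conditional_prediction(25.0, "poorly_mapped")
unbiased = averaged_out_prediction(25.0)
```

For identical terrain (25 degrees), the conditional predictions differ between the two units only because of mapping completeness, while the averaged-out prediction lies between them and is the same for both units. A model without the random effect would instead have to absorb that difference into its fixed effects.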