Abstract
China is one of the countries where landslides have caused the most fatalities in recent decades. The threat
that landslide disasters pose to people might even be greater in the future, due to climate change and the
increasing urbanization of mountainous areas. A reliable national-scale rainfall-induced landslide susceptibility
model is therefore of great relevance for identifying regions more and less prone to landsliding
and for developing suitable risk mitigation strategies. However, relying on imperfect landslide data
is inevitable when modelling landslide susceptibility for such a large research area. The purpose of this
study is to investigate the influence of incomplete landslide data on national-scale statistical landslide
susceptibility modelling for China. In this context, we explore the benefit of mixed-effects modelling
for counterbalancing the associated bias propagation. Six influencing factors, namely lithology, slope,
soil moisture index, mean annual precipitation, land use and geological environment regions, were
selected based on an initial exploratory data analysis. Three sets of influencing variables were designed
to represent different approaches to dealing with spatially incomplete landslide information: Set 1 (disregards
the presence of incomplete landslide information), Set 2 (excludes factors related to the incompleteness
of landslide data) and Set 3 (accounts for factors related to the incompleteness via random effects). The variable
sets were then introduced into a generalized additive model (GAM: Set 1 and Set 2) and a generalized
additive mixed-effects model (GAMM: Set 3) to establish three national-scale statistical landslide susceptibility
models: Models 1, 2 and 3. The models were evaluated using the area under the receiver operating
characteristic curve (AUROC) obtained from spatially explicit and non-spatial cross-validation. The spatial prediction
patterns produced by the models were also investigated. The results show that the landslide inventory
incompleteness had a substantial impact on the outcomes of the statistical landslide susceptibility
models. The cross-validation results provided evidence that the three established models performed well
in predicting model-independent landslide information, with median AUROCs ranging from 0.8 to 0.9.
However, although Model 1 reached the highest AUROCs in non-spatial cross-validation (median
of 0.9), it was not associated with the most plausible representation of landslide susceptibility. The
Model 1 results were inconsistent with geomorphological process knowledge and reflected
to a large extent the underlying data bias. The Model 2 susceptibility maps provided a less biased picture
of landslide susceptibility. However, a lower predicted likelihood of landslide occurrence still existed
in areas known to be underrepresented in terms of landslide data (e.g., the Kuenlun Mountains in the
northern Tibetan Plateau). The non-linear mixed-effects model (Model 3) reduced the impact of these
biases best by introducing bias-describing variables as random effects. Among the three models, Model
3 was selected as the best national-scale susceptibility model for China as it produced the most plausible
portrayal of rainfall-induced landslide susceptibility and the highest spatially explicit predictive performance
(median AUROC of 0.84 in spatial cross-validation) compared to the other two models (median AUROCs of
0.81 and 0.79, respectively). We conclude that ignoring the incompleteness of landslide inventories
can entail misleading modelling results and that the application of non-linear mixed-effects models
can reduce the propagation of such biases into the final results for very large areas.
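
The contrast between non-spatial and spatially explicit cross-validation described above can be illustrated with a minimal sketch. It is not the study's implementation: the square-block partitioning, the function names (`auroc`, `random_folds`, `spatial_folds`, `median_cv_auroc`) and the `fit_predict` callback are assumptions made purely for illustration, and any susceptibility model (for example a GAM or GAMM fitted elsewhere) would have to be supplied through that callback.

```python
import random
from statistics import median


def auroc(labels, scores):
    """AUROC as the share of (landslide, non-landslide) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def random_folds(n, k, seed=0):
    """Non-spatial CV: shuffle the sample indices and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def spatial_folds(coords, block_size, k):
    """Spatially explicit CV (assumed block design): samples in one square block share a fold."""
    blocks = {}
    for i, (x, y) in enumerate(coords):
        key = (int(x // block_size), int(y // block_size))
        blocks.setdefault(key, []).append(i)
    folds = [[] for _ in range(k)]
    for j, key in enumerate(sorted(blocks)):
        folds[j % k].extend(blocks[key])
    return folds


def median_cv_auroc(labels, folds, fit_predict):
    """Median AUROC over folds; fit_predict(train_idx, test_idx) returns scores for test_idx."""
    results = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        scores = fit_predict(train_idx, test_idx)
        results.append(auroc([labels[i] for i in test_idx], scores))
    return median(results)
```

Because nearby samples tend to share both predictor values and inventory bias, random folds can reward a model for reproducing that bias, whereas block-wise folds test it on regions it has not seen. This is one way to read why the highest non-spatial AUROC did not correspond to the most plausible susceptibility map.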