Imputation of complex dependent data: a copula-based approach
Di Lascio FML
Missing data occur in almost all surveys and data collections. In risk management, for example, an institution might not have enough data to estimate risk components such as the probability of default, so some reconstruction method must be used. Handling missing data requires imputation methods, since restricting the analysis to complete cases leads to a loss of precision and to invalid inferences. The choice of the most appropriate imputation method depends on many elements. We present an imputation method for settings in which the focus is on the multivariate dependence structure of the data-generating process. The method, called CoImp [1, 2], is based on the copula function and makes it possible to impute multivariate missing data with generic missing-data patterns and complex dependence structures. CoImp is a stochastic single imputation method: it fills in each missing (multivariate) value using the conditional density functions of the missing variables given the observed ones. These functions can be derived analytically once parametric models for the margins and the copula are specified. When analytical derivations are not feasible, the margins are estimated non-parametrically through local likelihood methods. We describe both the analytic and the semiparametric versions of the copula-based imputation method and use Monte Carlo studies to assess how well they preserve both the dependence structure and the microdata. The method has been implemented and made available in the R package CoImp. We illustrate how to carry out the imputation with the package, describing its main functions, their output, and their use on real data sets.
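The central step of the analytic version, drawing each missing value once from its conditional distribution given the observed values, can be illustrated with a small sketch. This is not the CoImp implementation (which is an R package and handles generic missing-data patterns and several copula families). It is a minimal Python illustration under explicit assumptions: a bivariate Gaussian copula whose parameter is obtained by inverting Kendall's tau (rho = sin(pi * tau / 2)), exponential margins fitted by maximum likelihood, and a pattern in which only the second variable has missing values. The names `impute_gaussian_copula`, `simulate` and `kendall_tau` are hypothetical helpers introduced here.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1): provides cdf and inv_cdf


def _clamp(u, eps=1e-12):
    """Keep probabilities strictly inside (0, 1) so inv_cdf and log stay finite."""
    return min(max(u, eps), 1.0 - eps)


def kendall_tau(xs, ys):
    """Kendall's tau-a over all pairs; tied pairs contribute 0."""
    n = len(xs)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            d = (xs[i] - xs[j]) * (ys[i] - ys[j])
            s += (d > 0) - (d < 0)
    return 2.0 * s / (n * (n - 1))


def simulate(n, rho, seed=None):
    """Hypothetical helper: n pairs from a Gaussian copula with Exp(1) margins."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        u1 = _clamp(STD_NORMAL.cdf(z1))
        u2 = _clamp(STD_NORMAL.cdf(z2))
        # Exp(1) quantile function: F^{-1}(u) = -log(1 - u)
        pairs.append((-math.log(1.0 - u1), -math.log(1.0 - u2)))
    return pairs


def impute_gaussian_copula(data, seed=None):
    """Hypothetical sketch: stochastic single imputation of x2 given x1.

    data: list of (x1, x2) pairs; x1 is always observed, x2 may be None.
    Returns (completed data, estimated copula correlation rho).
    """
    rng = random.Random(seed)
    complete = [(a, b) for a, b in data if b is not None]
    if len(complete) < 2:
        raise ValueError("need at least two complete cases")
    # Assumed parametric margins: exponential, MLE rate = 1 / sample mean.
    x1_all = [a for a, _ in data]
    x2_obs = [b for _, b in complete]
    rate1 = len(x1_all) / sum(x1_all)
    rate2 = len(x2_obs) / sum(x2_obs)
    # Copula parameter from complete cases: for the Gaussian copula,
    # tau = (2 / pi) * arcsin(rho), hence rho = sin(pi * tau / 2).
    tau = kendall_tau([a for a, _ in complete], x2_obs)
    rho = math.sin(math.pi * tau / 2.0)
    sd = math.sqrt(max(0.0, 1.0 - rho * rho))
    completed = []
    for a, b in data:
        if b is not None:
            completed.append((a, b))  # observed values are left untouched
            continue
        # Map the observed x1 to the normal scale through its fitted margin.
        z1 = STD_NORMAL.inv_cdf(_clamp(1.0 - math.exp(-rate1 * a)))
        # Conditional law of Z2 given Z1 = z1 is N(rho * z1, 1 - rho^2): one draw.
        z2 = rng.gauss(rho * z1, sd)
        # Back to the data scale through the fitted exponential quantile function.
        u2 = _clamp(STD_NORMAL.cdf(z2))
        completed.append((a, -math.log(1.0 - u2) / rate2))
    return completed, rho
```

For example, `simulate(400, rho=0.7, seed=1)` generates a data set; setting every fourth `x2` to `None` and calling `impute_gaussian_copula` on it returns a completed data set in which the imputed pairs retain a positive rank association with `x1`. The Gaussian copula is used here only because its conditional distribution has a closed form on the normal scale; for another family one would sample from the conditional copula C(u2 | u1) = ∂C(u1, u2)/∂u1, inverting it numerically when no closed form exists.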