Abstract
Imputation of missing data is a crucial task in statistical practice, and it poses several challenges, such as data sets with variables of different types, non-Gaussian data, or large data sets. We propose an approach based on conditional copulas that allows us to avoid most of these issues.
The basic idea is to derive the conditional density of each incomplete variable given the complete ones through the corresponding conditional copula and then impute the missing values by drawing observations from it. The proposal can be applied to multivariate missing data with a generic (monotone or non-monotone) missingness pattern. An R package, CoImp, that implements the method is also provided. CoImp aims to impute missing data while preserving their possibly complex dependence structure. It is appealing because it does not restrict the conditional distributions to be Gaussian, and it allows us to model multivariate distributions with different margins through a flexible semi-parametric approach. The advantages of CoImp over classical imputation techniques are shown in a simulation study, and applications to real data sets are also presented.
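To make the conditional-density idea concrete, here is a minimal Python sketch of the simplest case: one incomplete variable y and one complete variable x. By Sklar's theorem the joint density factors as f(x, y) = c(F_X(x), F_Y(y)) f_X(x) f_Y(y), so the conditional density is f(y | x) = c(F_X(x), F_Y(y)) f_Y(y), and imputing amounts to drawing from the conditional copula and mapping the draw back through the margin of y. The sketch assumes a Gaussian copula (on the normal scale the conditional law is N(rho * z_x, 1 - rho^2)), empirical margins, and a copula parameter fitted on complete pairs only. The function names `pearson`, `normal_scores` and `impute_gaussian_copula` are hypothetical; this is not the CoImp implementation, which is an R package that handles more variables and other copula families.

```python
import math
import random
from statistics import NormalDist


def pearson(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def normal_scores(values):
    """Rank-transform to pseudo-observations r/(n+1) in (0, 1), then to N(0, 1) scores."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    std = NormalDist()
    return [std.inv_cdf(r / (n + 1)) for r in ranks]


def impute_gaussian_copula(x, y, seed=0):
    """Fill None entries of y given fully observed x (hypothetical bivariate sketch)."""
    rng = random.Random(seed)
    std = NormalDist()
    obs = [i for i, v in enumerate(y) if v is not None]
    x_obs = [x[i] for i in obs]
    y_obs = [y[i] for i in obs]
    n = len(obs)
    # Gaussian-copula parameter: correlation of the normal scores of complete pairs.
    rho = pearson(normal_scores(x_obs), normal_scores(y_obs))
    sd = math.sqrt(max(1.0 - rho * rho, 1e-12))
    y_sorted = sorted(y_obs)
    completed = list(y)
    for i, v in enumerate(y):
        if v is not None:
            continue
        # Empirical CDF of x (nonparametric margin), moved to the normal scale.
        count = sum(1 for xo in x_obs if xo <= x[i])
        z_x = std.inv_cdf((count + 0.5) / (n + 1))
        # Draw from the conditional copula: Z_y | Z_x = z_x ~ N(rho * z_x, 1 - rho^2).
        z_y = rng.gauss(rho * z_x, sd)
        u = std.cdf(z_y)
        # Back-transform through the empirical quantile function of y.
        completed[i] = y_sorted[min(n - 1, int(u * n))]
    return completed


x = [float(i) for i in range(20)]
y = [2.0 * xi + 1.0 for xi in x]
y[5] = None
y[12] = None
print(impute_gaussian_copula(x, y, seed=1))
```

Pairing a parametric copula with nonparametric margins mirrors the semi-parametric idea described above. One side effect of the empirical margin is that imputed values are always taken from the observed values of y; a smoothed or parametric margin would remove that restriction.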