When data sets contain missing values, applying standard techniques without properly accounting for the missingness may yield biased or invalid results; missing values are therefore often imputed before analysis. Compositional data, commonly defined as vectors of proportions that sum to one, appear in nearly every field of study. Because of their inherent constraints, specialized methods are required to impute missing values in compositional data sets.
We present a novel non-parametric method for imputing missing values in compositional data sets, based on a computationally efficient k-nearest neighbour procedure that uses the Jensen-Shannon divergence. The method makes no assumptions about the structure of the data and allows for essential zeros in the non-missing parts, which are traditionally problematic in compositional data. Through a simulation study using real-life data sets, we demonstrate that our approach is more accurate than competing methods across a variety of settings.
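To make the general idea concrete, the following is a minimal sketch of k-nearest neighbour imputation driven by the Jensen-Shannon divergence. It is not the presented method: the function names (`js_divergence`, `knn_impute_row`), the use of fully observed rows as the donor pool, the plain averaging of neighbours, and the rescaling of the observed parts are all assumptions made for illustration. It does show why zeros in the observed parts cause no difficulty: the divergence only takes logarithms of ratios whose denominator is positive.

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two compositions.
    Zero parts are allowed: terms with a == 0 contribute nothing."""
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # wherever a > 0, b = m is also > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def knn_impute_row(row, complete, k=3):
    """Illustrative imputation of one row (NaN marks missing parts).
    Assumption: donors are the fully observed rows in `complete`."""
    obs = ~np.isnan(row)
    # Compare rows on the observed parts only, renormalised to sum to one.
    target = row[obs] / row[obs].sum()
    sub = complete[:, obs]
    totals = sub.sum(axis=1)
    usable = totals > 0  # skip donors with no mass on the observed parts
    sub = sub[usable] / totals[usable, None]
    dists = np.array([js_divergence(target, s) for s in sub])
    nearest = np.argsort(dists)[:k]
    neighbour_mean = complete[usable][nearest].mean(axis=0)
    # Assumption: missing parts take the neighbours' average, and the
    # observed parts are rescaled so that the full vector sums to one.
    filled = np.empty_like(row)
    filled[~obs] = neighbour_mean[~obs]
    filled[obs] = target * (1.0 - neighbour_mean[~obs].sum())
    return filled

# Toy data (made up for illustration); the last donor has an essential zero.
complete = np.array([
    [0.5, 0.3, 0.2],
    [0.6, 0.2, 0.2],
    [0.1, 0.8, 0.1],
    [0.0, 0.5, 0.5],
])
row = np.array([0.5, np.nan, 0.2])
filled = knn_impute_row(row, complete, k=2)
print(filled, filled.sum())
```

In this sketch the ratios among the observed parts are preserved after imputation, which is one common design choice for compositional data; other aggregation or rescaling rules would be equally plausible.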
Zoom link: https://uoc-gr.zoom.us/j/88659969718?pwd=g6bjYPDCuUQo1bzVxjjbgQL4xFN1f3.1