On the Possibility of Handling Missing Data with CHAID: Results of a Statistical Experiment
The paper addresses an approach to handling missing data "as is", that is, treating missing values as one more category of the variable under study. This approach differs radically from the alternatives, which either delete the observations that contain missing values or replace the missing values with valid data. The only method known to us that implements the "as is" approach is CHAID. CHAID belongs to the decision-tree class of methods; in its own right, it is of interest and relevance to researchers dealing with categorical variables and nonlinear associations.
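The "as is" idea can be illustrated with a minimal sketch. It is not the CHAID algorithm itself, which also merges categories and adjusts significance levels; it shows only the recoding of missing values into an extra category and the Pearson chi-square statistic that such a category then enters like any other. The names `MISSING`, `recode_missing`, and `chi_square`, and the toy data, are assumptions introduced for illustration.

```python
from collections import Counter

MISSING = "missing"  # hypothetical label for the extra category


def recode_missing(values, label=MISSING):
    """Replace None with an explicit category label."""
    return [label if v is None else v for v in values]


def chi_square(predictor, target):
    """Return the Pearson chi-square statistic and degrees of freedom
    for two equally long lists of category labels."""
    n = len(predictor)
    counts = Counter(zip(predictor, target))  # observed cell counts
    row_totals = Counter(predictor)
    col_totals = Counter(target)
    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            observed = counts.get((r, c), 0)
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Toy data (illustrative only): None marks a missing predictor value.
predictor = ["a", "a", None, "b", "b", None, "a", "b"]
target = ["yes", "yes", "no", "no", "no", "no", "yes", "no"]

recoded = recode_missing(predictor)
stat, dof = chi_square(recoded, target)
print(recoded, stat, dof)
```

With the missing values recoded, the predictor has three categories instead of two, so the test has two degrees of freedom rather than one. This is the sense in which including missing values can alter the splits a tree considers.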
We did not find in the literature an answer to the question of what advantages and limitations the "as is" approach implemented in CHAID has compared with the alternative approaches mentioned above. Nevertheless, tree models built on data with missing values are often found in empirical studies. To open a discussion of this issue, we conducted several series of statistical experiments on generated data organized into three predictors of categorical and interval measurement type. We established empirically that, on the whole, the method distributes missing values across the tree's nodes correctly, but that in most cases including missing values in the analysis changes the tree's structure, which creates a risk of drawing erroneous conclusions. The paper also provides recommendations on which factors to consider when deciding whether to include missing values in an analysis "as is".