PREDICTING ITEM NONRESPONSE WITH LOGISTIC REGRESSION: THE CASE OF EUROPEAN SOCIAL SURVEY DATA
Missing data are a pressing problem in sociological research. One source of missing data is item nonresponse, which may stem from a respondent’s reluctance to answer a question, from difficulties that arise while answering, or from other causes. Explanations for nonresponse are sought in the way the survey is conducted, in the characteristics of the respondents, and in the characteristics of the questionnaire itself. This study shows how item nonresponse can be predicted with a logistic regression model using European Social Survey (ESS) data. Models predicting refusals, no answer, and the “don’t know” option were trained on the textual characteristics of the questions, using word frequencies and the TF-IDF word importance metric. The resulting models were compared in terms of the quality of their predictions, and the most important words in the questions were grouped according to whether they increase or decrease the likelihood of item nonresponse. In particular, words connected to sensitive topics were found to increase the proportion of item nonresponse, as were some words connected to instructions on how to answer a particular question.
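The general approach described above (TF-IDF features from question wording, a logistic regression on top, and a split of words by the sign of their coefficients) can be illustrated with a minimal sketch. It is not the study's actual pipeline. The question texts, the binary labels, the TF-IDF variant (idf = ln(N/df) + 1), and the hyperparameters are all hypothetical choices made for illustration, and none of the data come from the ESS.

```python
import math
from collections import Counter

# Hypothetical toy data (NOT from the ESS): question texts and a made-up
# binary label, 1 = elevated item nonresponse, 0 = not elevated.
questions = [
    "what is your household total net income",
    "how much do you trust the police",
    "please tell me your income from all sources",
    "how happy are you",
    "which party did you vote for in the last election",
    "how often do you meet friends",
]
labels = [1, 0, 1, 0, 1, 0]


def tfidf_matrix(docs):
    """Return (vocabulary, rows) of TF-IDF weights.

    Assumed variant: relative term frequency times (ln(N / df) + 1).
    """
    tokenized = [d.split() for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    df = Counter(w for toks in tokenized for w in set(toks))
    n = len(docs)
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        rows.append(
            [tf[w] / len(toks) * (math.log(n / df[w]) + 1.0) for w in vocab]
        )
    return vocab, rows


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, y, lr=0.5, epochs=2000, l2=0.01):
    """Fit weights and bias by batch gradient descent on L2-penalized log loss."""
    w = [0.0] * len(rows[0])
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, t in zip(rows, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * (g / n + l2 * wi) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


vocab, X = tfidf_matrix(questions)
weights, bias = fit_logistic(X, labels)

# Split words by coefficient sign: positive coefficients raise the predicted
# probability of nonresponse, negative coefficients lower it.
ranked = sorted(zip(vocab, weights), key=lambda pair: pair[1])
decreasing = [word for word, coef in ranked if coef < 0]
increasing = [word for word, coef in ranked if coef > 0]
```

In this toy setup, a word that occurs only in questions labeled 1 (such as "income") ends up in `increasing`, and a word that occurs only in questions labeled 0 (such as "how") ends up in `decreasing`. With real data, the sign and size of a coefficient would need to be read alongside model quality on held-out questions.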