Classification of Demographic Sequences Based on Pattern Structures
This paper presents the first results of applying sequence-based pattern structures and emerging patterns to the analysis of demographic sequences in Russia. The study uses data on 11 generations, spanning 1930 to 1984, from a panel of three waves of the Russian part of the Generations and Gender Survey, conducted in 2004, 2007, and 2011. The main goal is to develop methods for extracting emerging patterns (EPs) under the following restriction: the extracted patterns must be (closed) frequent gapless prefixes of the input sequences. Demographers required these constraints because they are necessary for properly interpreting and understanding the early life course events that lead to adulthood. To solve this problem, we used pattern structures of gapless prefixes and modified FP-trees. After extracting the EPs, we use the CAEP classifier to predict the gender of respondents from their demographic sequences of first life course events. The best results in terms of TPR and FPR were obtained for large values of the minimum growth-rate parameter (with some objects left unclassified).
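To make the pipeline concrete, the following is a minimal sketch, not the implementation used in the study. It enumerates gapless prefixes by brute force instead of using pattern structures or modified FP-trees, does not filter for closed patterns, and uses the basic CAEP (Classification by Aggregating Emerging Patterns) score without base-score normalization. All function names, the event labels, and the toy sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import inf


def prefix_supports(sequences):
    """Relative support of every gapless prefix occurring in `sequences`."""
    counts = Counter()
    for seq in sequences:
        for k in range(1, len(seq) + 1):
            counts[tuple(seq[:k])] += 1
    n = len(sequences)
    return {prefix: c / n for prefix, c in counts.items()}


def emerging_prefixes(target, background, min_growth_rate, min_support=0.0):
    """Prefixes whose support grows from `background` to `target`.

    Returns {prefix: (support_in_target, growth_rate)}.
    Closedness is NOT checked here (simplifying assumption).
    """
    sup_t = prefix_supports(target)
    sup_b = prefix_supports(background)
    patterns = {}
    for prefix, st in sup_t.items():
        if st < min_support:
            continue
        sb = sup_b.get(prefix, 0.0)
        growth = inf if sb == 0 else st / sb
        if growth >= min_growth_rate:
            patterns[prefix] = (st, growth)
    return patterns


def caep_score(seq, patterns):
    """Basic CAEP score: sum of support * gr / (gr + 1) over matching EPs."""
    score = 0.0
    for k in range(1, len(seq) + 1):
        hit = patterns.get(tuple(seq[:k]))
        if hit is not None:
            st, growth = hit
            weight = 1.0 if growth == inf else growth / (growth + 1.0)
            score += weight * st
    return score


def classify(seq, patterns_by_class):
    """Class with the highest score, or None if no class wins."""
    scores = {c: caep_score(seq, p) for c, p in patterns_by_class.items()}
    best = max(scores.values())
    winners = [c for c, s in scores.items() if s == best]
    return winners[0] if best > 0 and len(winners) == 1 else None


# Hypothetical toy data: sequences of first life course events.
men = [["work", "separation", "marriage"],
       ["work", "separation", "education"],
       ["education", "work"]]
women = [["education", "marriage", "children"],
         ["education", "marriage"],
         ["work", "separation", "marriage"]]

patterns_by_class = {
    "men": emerging_prefixes(men, women, min_growth_rate=1.5),
    "women": emerging_prefixes(women, men, min_growth_rate=1.5),
}

print(classify(["work", "separation"], patterns_by_class))
print(classify(["children"], patterns_by_class))
```

A sequence that matches no emerging prefix of either class gets a zero score everywhere and is returned as `None`. Raising `min_growth_rate` keeps fewer patterns, so more sequences end up in this unclassified group, which is the effect noted above for large growth-rate thresholds.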
The paper was prepared within the framework of the Academic Fund Program at HSE in 2016 (grant № 16-05-0011 "Development and testing of demographic sequence analysis and mining techniques") and supported within the framework of a subsidy granted to the HSE by the Government of the Russian Federation for the implementation of the Global Competitiveness Program.