Classification of Demographic Sequences Based on Pattern Structures and Emerging Patterns
This paper presents recent results on applying sequence-based pattern structures and emerging patterns to the analysis of demographic sequences in Russia. The study uses data on 11 generations, from 1930 to 1984, drawn from a three-wave panel of the Russian part of the Generation and Gender Survey; the waves took place in 2004, 2007, and 2011. The main goal is to develop methods for extracting emerging patterns (EPs) under the following restriction: the extracted patterns must be (closed) frequent contiguous prefixes of the input sequences. Demographers required these constraints for proper interpretation and understanding of the early life-course events that lead to adulthood. To fulfil this requirement, we used modified FP-trees based on pattern structures of contiguous prefixes. After extracting the EPs, we use the CAEP (Classification by Aggregating Emerging Patterns) classifier to predict the gender of respondents from their demographic sequences of first life-course events. The best results in terms of TPR-FPR were obtained for large values of the minimum growth-rate parameter, although some objects are then left without classification.
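As an illustration of the pipeline summarised above, the following minimal Python sketch counts contiguous prefixes per class, keeps those whose growth rate reaches a threshold, and aggregates them in a CAEP-style score. It is a simplified sketch under stated assumptions, not the implementation used in the study: the event names, the toy sequences, and the function names (`prefix_supports`, `emerging_prefixes`, `caep_score`, `classify`) are hypothetical, prefixes are counted with a plain dictionary rather than a modified FP-tree, closedness is not checked, and the base-score normalisation of the full CAEP classifier is omitted.

```python
from collections import Counter
from math import inf


def prefix_supports(sequences):
    """Relative support of every contiguous prefix occurring in `sequences`."""
    counts = Counter()
    for seq in sequences:
        for k in range(1, len(seq) + 1):
            counts[tuple(seq[:k])] += 1
    n = len(sequences)
    return {prefix: c / n for prefix, c in counts.items()}


def emerging_prefixes(target, other, min_growth, min_support=0.0):
    """Prefixes whose support grows by at least `min_growth` from `other` to `target`.

    Returns {prefix: (support_in_target, growth_rate)}.
    """
    supp_t = prefix_supports(target)
    supp_o = prefix_supports(other)
    eps = {}
    for prefix, s_t in supp_t.items():
        if s_t < min_support:
            continue
        s_o = supp_o.get(prefix, 0.0)
        growth = inf if s_o == 0 else s_t / s_o
        if growth >= min_growth:
            eps[prefix] = (s_t, growth)
    return eps


def caep_score(seq, eps):
    """Unnormalised CAEP-style score: sum of growth/(growth+1) * support over matching EPs."""
    seq = tuple(seq)
    score = 0.0
    for prefix, (supp, growth) in eps.items():
        if seq[:len(prefix)] == prefix:
            weight = 1.0 if growth == inf else growth / (growth + 1)
            score += weight * supp
    return score


def classify(seq, eps_by_class):
    """Return the class with the highest score, or None if no EP matches or scores tie."""
    scores = {label: caep_score(seq, eps) for label, eps in eps_by_class.items()}
    best = max(scores.values())
    winners = [label for label, s in scores.items() if s == best]
    if best == 0 or len(winners) > 1:
        return None
    return winners[0]


# Hypothetical toy data: each tuple is a sequence of first life-course events.
men = [
    ("work", "education", "marriage"),
    ("work", "education"),
    ("work", "marriage"),
    ("education", "work"),
]
women = [
    ("education", "marriage", "child"),
    ("education", "marriage"),
    ("education", "work"),
    ("work", "education"),
]

eps_by_class = {
    "men": emerging_prefixes(men, women, min_growth=2),
    "women": emerging_prefixes(women, men, min_growth=2),
}

print(classify(("work", "education", "child"), eps_by_class))
print(classify(("education", "marriage"), eps_by_class))
print(classify(("child", "work"), eps_by_class))
```

In this toy setting the third sequence matches no emerging prefix of either class, so `classify` returns `None`. Raising `min_growth` keeps fewer, more discriminative prefixes, which is one way such a classifier can leave more objects without classification.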