Statistical analysis of communication services market in Russia
Procedure for the simulation of the advances in EGE from mathematics is considered. For some tasks the important predictors are obtained. The models of binary logistics regression and ordinal regression for the prediction of probabilities of solution of task are built.
The article studies educational trajectories of schoolchildren in Yaroslavl Oblast. Conclusions point out that schoolchild’s educational achievements, educational plans of his/her friends and the level of education of his/her father are key predictors of a decision about continuing education. Thanks to this information it is possible to know which schoolchildren are at risk of not continuing their studies. In the course of the research comparative advantages of the logistic regression and the discriminant analysis in the case of binary dependent variables were examined. With the necessary prerequisites for the use of methods fulfilled, both strategies work well classifying schoolchildren.
Classical change-point detection procedures assume a change-point model to be known and a change consisting in establishing a new observations regime, i.e. the change lasts infinitely long. These modeling assumptions contradicts applied problems statements. Therefore, even theoretically optimal statistics in practice very often fail when detecting transient changes online. In this work in order to overcome limitations of classical change-point detection procedures we consider approaches to constructing ensembles of change-point detectors, i.e. algorithms that use many detectors to reliably identify a change-point. We propose a learning paradigm and specific implementations of ensembles for change detection of short-term (transient) changes in observed time series. We demonstrate by means of numerical experiments that the performance of an ensemble is superior to that of the conventional change-point detection procedures.
In this study a CHAID-based approach to detecting classification accuracy heterogeneity across segments of observations is proposed. This helps to solve some important problems, facing a model-builder: (1) How to automatically detect segments in which the model significantly underperforms? and (2) How to incorporate the knowledge about classification accuracy heterogeneity across segments to partition observations in order to achieve better predictive accuracy? The approach was applied to churn data from the UCI Repository of Machine Learning Databases. By splitting the data set into four parts, which are based on the decision tree, and building a separate logistic regression scoring model for each segment we increased the accuracy by more than 7 percentage points on the test sample. Significant increase in recall and precision was also observed. It was shown that different segments may have absolutely different churn predictors. Therefore such a partitioning gives a better insight into factors influencing customer behavior.