Educational Migration from Russia to China: Social Network Data
This paper presents the results of our study of educational migration flows between the Russian Federation and China. Using data from VK, the most popular social networking site among Russian speakers, we explore the "digital footprints" of migration and analyze the factors influencing the size of migration flows from different Russian cities to China. We take into account several groups of parameters, in particular the geographic proximity of a city to China and to Russian educational centers, the institutional presence of China, and the Chinese web presence in the particular city. The resulting conditional inference tree, with the relative number of educational migrants from each city as the outcome, has R² = .86.
This is a collection of scientific papers on migration studies.
The chief aim of this paper is to analyse the dynamics of linear and non-linear methods for predicting the bankruptcy of Russian private small and medium-sized retail and wholesale trade companies. We use financial and non-financial data from before and after the economic crisis of 2008–2009. We use two methods: logistic regression and random forest.
This research will be of vital importance especially to banks and other credit organisations providing loans to small and medium businesses.
Our dataset comprises between 200,000 and 600,000 companies, depending on the year. We use data from the Ruslana database, covering the period from 2004 to 2012.
The definition of default is extended to financial difficulties by adding voluntarily liquidated firms to those liquidated as a result of legal bankruptcy. We study active companies and two types of liquidated ones.
The heterogeneity of Russian companies is taken into account in several ways. In addition to financial ratios derived from financial statements, we include non-financial variables such as regional distribution, age, size and legal form in the statistical models.
Prediction performance is evaluated using out-of-sample forecasts. We obtain models with fairly high predictive power: the area under the ROC curve reaches 0.75. Random forest outperformed the logit model. Adding non-financial information such as age and federal region improves the forecasts, while legal form and size do not have a great impact on the outcome. Among financial measures, liquidity, profitability and leverage ratios turned out to be essential. Moreover, our models captured a structural change that was likely caused by the crisis of 2008–2009.
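As an illustration of the evaluation metric mentioned above, the following is a minimal sketch of how the area under the ROC curve can be computed on a hold-out set. It uses the rank-based (Mann-Whitney) formulation: the AUC is the share of (defaulted, active) company pairs in which the defaulted company receives the higher predicted score. The function name `roc_auc` and the toy labels and scores are assumptions made for illustration; they are not taken from the study or from the Ruslana data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for a defaulted company, 0 for an active one (illustrative coding).
    scores: predicted default scores from any classifier.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # defaulted firm ranked above active firm
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Made-up out-of-sample labels and scores, for illustration only.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.4, 0.5, 0.1, 0.7, 0.3, 0.6]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the sample size, so for datasets with hundreds of thousands of companies a sort-based implementation would be preferable.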
The proceedings contain 65 papers. The topics discussed include: understanding political turbulence: the data science of politics; why we post - the comparative anthropology of social media; applying machine learning to ads integrity at Facebook; large-scale analytics of dynamics of choice among discrete alternatives; privacy and internet governance; computational social sciences: a bricolage of approaches; community detection: from plain to attributed complex networks; utilizing online qualitative methods for web science; likeology: modeling, predicting, and aggregating likes in social media; understanding video-ad consumption on YouTube: a measurement study on user behavior, popularity, and content properties; talking climate change via social media: communication, engagement and behavior; and spreading the news: how can journalists gain more engagement for their tweets?.
In this paper we explore the main patterns of communication and cooperation in online groups created by residents of apartment buildings in St. Petersburg on the social networking site “VK”. Using word-frequency analysis and Latent Dirichlet Allocation (LDA), we identified the main discussion topics in these groups. We also found that communication among neighbors in these groups is predominantly connected with material needs and directed at solving common problems, such as those related to building improvement, the management company and in-fill construction near their house. Based on online observations of city activists, we suggest that the dynamic nature of SNS allows an online community dedicated to a particular problem to avoid its breakdown after the original issue is resolved.
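The word-frequency step described above can be sketched roughly as follows. This covers only the counting stage, not LDA. The posts, the stopword list and the function name `word_frequencies` are all made up for illustration. Real VK posts would be in Russian and would need a tokenizer and stopword list suited to that language, rather than the ASCII-only pattern assumed here.

```python
import re
from collections import Counter

# Hypothetical English stopword list, kept tiny for the example.
STOPWORDS = {"the", "a", "of", "in", "to", "and", "is", "for", "our", "near", "about"}


def word_frequencies(posts, stopwords=STOPWORDS):
    """Count how often each non-stopword token occurs across all posts."""
    counts = Counter()
    for post in posts:
        tokens = re.findall(r"[a-z]+", post.lower())  # ASCII-only tokenizer (assumption)
        counts.update(t for t in tokens if t not in stopwords)
    return counts


# Invented posts standing in for messages in a residents' group.
posts = [
    "The management company ignored the broken elevator",
    "Meeting about the new construction near our building",
    "Management company raised fees for building repairs",
]
freq = word_frequencies(posts)
for word, count in freq.most_common(3):
    print(word, count)
```

The most frequent terms give a first impression of what a group discusses. A topic model such as LDA would then group co-occurring terms into topics.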
In this paper, we summarize the results of recent studies on the application of pattern mining and machine learning to the analysis of demographic sequences. The main goal is to demonstrate demographers’ needs, including next-event prediction and the extraction of interesting patterns from substantial datasets of demographic data, which cannot be handled by conventional demographic techniques. We use decision trees as a technique for demographic event prediction, and emerging sequential patterns and pattern structures for discovering relevant interpretable sequences. The emerging problem statements and the positive prospects for using pattern mining in the demography domain are worth disseminating in the data mining community.
To the best of the authors' knowledge, this is the first attempt to use random forest as a technique for residential estate mass appraisal. In the empirical study, using data on residential apartments, the method performed better than techniques such as CHAID, CART, KNN, multiple regression analysis, artificial neural networks (MLP and RBF) and boosted trees. We introduce an approach for automatically detecting segments where a model significantly underperforms and segments with systematically under- or overestimated predictions. This segmentation approach is applicable to various expert systems, including but not limited to those used for mass appraisal.
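The abstract does not describe how the segment detection works, so the following is only one possible sketch of the general idea, not the authors' method. It groups relative prediction errors by segment and flags a segment whose mean signed error exceeds a threshold. The segment names, the prices, the 5% threshold and the function name `segment_bias` are all assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def segment_bias(records, bias_threshold=0.05):
    """Summarise relative prediction error per segment.

    records: iterable of (segment, actual_price, predicted_price).
    Returns {segment: (mean signed error, mean absolute error, verdict)}.
    """
    errors = defaultdict(list)
    for segment, actual, predicted in records:
        errors[segment].append((predicted - actual) / actual)

    report = {}
    for segment, errs in errors.items():
        bias = mean(errs)                       # signed: shows direction of error
        mape = mean([abs(e) for e in errs])     # unsigned: shows overall accuracy
        if bias > bias_threshold:
            verdict = "overestimated"
        elif bias < -bias_threshold:
            verdict = "underestimated"
        else:
            verdict = "ok"
        report[segment] = (bias, mape, verdict)
    return report


# Invented appraisal results: (segment, actual price, predicted price).
records = [
    ("center", 100, 112), ("center", 200, 220),
    ("suburb", 100, 99), ("suburb", 150, 153),
    ("outskirts", 80, 72), ("outskirts", 50, 44),
]
report = segment_bias(records)
for segment, (bias, mape, verdict) in report.items():
    print(f"{segment}: bias={bias:+.3f}, MAPE={mape:.3f}, {verdict}")
```

A fixed threshold is the simplest possible rule. With real data, a statistical test on the per-segment errors would help separate systematic bias from noise in small segments.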