Educational Migration from Russia to China: Social Network Data
This paper presents the results of our study of educational migration flows between the Russian Federation and China. Using data from VK, the most popular social networking site among Russian speakers, we explore the "digital footprints" of migration and analyze the factors influencing the size of migration flows from different Russian cities to China. We take into account several groups of parameters, in particular the geographic proximity of a city to China and to Russian educational centers, the institutional presence of China, and the Chinese web presence in the city. The resulting conditional inference tree, with the relative number of educational migrants from each city as the outcome, has R² = .86.
This is a collection of scientific papers on migration studies.
The chief aim of this paper is to analyse the dynamics of linear and non-linear methods for predicting bankruptcy among Russian private small and medium-sized retail and wholesale trade companies. We use financial and non-financial data from before and after the economic crisis of 2008–2009. The methods applied are logistic regression and random forest.
This research is of particular importance to banks and other credit organisations that provide loans to small and medium-sized businesses.
Our dataset comprises between 200,000 and 600,000 companies, depending on the year. We use data from the Ruslana database, which covers the period from 2004 to 2012.
The definition of default is extended to financial difficulties by adding voluntarily liquidated firms to those liquidated as a result of legal bankruptcy. We study active companies and two types of liquidated ones.
Heterogeneity of Russian companies is taken into account in several ways. In addition to financial ratios derived from financial statements we include non-financial variables such as regional distribution, age, size and legal form into statistical models.
Prediction performance is evaluated with out-of-sample forecasts. We obtain models with fairly high predictive power: the area under the ROC curve reaches 0.75. Random forest outperformed the logit model. Adding non-financial information such as age and federal region improves the forecasts, while legal form and size do not have a great impact on the outcome. Among financial measures, liquidity, profitability and leverage ratios turned out to be essential. Moreover, our models captured a structural change that was likely caused by the crisis of 2008–2009.
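The area under the ROC curve used as the performance measure above can be computed from held-out labels and model scores with the rank-sum (Mann-Whitney) identity. The following is a minimal sketch, not the authors' code: the function name `roc_auc` and the toy hold-out data are assumptions made for illustration only.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 0/1 outcomes (1 = liquidated, 0 = active in this illustration).
    scores: predicted default scores; higher means riskier.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank shared by the ties
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")

    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_statistic / (n_pos * n_neg)


# Hypothetical out-of-sample data: scores from a model fitted on earlier years.
holdout_labels = [0, 0, 1, 0, 1, 1]
holdout_scores = [0.10, 0.30, 0.35, 0.60, 0.70, 0.90]
print(roc_auc(holdout_labels, holdout_scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of liquidated and active firms, so scores from two models evaluated on the same hold-out sample can be compared directly.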
Nowadays, many people suffer from chronic diseases because of their lifestyle, food habits, and reduced physical activity. Diabetes is one of the most common chronic diseases, affecting people of all ages. As a result, the healthcare sector generates extensive data of huge volume, enormous velocity, and vast variety from heterogeneous sources. In such a scenario, scientific solutions can harness these massive, heterogeneous, and complex datasets to obtain more meaningful information. Moreover, machine learning algorithms can play a major part in creating a statistical prediction model. The aim of this paper is to identify the prevalence of diabetes-related long-term complications among patients with type-2 diabetes mellitus. The processing and statistical analysis rely on scikit-learn and pandas for Python and on RStudio for R. In this work, we study machine learning approaches such as decision trees and random forests for developing a classification-based prediction model to assess type-2 diabetes mellitus. Additionally, we propose an algorithm based solely on random forest and attempt to detect the areas of complication in type-2 diabetes patients.
The proceedings contain 65 papers. The topics discussed include: understanding political turbulence: the data science of politics; why we post - the comparative anthropology of social media; applying machine learning to ads integrity at Facebook; large-scale analytics of dynamics of choice among discrete alternatives; privacy and internet governance; computational social sciences: a bricolage of approaches; community detection: from plain to attributed complex networks; utilizing online qualitative methods for web science; likeology: modeling, predicting, and aggregating likes in social media; understanding video-ad consumption on YouTube: a measurement study on user behavior, popularity, and content properties; talking climate change via social media: communication, engagement and behavior; and spreading the news: how can journalists gain more engagement for their tweets?.
In this paper, we summarize the results of recent studies on the application of pattern mining and machine learning to the analysis of demographic sequences. The main goal is the demonstration of demographers’ needs, including next-event prediction and the extraction of interesting patterns from substantial datasets of demographic data, which cannot be handled by conventional demographic techniques. We use decision trees as a technique for demographic event prediction, and emerging sequential patterns and pattern structures for discovering relevant interpretable sequences. The emerging problem statements and positive prospects of the usage of pattern mining in the demography domain are worth dissemination in the data mining community.
To the best of the authors' knowledge, this is the first attempt to use random forest as a technique for residential estate mass appraisal. In an empirical study using data on residential apartments, the method performed better than CHAID, CART, KNN, multiple regression analysis, artificial neural networks (MLP and RBF), and boosted trees. We introduce an approach for automatically detecting segments where a model significantly underperforms and segments with systematically under- or overestimated predictions. This segmentation approach is applicable to various expert systems, including, but not limited to, those used for mass appraisal.
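One simple way to picture the detection of weak or biased segments is to group relative prediction errors by segment and compare each segment with the overall error. The sketch below is an illustrative assumption, not the authors' procedure: the function `segment_diagnostics`, the thresholds `mae_factor` and `bias_share`, and the toy records are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def segment_diagnostics(records, mae_factor=1.5, bias_share=0.5):
    """Flag segments with large or one-sided relative errors.

    records: iterable of (segment, actual_price, predicted_price);
             actual prices are assumed to be non-zero.
    mae_factor: hypothetical threshold; a segment "underperforms" when its
                mean absolute relative error exceeds mae_factor times the
                overall mean absolute relative error.
    bias_share: hypothetical threshold; a segment is systematically over- or
                underestimated when its mean signed error exceeds this share
                of its mean absolute error.
    """
    errors = defaultdict(list)
    for segment, actual, predicted in records:
        errors[segment].append((predicted - actual) / actual)

    overall_mae = mean(abs(e) for errs in errors.values() for e in errs)

    report = {}
    for segment, errs in errors.items():
        mae = mean(abs(e) for e in errs)
        bias = mean(errs)  # signed: positive means overestimation
        if bias > bias_share * mae:
            direction = "over"
        elif bias < -bias_share * mae:
            direction = "under"
        else:
            direction = "none"
        report[segment] = {
            "mae": mae,
            "bias": bias,
            "underperforms": mae > mae_factor * overall_mae,
            "direction": direction,
        }
    return report


# Hypothetical appraisal results: (segment, actual price, predicted price).
toy_records = [
    ("A", 100, 101), ("A", 100, 99), ("A", 200, 202),
    ("B", 100, 130), ("B", 200, 260),
    ("C", 100, 95), ("C", 100, 105),
]
toy_report = segment_diagnostics(toy_records)
for name, stats in sorted(toy_report.items()):
    print(name, stats["underperforms"], stats["direction"])
```

In this toy data, segment B is both much less accurate than the rest and consistently overestimated, which is the kind of pattern such a diagnostic is meant to surface. In practice the segments would come from attributes such as district or building type, and the thresholds would need tuning or a statistical test.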