HSE Students Win Awards at the Kaggle International Data Science Competition
Ekaterina Melianova and Artyom Volgin, second-year students of the Master’s programme ‘Applied Statistics with Network Analysis’, took second place in an international data analysis competition. Using a Kaggle survey of 19,717 respondents from 171 countries, they analyzed the community of PhD holders in data science.
Kaggle is a data science platform owned by Google. Its community brings together about 3,000,000 machine learning and data science professionals from around the world. The platform publishes learning materials and organizes surveys and online competitions. It has hosted over a hundred open machine learning competitions, with prizes totaling tens of thousands of dollars.
Participants in the annual Kaggle ML & DS Survey competition were asked to analyze data from an online survey of Kaggle users. After selecting a group of respondents from the survey, they had to use the data to craft an interesting story about that group. The jury assessed storytelling and originality, as well as the clarity of the code and the reproducibility of the results.
Ekaterina Melianova
We chose respondents with a PhD as the subject of our research. The topic interests us because we study the effectiveness of human capital and, in particular, education. Most of the survey data concerned specific data analysis skills the respondents possess, such as Python programming or knowledge of certain machine learning methods.
We used these answers to calculate the similarity between respondents and constructed a graph that we then used to draw some interesting conclusions about the traits of the academic data science community. With the help of this method, we managed to define certain clusters within the PhD community, examine differences in skills between groups from different countries, and determine which skills are fundamental and which are more specialized.
We also used network analysis to visualize the results in an interesting way. In addition, we showed how much a PhD helps or hurts salary in different countries, and how existing gender discrimination in data science professions affects women with a PhD.
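The approach Ekaterina describes, computing similarity between respondents from their skill answers and then building a graph to find clusters, can be illustrated with a small sketch. The article does not say which similarity measure or clustering method the students used, so everything here is an assumption for illustration: the toy respondents and skills are hypothetical, Jaccard similarity stands in for the similarity measure, a fixed threshold decides which respondents are linked, and connected components stand in for clusters.

```python
from itertools import combinations

# Hypothetical toy data: respondent id -> set of reported skills.
# These are NOT values from the Kaggle survey.
respondents = {
    "r1": {"python", "sql", "neural_nets"},
    "r2": {"python", "sql", "neural_nets", "bayesian"},
    "r3": {"r", "regression"},
    "r4": {"r", "regression", "bayesian"},
    "r5": {"matlab"},
}


def jaccard(a, b):
    """Share of skills two respondents have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_graph(data, threshold):
    """Link two respondents when their similarity reaches the threshold."""
    graph = {name: set() for name in data}
    for u, v in combinations(data, 2):
        if jaccard(data[u], data[v]) >= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def connected_components(graph):
    """Group respondents that are reachable from one another."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


# The threshold of 0.5 is an arbitrary choice for this sketch.
graph = build_graph(respondents, threshold=0.5)
clusters = connected_components(graph)
```

On this toy data, respondents who share most of their skills end up in the same cluster, while a respondent with no close match forms a cluster of one. A real analysis would likely use a dedicated network library and a community detection method rather than plain connected components.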
Ekaterina and Artyom originally chose the Master’s programme ‘Applied Statistics with Network Analysis’ because of their interest in applied data analysis. The programme will help them master a wide range of statistical methods, including network analysis, which is now extremely popular in many research fields.