Predicting PISA Scores from Students’ Digital Traces
The Programme for International Student Assessment (PISA) is an influential worldwide study that tests the skills and knowledge of 15-year-old students in mathematics, reading, and science. In this paper, we show that PISA scores of individual students can be predicted from their digital traces. We use data from a nationwide Russian panel study that tracks 4,400 PISA participants and includes information about their activity on a popular social networking site. We build a simple model that predicts PISA scores based on students’ subscriptions to various public pages on the social network. The resulting model can successfully discriminate between low- and high-performing students (AUC = 0.9). We find that top-performing students are interested in pages related to science and art, while pages preferred by low-performing students typically concern humor and horoscopes. The difference in academic performance between subscribers to such public pages could be equivalent to several years of formal schooling, indicating the presence of a strong digital divide. The ability to predict academic outcomes of students from their digital traces might unlock the potential of social media data for large-scale education research.
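To make the idea of predicting scores from subscriptions and evaluating the result with AUC concrete, here is a minimal sketch on made-up data. It is not the model used in this paper: the page names, the scores, and the helper functions `page_means`, `predict`, and `auc` are all hypothetical, and the averaging rule is only one simple assumption about how such a predictor might work.

```python
# Toy illustration only: synthetic data, hypothetical page names,
# and an assumed averaging rule. This is not the paper's model.

def page_means(subscriptions, pisa):
    """Mean score of the subscribers of each page (training step)."""
    by_page = {}
    for pages, score in zip(subscriptions, pisa):
        for page in pages:
            by_page.setdefault(page, []).append(score)
    return {page: sum(v) / len(v) for page, v in by_page.items()}


def predict(pages, means, default):
    """Predicted score: average of the page means over a student's pages.

    Falls back to `default` when none of the pages were seen in training.
    """
    known = [means[p] for p in pages if p in means]
    return sum(known) / len(known) if known else default


def auc(scores, labels):
    """Probability that a random high performer (label 1) is scored above
    a random low performer (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic training set: each student's subscriptions and test score.
train_subs = [{"science", "art"}, {"science"}, {"art", "humor"},
              {"humor", "horoscopes"}, {"horoscopes"}, {"humor"}]
train_pisa = [620, 600, 540, 420, 400, 450]

means = page_means(train_subs, train_pisa)
default = sum(train_pisa) / len(train_pisa)

# Synthetic held-out students; label 1 marks a high performer.
test_subs = [{"science"}, {"art", "humor"}, {"horoscopes"},
             {"humor", "horoscopes"}]
test_labels = [1, 1, 0, 0]

test_scores = [predict(pages, means, default) for pages in test_subs]
test_auc = auc(test_scores, test_labels)
print(f"AUC on the toy held-out set: {test_auc:.2f}")
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation, so a reported value of 0.9 means that a randomly chosen high performer receives a higher predicted score than a randomly chosen low performer about 90% of the time.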