Network or Text? Factors in the Diffusion of Protest on Social Media: Theory and Data Analysis
Social media can act as environments in which protest sentiment accumulates and concentrates before it brings people onto the streets. The social ties that connect people online resemble their offline ties, and the structure of these ties can affect the diffusion of both protest-related information and protest itself. In addition, social media can serve as core platforms for articulating collective goals and identities. This article builds on previous scholarship that has developed these ideas and extends it with an empirical analysis of the Venezuelan Twittersphere during the country's political unrest.
Short messages, or tweets, are the basic building blocks of online protest behavior on Twitter. Some tweets are retweeted virally and can reach very broad audiences. Such viral tweets are arguably key to articulating protest sentiment.
But what kind of tweet tends to go viral? Is it one posted by a user with an advantageous position in the social media network, or one that stands out as particularly catchy or emotional? We formalize and test these competing hypotheses using two groups of empirically observable features, characterizing either the author of a tweet or its content. The first group includes the total number of followers of the tweet's author, the average number of followers of the users who retweeted it, whether the author or the retweeting users are verified Twitter users, and so on. The second group describes the content of the tweet and consists of binary indicators of whether it contains links to external platforms, emojis, question marks, or exclamation marks. The dependent variable is the total number of retweets.
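To make the two feature groups concrete, here is a minimal Python sketch of how per-tweet features of this kind might be computed. The function names, feature keys, and regex-based link and emoji detection are illustrative assumptions, not the authors' pipeline; in particular, the emoji pattern covers only a few common Unicode blocks and is an approximation.

```python
import re

# Assumption: a link is any http(s) URL; Twitter's t.co short links match this pattern.
URL_RE = re.compile(r"https?://\S+")
# Assumption: rough emoji test covering flag letters, the main pictograph blocks,
# and miscellaneous symbols/dingbats; an approximation, not a full Unicode emoji check.
EMOJI_RE = re.compile("[\U0001F1E6-\U0001F1FF\U0001F300-\U0001FAFF\u2600-\u27BF]")


def content_features(text: str) -> dict:
    """Binary content indicators for one tweet (hypothetical feature names)."""
    without_urls = URL_RE.sub(" ", text)  # so a "?" inside a URL query string is not counted
    return {
        "has_link": int(URL_RE.search(text) is not None),
        "has_emoji": int(EMOJI_RE.search(without_urls) is not None),
        "has_question": int("?" in without_urls),
        "has_exclamation": int("!" in without_urls),
    }


def network_features(author_followers: int, author_verified: bool,
                     retweeter_followers: list, retweeter_verified: list) -> dict:
    """Author/network indicators for one tweet (hypothetical feature names)."""
    n_rt = len(retweeter_followers)
    return {
        "author_followers": author_followers,
        "author_verified": int(author_verified),
        # Average follower count of the users who retweeted; 0.0 if nobody did
        "mean_retweeter_followers": sum(retweeter_followers) / n_rt if n_rt else 0.0,
        "any_retweeter_verified": int(any(retweeter_verified)),
    }


print(content_features("Todos a la calle! https://t.co/abc?x=1"))
# → {'has_link': 1, 'has_emoji': 0, 'has_question': 0, 'has_exclamation': 1}
print(network_features(1200, True, [100, 300], [False, True]))
# → {'author_followers': 1200, 'author_verified': 1, 'mean_retweeter_followers': 200.0, 'any_retweeter_verified': 1}
```

URLs are stripped before searching for punctuation so that a query string is not mistaken for a question; a production pipeline would more likely read links from the URL entities that the Twitter API attaches to each tweet than rely on regular expressions.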
We analyze over 5.7 million unique tweets using modern data science methods (e.g., LASSO regression and cross-validation) and find that the first group of features is much more informative for modeling the dependent variable. This finding is robust: it holds for both OLS and LASSO models. In addition, given the growing role that social media bots (i.e., automated accounts that can, among other things, post retweets) have recently come to play in political communication, we performed robustness checks that exclude bots from the analysis. Even with bots removed, we find that network characteristics matter more than the content-related features under study.
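The comparison of feature groups can be illustrated with a small, self-contained sketch: fit a LASSO model (by cyclic coordinate descent) on each group separately and compare out-of-sample mean squared error under k-fold cross-validation; setting the penalty to zero gives an OLS fit. The data below are simulated, the outcome merely stands in for something like log(1 + retweets), and the penalty values, fold count, and feature names are assumptions made for illustration. None of it reproduces the authors' data, tuning, or results.

```python
import numpy as np


def soft_threshold(z: float, t: float) -> float:
    """Shrink z toward zero by t (the proximal operator of the L1 penalty)."""
    return float(np.sign(z) * max(abs(z) - t, 0.0))


def lasso_fit(X, y, alpha, n_iter=200):
    """Minimize (1/2n)||y - b0 - Zb||^2 + alpha*||b||_1 by cyclic coordinate descent.

    Features are standardized inside the function; alpha=0 reduces to OLS.
    Returns the intercept, coefficients, and the scaling needed for prediction.
    """
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0                    # constant columns stay at zero instead of dividing by 0
    Z = (X - mu) / sd
    b0 = y.mean()                        # intercept equals mean(y) because Z is centered
    resid = y - b0                       # residual for beta = 0
    n, p = Z.shape
    beta = np.zeros(p)
    col_sq = (Z ** 2).sum(axis=0) / n    # 1 for non-constant standardized columns
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            resid += Z[:, j] * beta[j]   # partial residual without feature j
            rho = Z[:, j] @ resid / n
            beta[j] = soft_threshold(rho, alpha) / col_sq[j]
            resid -= Z[:, j] * beta[j]   # put feature j back with its new coefficient
    return b0, beta, mu, sd


def cv_mse(X, y, alpha, k=5, seed=0):
    """Average held-out mean squared error over k random folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        b0, beta, mu, sd = lasso_fit(X[train], y[train], alpha)
        pred = b0 + ((X[test_idx] - mu) / sd) @ beta
        errors.append(np.mean((y[test_idx] - pred) ** 2))
    return float(np.mean(errors))


# --- Simulated data (hypothetical; not the Venezuelan Twitter corpus) ---
rng = np.random.default_rng(42)
n = 600
network = np.column_stack([
    rng.normal(8.0, 2.0, n),   # hypothetical: log followers of the author
    rng.normal(6.0, 1.5, n),   # hypothetical: mean log followers of retweeters
    rng.integers(0, 2, n),     # hypothetical: author is verified (0/1)
])
content = rng.integers(0, 2, size=(n, 4)).astype(float)  # link, emoji, "?", "!" flags
# Stand-in for log(1 + retweets); built from network columns only, purely for illustration
y = 0.5 * network[:, 0] + 0.8 * network[:, 1] + rng.normal(0.0, 1.0, n)

results = {}
for alpha in (0.0, 0.05):      # 0.0 is OLS; 0.05 is an arbitrary LASSO penalty
    for name, X in (("network", network), ("content", content)):
        results[(name, alpha)] = cv_mse(X, y, alpha)
        print(f"alpha={alpha:<4} {name:<8} CV MSE = {results[(name, alpha)]:.3f}")
```

Because the simulated outcome is constructed from the network columns, the network model wins here by design; the sketch only shows the mechanics of comparing feature groups by cross-validated error. In practice the penalty would itself be selected by cross-validation over a grid of values rather than fixed.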