Automatic Extraction of Text and Numeric Web Data for Social Science Purposes

S. V. Zhuchkova; A. N. Rotmistrov

Sociology: Methodology, Methods, Mathematical Modeling. 2020. No. 50-51. P. 141–183.

The paper is devoted to procedures for automatically extracting data from web pages, i.e., web scraping. We consider different types of web data, such as digital traces and other numeric and text web data, as well as their advantages (the speed of data collection and, as a consequence, continuous coverage, efficiency, etc.) and limitations (limited representativeness, difficulties in organizing the storage of large amounts of data, deviation from the traditional procedure for designing a study, etc.) in comparison with traditional methods of data collection. Various web data extraction tools (APIs, requests, and selenium) are described to illustrate the principles of handling static and dynamic web pages. The paper also gives an overview of the basic minimum of competencies needed for web scraping: in particular, programming in Python and navigating the source code of web pages. A detailed illustration is given based on a fragment of the data collection process from a recent relevant Russian study.
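To make the static-page case concrete, here is a minimal sketch in Python of the fetch-then-parse workflow the abstract describes. It is not the authors' code: it uses the standard-library `urllib.request` and `html.parser` as stand-ins for requests and a dedicated HTML parser, and the names `fetch_html`, `ClassTextExtractor`, `extract_by_class`, the CSS class `"item"`, and the User-Agent string are all hypothetical. It assumes reasonably well-formed markup; dynamic pages that build their content with JavaScript would instead need a browser-driving tool such as selenium.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Void elements never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a static page (a stdlib stand-in for requests.get(url).text)."""
    # Hypothetical User-Agent; identify your scraper honestly in real projects.
    req = Request(url, headers={"User-Agent": "research-scraper/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class ClassTextExtractor(HTMLParser):
    """Collects the text of every element whose class list contains target_class."""

    def __init__(self, target_class: str) -> None:
        super().__init__()
        self.target_class = target_class
        self.depth = 0                  # nesting depth inside a matching element
        self.buffer: list[str] = []     # text chunks of the current match
        self.results: list[str] = []    # one whitespace-normalized string per match

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1             # a child of an element we are already inside
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            # Join chunks with spaces, then collapse runs of whitespace.
            self.results.append(" ".join(" ".join(self.buffer).split()))

    def handle_data(self, data):
        if self.depth > 0:
            self.buffer.append(data)


def extract_by_class(html: str, target_class: str) -> list[str]:
    """Return the text of all elements carrying target_class, in document order."""
    parser = ClassTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    return parser.results
```

In use, `extract_by_class(fetch_html(url), "item")` would return one text string per matching element. Locating records by a CSS class is one common choice once the page's source code has been inspected; the chunks are joined with spaces, so text split across inline tags may gain an extra space.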

How Universal is the Cool Water Effect? Evidence from the Unlikely Case of Russia

Kravtsova M., Musaev A. U., Welzel C. / Series "SSRN Working Paper Series". 2026.

Elaborating on Welzel et al.'s "Cool Water Theory," our study zooms into the more limited (albeit still varied) framework conditions of Russia's huge territory. Within Russia's confines, we examine how the combination of moderately cool seasons with steady rain (i.e., Cool Water) affects sub-national areas' contemporary societal progress in two modernization indicators: material prosperity in ...

Added: June 3, 2026

Why Growing Incomes Do Not Make People Happier: An Emotional Explanation of the Easterlin Paradox

Vorchik A. / SSRN. Series Social Science Research Network "Social Science Research Network". 2026.

This work is devoted to a theoretical explanation of the Easterlin paradox, according to which long-term economic growth does not increase the average level of people's happiness. By happiness, we mean the intensity of the emotions people experience when comparing their new income with its expected value, or the target income with its original value. In the first case, ...

Added: May 31, 2026

Determinants of Consent to Personal Data Surveillance: Experimental Evidence from Russia

Sizov A., Rodionova M., Sedashov E. et al. / NRU Higher School of Economics. Series PS "Political Science". 2026. No. 1.

The rapid development of surveillance technologies is one of the most socially important consequences of the digital age. This paper investigates the factors determining consent to surveillance of various types of personal data and contributes to the rapidly growing research on citizens' perceptions of surveillance practices. Relying on a comprehensive survey experiment, we study the effects of ...

Added: May 15, 2026

Circulation of Digital Assets: Challenges, Opportunities, and Legal Frameworks

Panarina M., Zakonodatelstvo (Legislation) 2026 No. 5 P. 16–23

The author examines current problems in regulating the circulation of digital assets in the Russian Federation, draws attention to the reasons for the significant restrictions on judicial protection of the rights of digital asset holders, and points out legal conflicts and gaps. In her view, only the further development of legislation will make it possible to define and regulate the use of digital assets more clearly in line with the requirements of their circulation, and to ensure the protection ...

Added: May 14, 2026

Pets and the Health of Older People. A Quantitative Analysis

Kartseva M. A., Peresetsky A. / Higher School of Economics. Series WP2 "Quantitative Analysis in Economics". 2026. No. WP2/2026/01.

This study examines the association between health status among elderly individuals living alone and pet ownership (cats, dogs). We employ data from the “Time Use Survey” conducted by the Federal State Statistics Service (Rosstat) in 2019, which contains information on more than 10,000 elderly individuals living alone aged 60 or older in Russia, including data ...

Added: May 8, 2026

The Balkan Wars of 1912–1913 in Contemporary Serbian National Media as a Symbol of the Unity of the Balkan Peoples

Mulina A. A., in: The Balkan Wars of 1912–1913: Distant Preconditions and a Long Echo. Moscow: Institute of Slavic Studies of the Russian Academy of Sciences, 2024. P. 287–297.

This article examines how the events of 1912–1913 were reflected in Serbia's national media in 2012–2013 and 2022–2023. Drawing on "big data" obtained from Google services, as well as on materials from the quality newspaper Politika, the author analyzes the specifics of the coverage of episodes of the Balkan Wars, as well as the search queries of internet users in Serbia on topics related to the events of 1912–1913. ...

Added: April 21, 2026

Presidential Elections in the Republic of Turkey in the Information Space of the Balkan Peninsula Countries: A Media-Geographic Analysis

Mulina A. A., Yakova T. S., Vestnik of the Peoples' Friendship University of Russia. Series: Literary Studies, Journalism 2025 Vol. 30 No. 1 P. 161–171

The article presents the results of a study of the information space of the Balkan states conducted during the presidential elections in Turkey (2023): the authors regard this period as one of the most striking political events in the country over the past five years. The purpose of the proposed work is to identify ...

Added: April 21, 2026

Big Data as an Asset: The Tasks of Legal Support for Data Circulation by Means of Public Law

Leskina E. I., Zakonodatelstvo (Legislation) 2026 No. 2 P. 22–29

One of the hallmarks of big data is its value, which stems from the essence of the current stage of societal development and the importance of information and data. However, without legal support for the economic nature of data, realizing its inherent potential becomes impossible. Currently, the existing legal framework for utilizing this asset in ...

Added: April 13, 2026

DEVELOPMENT OF TAX ADMINISTRATION IN RUSSIA UNDER THE APPLICATION OF BIG DATA TECHNOLOGY

Lyutova O. I., Gorbunova M. A., Public Administration Issues 2026 No. 1 P. 35–57

The use of big data in tax administration is moving from the introduction of individual digital technologies to a stage of qualitative analytics based on algorithms for the automatic analysis of large volumes of information from various sources, which gives rise to a number of systemic challenges. The aim of the study is to identify and analyze the state of the transformation of tax administration carried out through the introduction and use of digital tools, primarily technologies ...

Added: April 7, 2026

The Institution of Analogy in Russian Information Law

Leskina E. I., Vestnik of Voronezh State University. Series: Law 2025 No. 4(63) P. 157–165

The speed of development and distribution of digital technologies increases every year, as do the areas of their application. Artificial intelligence systems solve creative problems, predictive analytics are used in law enforcement agencies, and areas such as healthcare, transport, and education are not left out either. At the same time, part of public ...

Added: April 1, 2026

Political Effects of Government Digital Platforms and Services in Autocracies

Balayan A. A., Tomin L. V., Public Policy 2023 Vol. 7 No. 1-2 P. 108–117

The paper is devoted to the study of certain aspects of the digitalization of public administration in autocracies, primarily government platforms and digital services. The analysis of the political effects of government platforms and services is carried out in the broader context of the study of a new cybernetic mode of governance that complements/transforms the disciplinary ...

Added: March 31, 2026

Historical Politics in Inter-Party Struggle in Contemporary India: The Transformation of Indira Gandhi's Image in 2014–2019

Antasheva M. S., Vestnik of the Peoples' Friendship University of Russia. Series: World History 2026 Vol. 18 No. 1 P. 44–58

The perception of key leaders in Indian modern history and its transformation within the contemporary political landscape constitutes a significant element of the political struggle between the country’s two major parties — the Indian National Congress (INC) and the Bharatiya Janata Party (BJP). Since coming to power, the BJP has consistently pursued a strategy of ...

Added: March 19, 2026

Digital Society: A Theoretical Model and Russian Reality

Smirnov A., Monitoring of Public Opinion: Economic and Social Changes 2021 No. 1 P. 129–153

The article considers a theoretical model of digital society based on four concepts: super-connectivity, platformisation, datafication, and algorithmic governance. The model describes how the digitalisation of society deepens: from the transfer of individual practices and social interactions to a new social order based on big data. Analysis of panel data from the 2003–2018 longitudinal survey ...

Added: March 18, 2026

Forecasting Migration Processes Using Digital Demography Methods

Smirnov A., Economy of Regions 2022 Vol. 18 No. 1 P. 133–145

The nature and intensity of migration processes are constantly changing. Demographic statistics are not suitable for obtaining up-to-date information and making timely decisions in the field of demographic and social policy. Thus, digital demography is becoming increasingly important, as this area of population research uses new methods and data sources resulting from the Internet expansion ...

Added: March 18, 2026

The Puzzle of Intrinsic Motivation

Vorchik A. / Social Science Research Network. Series SSRN Working Paper Series "SSRN Working Paper Series". 2026.

This article is devoted to the phenomenon of intrinsic motivation, for which two explanatory models are proposed. We study how positive/negative intrinsic motivation to work (experienced utility) affects a worker's individual labour supply (model I) and the amount of effort they exert (model II). In model I, we use intrinsic motivation to explain the positive/negative slope ...

Added: March 15, 2026

Improving guest satisfaction by identifying hotel service micro-elements failures through Deep Learning of online reviews

Kazakov S., Cuesta-Valiño P., Butkovskaya V. et al., Cuadernos de Gestion 2025 Vol. 25 No. 1 P. 71–88

This study provides an in-depth examination of often-overlooked hotel service micro-elements within the broader spectrum of hospitality services, with the aim of improving service delivery and enhancing guest satisfaction. To achieve this, we develop a methodological framework that integrates: (a) VADER text-based sentiment analysis, (b) a robust logistic regression procedure to identify the specific hotel ...

Added: February 28, 2026

Data Analytics for Predicting Situational Developments in Smart Cities: Assessing User Perceptions

Kharlamov A. A., Pilgun M., in: Special Issue Sensing Technology for Smart Cities: Data, Analytics, and Visualizations. Vol. 24. Issue 15. [s.n.], 2024.

The analysis of large volumes of data collected from heterogeneous sources is increasingly important for the development of megacities, the advancement of smart city technologies, and ensuring a high quality of life for citizens. This study aimed to develop algorithms for analyzing and interpreting social media data to assess citizens’ opinions in real time and ...

Added: February 22, 2026

Special Issue Sensing Technology for Smart Cities: Data, Analytics, and Visualizations

[s.n.], 2024.

Nowadays a huge portion of the population lives in urban areas, and projections indicate that most cities are going to be confronted with a growing urban population in the next few years. This undoubtedly poses new challenges that must be addressed by city councils and stakeholders to guarantee citizens' high quality of life. Mobility, pollution, climate ...

Added: February 15, 2026

Microfoundations of the Cultural Modernization Theory

Musaev A. U., Vorchik A. / Series Social Science Research Network "Social Science Research Network". 2026.

This paper attempts to model the evolutionary theory of modernization and democratization. The model reflects the key provisions of R. Inglehart and C. Welzel's theory and provides a microfoundation for the adaptation of subjective values to the objective importance of survival factors and to the structure of labour markets from the perspective of evolutionary ...

Added: February 10, 2026

ALGORITHMIZATION OF LAW ENFORCEMENT MANAGEMENT PROCESSES USING ARTIFICIAL INTELLIGENCE

Barchukov V., Relacoes Internacionais no Mundo Atual 2024 Vol. 4 No. 46 P. 113–132

Objective: Despite the opportunities opening up due to the development of information support systems and artificial intelligence in law enforcement, the Russian Federation has unfortunately not yet fully formed a scientifically based legal and organizational framework for their integrated and practical application in the activities of law enforcement agencies. The article aims to assess ...

Added: January 20, 2026

Artificial Intelligence for Urban Planning and Building Smart Cities

Demekhina A., Milshina Y., in: Artificial Intelligence Enabled Real Time Environmental Monitoring. Springer, 2026. P. 253–281.

Added: January 13, 2026

Denomination, Religiosity and Anti-Immigrant Attitudes in Europe: Comparative Evidence from the European Social Survey

Dorkhanov I., Sokolov B. / Series OSF "SocArXiv". 2025.

This study investigates the relationship between individual religiosity and attitudes towards immigrants of different religious backgrounds in Europe. Using data from the 7th wave of the European Social Survey (2014-2015), we examine the influence of individual denomination and subjective religiosity level on hostility towards Muslim immigrants and the importance of immigrants’ Christian background. Our analysis, ...

Added: December 23, 2025

Classification Approach to Mapping Cultural Differences: An Illustration Using Survey Data from 60 Russian Regions

Nastina E., Sokolov B. / Series OSF "SocArXiv". 2025.

We argue that a classification-based approach to measuring cultural differences across countries or subnational regions is a promising complement, and sometimes an alternative, to the widely used dimensional method in cross-cultural research. The latter summarises cultural variation using continuous dimensions, for example, Hofstede’s famous individualism-collectivism dimension. However, this approach relies on strong parametric assumptions, which are ...

Added: December 23, 2025

Cross-Nationally, Non-Probability Web Surveys Demonstrate Poorer Demographic Coverage and Yield More Liberal Estimates of Public Opinion than F2F Surveys

Korsunava V., Sokolov B. / Series OSF "SocArXiv". 2025.

Non-probability web surveys offer several advantages over face-to-face (F2F) interviews—they are cheaper, faster, more accessible, and reduce interviewer effects and desirability bias. As such, they are increasingly popular in both academic and commercial research. However, they often yield demographically biased samples, raising concerns about the accuracy of the resulting public opinion estimates. Most studies on ...

Added: December 23, 2025