Automatic Extraction of Text and Numeric Web Data for Social Science Purposes (Автоматическое извлечение текстовых и числовых веб-данных для целей социальных наук)

Социология: методология, методы, математическое моделирование (Sociology: Methodology, Methods, Mathematical Modeling). 2020. No. 50-51. P. 141–183.
Zhuchkova S., Rotmistrov A.

The paper examines procedures for automatically extracting data from web pages, i.e., web scraping. We consider different types of web data, such as digital traces and other numeric and text data, and compare their advantages (fast data collection and, as a consequence, continuous coverage, efficiency, etc.) and limitations (limited representativeness, difficulties in organizing storage for large amounts of data, deviation from the traditional procedure for setting up a study, etc.) with those of traditional data collection methods. Several web data extraction tools (APIs, requests, and selenium) are described to illustrate the principles of handling static and dynamic web pages. The paper also outlines the minimum set of competencies needed for web scraping, in particular programming in Python and navigating the code of web pages. A detailed illustration is based on a fragment of the data collection process from a recent relevant Russian study.

Priority areas: sociology
Language: Russian
Keywords: big data, API, text data, computational social science, web scraping, requests, automatic data extraction, web data, selenium
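The abstract's point about navigating a web page's code to pull out text and numeric data can be illustrated with a minimal Python sketch. This is not the authors' code: the HTML fragment, the `CellParser` class, and the `extract_records` function are hypothetical, and the page is embedded as a string so the example runs offline. For a static page the HTML would typically be downloaded first (for example with the requests library), and for a dynamic page it would be taken from a browser driven by selenium; only the parsing step is shown here, using the standard library.

```python
from html.parser import HTMLParser

# Hypothetical static page fragment; in practice this string would be
# the downloaded HTML of a real page.
SAMPLE_HTML = """
<table id="posts">
  <tr><th>Title</th><th>Comments</th></tr>
  <tr><td>First post</td><td>12</td></tr>
  <tr><td>Second post</td><td>7</td></tr>
</table>
"""


class CellParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows, each a list of cell strings
        self._row = None       # cells of the row currently being read
        self._in_cell = False  # True while inside a <td> element

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            # Header rows contain only <th> cells, so they stay empty
            # and are skipped.
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def extract_records(html):
    """Turn each data row into a dict with a text and a numeric field."""
    parser = CellParser()
    parser.feed(html)
    parser.close()
    return [
        {"title": title, "comments": int(count)}
        for title, count in parser.rows
    ]


records = extract_records(SAMPLE_HTML)
print(records)
```

Keeping the download step separate from the parsing step means the same parser can be reused whether the HTML comes from a static request or from a rendered dynamic page.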