Automatic Extraction of Text and Numeric Web Data for Social Science Purposes

Социология: методология, методы, математическое моделирование (Sociology: Methodology, Methods, Mathematical Modelling). 2020. No. 50-51. P. 141–183.
Zhuchkova S., Rotmistrov A.

The paper examines procedures for automatically extracting data from web pages, i.e., web scraping. We consider different types of web data, such as digital traces and other numeric and text data, and compare their advantages (speed of data collection and, as a consequence, continuous coverage, efficiency, etc.) and limitations (limited representativeness, difficulties in organizing storage of large amounts of data, deviation from the traditional procedure for setting up a study, etc.) with those of traditional data collection methods. Several web data extraction tools (APIs, requests, and selenium) are described to illustrate the principles of handling static and dynamic web pages. The paper also outlines the minimum competencies required for web scraping, in particular programming in Python and navigating the code of web pages. A detailed illustration is based on a fragment of the data collection process from a recent relevant Russian study.
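The static-page case mentioned in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the helper names `parse_page` and `fetch_static`, the class `TitleAndLinks`, and the sample HTML are all hypothetical. Parsing uses only the Python standard library; the `requests` call is imported lazily so the parsing part runs even where `requests` is not installed.

```python
from html.parser import HTMLParser


class TitleAndLinks(HTMLParser):
    """Collect the <title> text and every href found in <a> tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors without an href attribute
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_page(html):
    """Return the page title and link targets found in an HTML string."""
    parser = TitleAndLinks()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "links": parser.links}


def fetch_static(url, timeout=10):
    """Download a static page (hypothetical helper; needs the requests package)."""
    import requests  # third-party, imported lazily on purpose

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.text


# Made-up sample page, so the sketch runs without network access.
SAMPLE_HTML = """<html><head><title> Example page </title></head>
<body><a href="/a">A</a><a>no href</a><a href="https://example.org/b">B</a></body></html>"""

result = parse_page(SAMPLE_HTML)
print(result)
```

For a real static page, `parse_page(fetch_static(url))` would combine the two steps. For a dynamic page, where content is produced by JavaScript, a browser-automation tool such as selenium would first render the page; the rendered HTML (for example, the WebDriver's `page_source`) could then be passed to the same `parse_page` function.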

Priority areas: sociology
Language: Russian
Keywords: big data; API; text data; computational social science; web scraping; requests; selenium; automatic data extraction; web data