Methods for Building Socio-Demographic Profiles of Internet Users

Труды Института системного программирования РАН. 2015. Vol. 27. No. 4. P. 129–144.

Кузнецов С. Д., Гомзин А. Г.

The paper is devoted to methods for constructing socio-demographic profiles of Internet users. Examples of demographic attributes include gender, age, political and religious views, region, and relationship status. This work surveys methods that detect demographic attributes from a user's profile and messages. Most of the surveyed works are devoted to gender detection; age, political views, and region also attract researchers' interest.
The most popular data sources for extracting demographic attributes are social networks such as Facebook, Twitter, and YouTube.
Most solutions are based on supervised machine learning. Machine learning makes it possible to find how the target values (demographic attributes) depend on the input data and to use these dependencies to predict the value of the target attribute for new data. The paper surveys the following problem-solving steps: feature extraction, feature selection, model training, and evaluation.
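As a rough illustration of the supervised setup, the sketch below trains a classifier on labeled messages and then predicts the attribute for a new message. Multinomial naive Bayes, the region labels, and the toy messages are assumptions chosen for brevity; the abstract does not say which classifiers the surveyed works use.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over word lists, with add-one smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # word counts per class
        self.vocabulary = set()
        for words, label in zip(documents, labels):
            self.word_counts[label].update(words)
            self.vocabulary.update(words)
        return self

    def predict(self, words):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            score = math.log(doc_count / total_docs)  # log prior
            for word in words:
                if word in self.vocabulary:  # ignore words unseen in training
                    score += math.log((counts[word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: tokenized messages labeled with a region.
train_docs = [
    ["snow", "ski", "cold"],
    ["snow", "ice"],
    ["beach", "sun", "surf"],
    ["sun", "warm", "beach"],
]
train_labels = ["north", "north", "south", "south"]

model = NaiveBayes().fit(train_docs, train_labels)
prediction = model.predict(["snow", "cold", "today"])
```

The same interface would apply to any other attribute with a finite set of values; only the labels change.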
Researchers use different kinds of data to predict demographic attributes. The most popular data source is text. Word sequences (n-grams), parts of speech, emoticons, and features specific to particular resources (e.g., @-mentions and #hashtags on Twitter) are extracted and used as input for machine learning algorithms. Social graphs are also used as source data: communities of users automatically extracted from the social graph are used as features for attribute prediction.
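A minimal sketch of text feature extraction along these lines is shown below. The regular expressions, the feature naming scheme, and the function name `extract_features` are illustrative assumptions, not taken from the surveyed works; part-of-speech and social-graph features are omitted because they need external tools.

```python
import re
from collections import Counter

# Illustrative patterns; real systems use more careful tokenizers.
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
EMOTICON_RE = re.compile(r"[:;=][-']?[)(DPp]")
WORD_RE = re.compile(r"\w+")


def extract_features(message, max_n=2):
    """Return a Counter mapping feature name -> count for one message."""
    features = Counter()

    # Resource-specific features: Twitter-style mentions and hashtags.
    features["num_mentions"] = len(MENTION_RE.findall(message))
    features["num_hashtags"] = len(HASHTAG_RE.findall(message))

    # Emoticons are matched on the original text so that case is preserved.
    for emoticon in EMOTICON_RE.findall(message):
        features["emoticon=" + emoticon] += 1

    # Remove mentions, hashtags, and emoticons before building word n-grams.
    cleaned = message
    for pattern in (MENTION_RE, HASHTAG_RE, EMOTICON_RE):
        cleaned = pattern.sub(" ", cleaned)
    words = WORD_RE.findall(cleaned.lower())

    # Word n-grams for n = 1 .. max_n.
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            features["ngram=" + " ".join(words[i:i + n])] += 1
    return features


features = extract_features("Great game today :) @alice #win")
```

Prefixing feature names by type keeps n-grams, emoticons, and counters in one sparse vector without collisions.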
Text data produces a large number of features, so feature selection algorithms are needed to reduce the feature space.
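One way to reduce the feature space can be sketched as follows. The chi-square statistic is used here as an assumption: it is a common choice for text features, but the abstract does not name the selection algorithms it surveys. The function names and the toy data are hypothetical.

```python
from collections import Counter


def chi_square_scores(documents, labels, positive_label):
    """Score each binary feature by chi-square against a two-class label.

    documents: list of sets with the feature names present in each document.
    labels: list of class labels, same length as documents.
    """
    n = len(documents)
    n_pos = sum(1 for y in labels if y == positive_label)
    n_neg = n - n_pos
    present_all = Counter()   # documents containing the feature
    present_pos = Counter()   # positive documents containing the feature
    for feats, y in zip(documents, labels):
        for f in feats:
            present_all[f] += 1
            if y == positive_label:
                present_pos[f] += 1

    scores = {}
    for f, total in present_all.items():
        a = present_pos[f]    # feature present, positive class
        b = total - a         # feature present, negative class
        c = n_pos - a         # feature absent, positive class
        d = n_neg - b         # feature absent, negative class
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[f] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring feature names (ties broken alphabetically)."""
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]


# Toy data: feature "x" occurs in both classes and is less informative.
docs = [{"a", "x"}, {"a", "x"}, {"b", "x"}, {"b"}]
labels = ["pos", "pos", "neg", "neg"]
scores = chi_square_scores(docs, labels, positive_label="pos")
selected = select_top_k(scores, 2)
```

Features that appear equally often in both classes receive low scores and are dropped first.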
The paper surveys feature selection, classification, and regression algorithms, as well as evaluation metrics.
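For orientation, a few standard evaluation metrics can be computed as in the sketch below. Which metrics the surveyed works actually report is not stated in the abstract, so the selection here (accuracy, precision, recall, and F1 for classification, mean absolute error for a regression target such as age) is an assumption, as are the function names and sample values.

```python
def classification_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall, and F1 with respect to one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference, e.g. between true and predicted age."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical predictions for a two-class attribute and for age in years.
metrics = classification_metrics(
    ["a", "a", "a", "b", "b"],
    ["a", "a", "b", "a", "b"],
    positive="a",
)
age_error = mean_absolute_error([20, 30], [22, 27])
```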

Research target: Computer Science

Priority areas: IT and mathematics

Language: Russian

Generalized Multiple Awareness Structure

Fedyanin D., Чхартишвили А. Г., Управление большими системами, Russia 2024 No. 109 P. 6–20

A new model for describing the awareness of agents is proposed. It generalizes known models of reflexion in the following sense: the proposed model of generalized multiple awareness structures (ОМСИ) allows a different number of participants in the agents' beliefs about the situation. Using ОМСИ as a generalization of multiple awareness structures (which in turn generalize point awareness structures), one can describe situations in which agents do not know the exact state of the system and ...

Added: March 10, 2026

Eyetracking and vegetatics data in a cognitive load task

Alshanskaia E., Martynova O., Portnova G. et al., Mendeley 2024

Published: 10 July 2024 | Version 1. Raw eyetracking and vegetatics data in a cognitive load task with additional false feedback in the second block. ...

Added: March 6, 2026

Online Neural Networks for Change-Point Detection

Hushchyn M., Arzymatov K., Derkach D., Machine Learning 2026 Vol. 115 Article 56

Moments when a time series changes its behavior are called change points. Occurrence of change point implies that the state of the system is altered and its timely detection might help to prevent unwanted consequences. In this paper, we present two change-point detection approaches based on neural networks and online learning. These algorithms demonstrate linear ...

Added: March 6, 2026

Tracking Changes in Wakefulness Level Using Indices Computed from EEG Spectral Power

Бобров П. Д., Журнал высшей нервной деятельности им. И.П. Павлова 2025 Vol. 75 No. 6 P. 756–769

There is a need for empirical indicators that can monitor subtle changes in wakefulness levels with high temporal resolution. We aimed to assess the applicability in this regard of several indices based on the average spectral power of EEG rhythms, as well as the BIS index used in anesthesiology. 26 volunteers participated in an experiment involving forced awakenings ...

Added: March 6, 2026

Simulation Modeling of Executable BPMN Diagrams

Deryabin A. I., [s.n.], 2026.

The textbook is intended for undergraduate and graduate students of the Business Informatics and Software Engineering educational programs studying the disciplines of Enterprise Architecture, Analysis and Improvement of Business Processes and Improvement of Enterprise Architecture. The basics of designing, developing and optimizing executable business process models implemented in the BPMN-2.0 language are considered. The methodology ...

Added: March 5, 2026

Deep-learning-based Identification of Solar Magnetic Tornadoes and Their Spatial Properties during Solar Minimum and Maximum

Blumenau M., Khabarova O., Nikitin I. et al., Astrophysical Journal 2026 Vol. 999 No. 2 P. 171–187

Solar magnetic tornadoes are dynamic, spiral-shaped plasma structures characterized by helical magnetic fields and rotating plasma flows in the solar atmosphere. They play a significant role in the transport of energy and mass within the solar environment. Identifying and analyzing solar magnetic tornadoes is challenging due to their transient nature and complex morphology and the ...

Added: March 4, 2026

Informatics and Applied Mathematics: Proceedings of the IX International Scientific and Practical Conference (October 31 - November 1, 2024)

Almaty: Institute of Information and Computational Technologies CS MSHE RK, 2024.

The collection contains papers presented by scientists from the Republic of Kazakhstan, the Russian Federation, Latvia, Poland, the Republic of Belarus, Japan, Iran, Malaysia, the Kyrgyz Republic, the Republic of Uzbekistan, and others. It addresses current issues in mathematics, computer science, and management, including mathematical modeling of complex systems and business processes, research and development ...

Added: March 3, 2026

Clustering of Smart Home Electricity Consumption Patterns Based on Ensemble Machine Learning Methods

Maltseva S. V., Бериков В. Б., Кладов Д. Е. et al., In: Informatics and Applied Mathematics: Proceedings of the X International Scientific and Practical Conference (October 8-11, 2025). Vol. 1: Collected papers, part 1. Almaty: Institute of Information and Computational Technologies CS MSHE RK, 2025. P. 227–232.

This paper examines the problem of clustering consumption patterns for a private household. An ensemble algorithm based on the Wasserstein metric was developed and applied to cluster daily load profiles. The proposed approach allows for identifying typical energy consumption scenarios and interpreting consumer behavior. Results from computational experiments using real data are presented. ...

Added: March 3, 2026

Informatics and Applied Mathematics: Proceedings of the X International Scientific and Practical Conference (October 8-11, 2025)

Almaty: Institute of Information and Computational Technologies CS MSHE RK, 2025.

The collection contains papers presented by scientists from the Republic of Kazakhstan, the Russian Federation, Turkey, Poland, the Republic of Belarus, Japan, Iran, Malaysia, the Kyrgyz Republic, the Republic of Uzbekistan, and other countries. It addresses current issues in mathematics, computer science, and management, including mathematical modeling of complex systems and business processes, research and ...

Added: March 3, 2026

Factors of Labor Market Mobility in Modern Russia: Do Social Ties Matter?

Халиков К., Экономическая социология 2026 Vol. 27 No. 1 P. 43–78

The aim of this study is to assess the impact of various factors, including social networks, on labor market mobility in modern Russia. The main assumption is that weak ties facilitate taking a better job. The empirical base consists of data from the Russian Longitudinal Monitoring Survey (RLMS) of HSE for 2016–2017 and 2018–2019. Based on exploratory factor analysis, ...

Added: March 1, 2026

The Exact Circuit Complexity of Boolean Functions in an Infinite Basis

Mikhailovich A., Kochergin V., Mathematical notes 2025 Vol. 117 No. 3-4 P. 579–594

The exact value of the complexity of the circuit implementation of an arbitrary Boolean function in a certain basis consisting of negation and all monotone Boolean functions is found. The complexity of a function is defined as the least number of basis elements sufficient to construct a circuit implementation of this function. ...

Added: February 28, 2026

SSL-MEPR: A Semi-Supervised Multi-Task Cross-Domain Learning Framework for Multimodal Emotion and Personality Recognition

Ryumina E., Aksenov A., Koryakovskaya D. et al., Machine Learning and Knowledge Extraction 2026 No. 8 P. 1–41

The growing demand for personalized human-computer interaction calls for methods that jointly model emotional states and personality traits. However, large-scale multimodal corpora annotated for both tasks are still lacking. This challenge stems from integrating diverse, task-specific corpora with divergent modality informativeness and domain characteristics. To address it, we propose SSL-MEPR, a semi-supervised multi-task cross-domain learning ...

Added: February 27, 2026

Modeling of Information Network Interaction in Cyber-Social Systems

Maltseva S. V., Голубцов П. В., Барахнин В. Б., Вычислительные технологии 2026 Vol. 31 No. 1 P. 5–22

The paper considers macro-level monitoring of a manufacturing system under the Industry 4.0 and 5.0 concepts, based on the study of information flows in manufacturing network structures. Numerical models of three types of network interaction that take into account the influence of the number of objects, external influences, ...

Added: February 26, 2026

HoTPP benchmark: Are we good at the long horizon events forecasting?

Karpukhin I., Shipilov F., Savchenko A., Neurocomputing 2026 Vol. 672 Article 132771

Forecasting multiple future events within a given time horizon is essential for applications in finance, retail, social networks, and healthcare. This problem is typically addressed using Marked Temporal Point Processes (MTPP), which provide a principled framework for modeling both event timing and event labels. While most existing research focuses on predicting only the next event, forecasting distant future ...

Added: February 25, 2026

Comparative analysis of the characteristics of promising APSK modulation schemes in wireless telecommunications

Kazakov G. N., Nguyen H. T., Shevgunov T. et al., T-Comm: Telecommunications and transport 2025 Vol. 19 No. 9 P. 59–76

The growing requirements for the use of high-speed and energy-efficient high-capacity data transmission channels in modern and future telecommunication networks have led to an increasing interest in the formation and application of signals with new constellations. Requirements for the shape of signal constellations in connection with the emergence of new technologies of wireless telecommunications are ...

Added: February 25, 2026

A Method for Estimating the Fraction-of-Time Probability Density of a Digital Signal Using Linear Interpolation

Shevgunov T., T-Comm: Телекоммуникации и транспорт 2024 Vol. 18 No. 7 P. 4–12

The paper presents the development of a new tool for the fraction-of-time approach, in which a random process is described using functional models synthesized from its single observed realization, without the need to construct abstract probabilistic models when there is no reliable a priori information about whether the process exhibits ergodicity. Based on a previously obtained analytical formula expressing the fraction-of-time density of a continuous signal in explicit ...

Added: February 25, 2026

Proceedings of the Ninth International Scientific Conference “Intelligent Information Technologies for Industry” (IITI’25), Volume 1. (LNNS, volume 1762)

Cham: Springer Publishing Company, 2025.

This book contains the works connected with the key advances in Intelligent Information Technologies for Industry presented at IITI 2025, the Ninth International Scientific Conference on Intelligent Information Technologies for Industry held on November 5-7, 2025 in Sirius Federal Territory, Russia. The book is written by the experts in the field of applied artificial intelligence ...

Added: February 25, 2026

Measuring External Conflict in Dempster-Shafer Theory Based on Kantorovich Problems

Bronevich A., Lepskiy A., International Journal of Approximate Reasoning 2026 Vol. 190 Article 109597

In the paper, we consider three possible types of external conflict in Dempster-Shafer theory and propose its measurement based on functionals evaluating intersection, inclusion and distance between random sets. All proposed functionals can be viewed as extensions of known functionals like Jaccard metric, Jaccard index, and Dice coefficient from usual sets to random sets based ...

Added: February 25, 2026

2022 IEEE International Conference on Data Mining (ICDM)

Sergei O. Kuznetsov, Buzmakov A., Makhalova T. et al., IEEE, 2022.

In this paper, we revisit pattern mining and study the distribution underlying a binary dataset thanks to the closure structure which is based on passkeys, i.e., minimum generators in equivalence classes robust to noise. We introduce △-closedness, a generalization of the closure operator, where △ measures how a closed set differs from its upper neighbors ...

Added: February 25, 2026

Determining Ovarian Follicular Reserve from Ultrasound Data Using Machine Learning Methods

Moshkin A., Лапутин Ф. А., Сидоров И. В., DIGITAL DIAGNOSTICS 2024 Vol. 5 No. S1 P. 40–42

BACKGROUND: Ovarian reserve reflects a woman's ability to successfully realize reproductive function. The assessment of ovarian reserve is an urgent task for clinical practice [1] and is important in scientific research. The use of computerized diagnostic image processing methods can accelerate and facilitate the performance of routine tasks in clinical practice. Their use in retrospective ...

Added: February 21, 2026

Online Discourse on China's Demographic Policy: Methodological Aspects of Analyzing Posts on the Social Network Weibo

Bocharova A., Зуенко И. Ю., Денисов И. Е., Вестник Санкт-Петербургского университета. Востоковедение и африканистика 2025 Vol. 17 No. 2 P. 366–377

The article focuses on analyzing how contemporary Chinese society perceives recent changes in China’s demographic policy. These changes involved the relaxation of restrictions on the number of children in a family, first to two children in 2015 and subsequently to three children in 2021. The relevance of this research stems from the ...

Added: February 19, 2026

Predicting the Risk of Cerebral Stroke

Кузнецов В. А., Yasnitsky L., In: Artificial Intelligence in Solving Current Social and Economic Problems of the 21st Century: Collected papers of the Tenth All-Russian Scientific and Practical Conference with International Participation (Perm, ПГНИУ, October 9–10, 2025). Perm State National Research University, 2025. P. 240–247.

The paper presents the development and comparative analysis of machine learning methods for the binary classification of patients at risk of cerebral stroke. The research process included a thorough exploratory data analysis stage, followed by the implementation and evaluation of three models: a decision tree, a random forest, and a neural network. The aim of the work is to identify the most effective algorithm for building a clinical decision support system capable of timely ...

Added: February 15, 2026

The Problem of Rationalization and Overreliance on XAI Tools: An Analysis of Explanations by Large Language Models

Suvorova A., In: XXII National Conference on Artificial Intelligence with International Participation (КИИ-2025). Vol. 1. St. Petersburg: St. Petersburg Federal Research Center of the RAS, 2025. P. 310–318.

The paper investigates the problem of users' overreliance on the results of interpreting machine learning models, as well as ways to address it using explanations generated by large language models (LLMs). The experimental results showed that most models, like the human users in the original experiment, ignored anomalies or offered plausible but false explanations, rationalizing the conclusions. This points to the risks ...

Added: February 15, 2026

How to Predict Bank Defaults: The Evolution of Methods, Models, and Risk Factors

Shchepeleva M., Столбов М. И., Экономика и математические методы 2026 Vol. 62 No. 1 P. 63–77

Predicting bank defaults is an important task for the entire economy. Early identification of troubled banks helps to prevent impending bank failures or minimize the losses associated with them. The paper discusses the state of the art of instrumental methods and data used for this purpose. The theoretical background, the evolution of methodological approaches used ...

Added: February 13, 2026