Photo privacy detection based on text classification and face clustering

Ch. 39. P. 171–176.
Kopeykina L., Savchenko A.

Photo privacy detection is becoming a pressing task as mobile devices spread and more photos are published on social networks. Because a photo may contain private or sensitive data, such photos must be identified accurately and restrictions imposed on their processing. In this paper we focus on the task of personal data detection in a photo gallery. A novel two-stage approach is proposed. In the first stage, text in scanned documents is located with the EAST text detector, and the extracted text is recognized using Tesseract and a neural network classifier. In the second stage, face clustering is applied to the remaining photos to identify large groups of people (friends, relatives), whose photos are also treated as personal data and must be processed directly on the mobile device. The remaining images can be sent to a remote server for processing with higher accuracy. Experimental results are presented for text recognition and for face clustering with various convolutional networks used for facial feature extraction.
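
The two-stage routing described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the `Photo` fields, the `route` function, the thresholds, and the greedy cosine-distance clustering are all hypothetical stand-ins. In the paper the text flag would come from EAST, Tesseract and a neural classifier, and the embeddings from a convolutional network; the abstract does not say which clustering algorithm is used.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict, List, Optional, Sequence


@dataclass
class Photo:
    name: str
    # Hypothetical precomputed inputs (assumptions for this sketch):
    # stage 1 result of text detection, OCR and text classification,
    # and a non-zero face descriptor if a face was found.
    has_sensitive_text: bool = False
    face_embedding: Optional[Sequence[float]] = None


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def cluster_faces(embeddings: List[Sequence[float]], threshold: float) -> List[List[int]]:
    """Greedy one-pass clustering: a face joins the first cluster whose
    first member is within `threshold`, otherwise it starts a new cluster."""
    clusters: List[List[int]] = []
    for i, emb in enumerate(embeddings):
        for cluster in clusters:
            if cosine_distance(emb, embeddings[cluster[0]]) <= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def route(photos: List[Photo], threshold: float = 0.3, min_group: int = 2) -> Dict[str, str]:
    """Map each photo name to 'device' (private) or 'server' (not private)."""
    decision: Dict[str, str] = {}
    # Stage 1: photos with sensitive document text stay on the device.
    remaining = []
    for photo in photos:
        if photo.has_sensitive_text:
            decision[photo.name] = "device"
        else:
            remaining.append(photo)
    # Stage 2: faces that form a large enough group are treated as private.
    with_faces = [p for p in remaining if p.face_embedding is not None]
    clusters = cluster_faces([p.face_embedding for p in with_faces], threshold)
    private = set()
    for cluster in clusters:
        if len(cluster) >= min_group:
            private.update(with_faces[i].name for i in cluster)
    for photo in remaining:
        decision[photo.name] = "device" if photo.name in private else "server"
    return decision


gallery = [
    Photo("passport.jpg", has_sensitive_text=True),
    Photo("mom1.jpg", face_embedding=[1.0, 0.0]),
    Photo("mom2.jpg", face_embedding=[0.99, 0.05]),
    Photo("stranger.jpg", face_embedding=[0.0, 1.0]),
    Photo("landscape.jpg"),
]
decisions = route(gallery)
```

In this toy gallery the scanned passport and the two photos of the same person are kept on the device, while the single unfamiliar face and the landscape are eligible for server-side processing. The values of `threshold` and `min_group` are arbitrary and would need tuning on real descriptors.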

Language: English
Keywords: text classification, data privacy, facial clustering, text detection, text detection in images
Publication based on the results of:
Efficient methods of multimedia data recognition for analyzing the preferences of mobile device users (2019)

In book

Proceedings of the VI International conference Information Technology and Nanotechnology. Session Image Processing and Earth Remote Sensing (ITNT-IPERS)
Vol. 2665: Information Technology and Nanotechnology. Image Processing and Earth Remote Sensing 2020. Samara: CEUR Workshop Proceedings, 2020.
Similar publications
Determinants of Consent to Personal Data Surveillance: Experimental Evidence from Russia
Sizov A., Rodionova M., Sedashov E. et al. NRU Higher School of Economics. Series PS "Political Science". 2026. No. 1.
Rapid development of surveillance technologies is one of the most socially important consequences of the digital age. This paper investigates the factors determining consent to surveillance of various types of personal data and contributes to the rapidly growing research on citizens' perceptions of surveillance practices. Relying on a comprehensive survey experiment, we study the effects of ...
Added: May 15, 2026
Дискриминативная лемматизация сокращений в эпоху LLM
Глазкова А. В., Смаль И. В., Lyashevskaya O. et al., Доклады Российской академии наук. Математика, информатика, процессы управления (ранее - Доклады Академии Наук. Математика) 2025 Т. 527 С. 146–155
This paper presents a study on the effectiveness of discriminative methods for abbreviation lemmatization in Russian texts. Unlike generative approaches, discriminative models select the optimal lemma from a fixed set of candidates, eliminating the risk of generating grammatically incorrect word forms. For the first time in Russian language processing, we conduct a comprehensive analysis of ...
Added: March 10, 2026
Transformer-based approaches for lemmatizing abbreviations in Russian texts
Glazkova A., Lyashevskaya O., Morozov D. et al., Journal of Mathematical Sciences 2025 Vol. 546 P. 32–47
This paper addresses the task of lemmatizing abbreviations in the Russian language. Abbreviation lemmatization is particularly challenging, as it involves not only transforming a word into its normal form but also correctly expanding the abbreviation. We explore two approaches to this task, both leveraging large pretrained language models. The first approach is generative, where the ...
Added: March 10, 2026
Кодекс этики в сфере искусственного интеллекта в медицине и здравоохранении
Абрамова А. В., Белоусова Е. Н., Ватюков С. Е. et al., Проблемы стандартизации в здравоохранении 2025 № 5-6 С. 3–14
The improvement of artificial intelligence (AI) technologies and their rapid integration into the socially and economically significant medical industry create broad prospects for ensuring accessibility and quality of medical care, while at the same time creating new challenges related to the safety and ethical risks of using innovative solutions. This creates the need to develop ...
Added: December 7, 2025
Stalactite: toolbox for fast prototyping of vertical federated learning systems
Zakharova A., Alexandrov D., Khodorchenko M. et al., in: RecSys '24: Proceedings of the 18th ACM Conference on Recommender Systems. Association for Computing Machinery (ACM), 2024. P. 1187–1190.
Machine learning (ML) models trained on datasets owned by different organizations and physically located in remote databases offer benefits in many real-world use cases. State regulations or business requirements often prevent data transfer to a central location, making it difficult to utilize standard machine learning algorithms. Federated Learning (FL) is a technique that enables models ...
Added: November 24, 2024
Индекс «этичности» систем искусственного интеллекта в медицине: от теории к практике
Ugleva A. V., Shilova V. A., Карпова Е. А., Этическая мысль 2024 Т. 24 № 1 С. 144–159
The article presents a methodology developed at HSE University: the Index of Ethics of Artificial Intelligence Systems. The Index was developed to assess real and possible ethical risks arising at all stages of the life cycle of AI systems. The system itself does not possess any “ethics”, while socially acceptable, morally permissible, and necessary may be ...
Added: July 15, 2024
Эмоциональный анализ постов в ВКонтакте: классификатор или регрессор
Kolmogorova A., Калинин А. А., В кн.: Компьютерная лингвистика и интеллектуальные технологии: по материалам международной конференции «Диалог 2022». Вып. 21. Изд-во РГГУ, 2022. С. 311–322.
The article summarizes the results of two machine learning tasks: classifying Russian-language social network posts by dominant emotion, and a regression task on the same data. The experiments are conducted on a data set collected from the VKontakte social network and consisting of 3879 posts ...
Added: March 18, 2024
Machine learning approach for scientific and technical expertise
A. V. Belov, E. A. Egorova, Bulletin D. Serikbayev East Kazakhstan Technical University 2023 No. 4 P. 92–102
When conducting scientific and technical expertise, it is necessary to analyze the texts of reports on scientific research work. The analysis is carried out in order to determine whether the research being conducted belongs to the class of scientific research and development work in the field of IT. This article discusses the tasks of binary ...
Added: March 9, 2024
Classification of Short Scientific Texts
I. K. Kusakin, Fedorets O. V., A. Y. Romanov, Scientific and Technical Information Processing 2023 Vol. 50 No. 3 P. 176–183
This paper discusses modern approaches to natural language processing and the application of machine learning models to the task of classifying short scientific texts in Russian. This study is devoted to the analysis of methods for vectorization of textual information, selection of a model for scientific paper classification, and training of linguistic model BERT ...
Added: November 4, 2023
Secure Codes With Accessibility for Distributed Storage
Holzbaur L., Kruglik S., Frolov A. et al., IEEE Transactions on Information Forensics and Security 2021 Vol. 16 P. 5326–5337
A distributed storage system must support efficient access to stored data while ensuring recovery of temporarily unavailable nodes. Another important aspect of a distributed storage system is security. In this paper, we bring these features together and investigate the problem of efficient access to stored data in the presence of a passive eavesdropper with access to ...
Added: September 9, 2023
The Scope of the Personal Data Concept in Russia
Зюбанов К. А., Legal Issues in the Digital Age 2023 Vol. 4 No. 1 P. 53–76
Personal data as an institution is gaining increasing attention on the part of both public authorities, business structures and private individuals as subjects of personal data. Meanwhile, an efficient and successful usage of the tools provided by this institution directly depends on whether the scope of the personal data concept can be unambiguously defined. The paper describes ...
Added: May 8, 2023
Near-Zero-Shot Suggestion Mining with a Little Help from WordNet
Alekseev A., Tutubalina E., Kwon S. et al., in: Analysis of Images, Social Networks and Texts. 10th International Conference, AIST 2021, Tbilisi, Georgia, December 16–18, 2021, Revised Selected Papers. Cham: Springer, 2022. P. 23–36.
In this work we explore the constructive side of online reviews: advice, tips, requests, and suggestions that users provide about goods, venues and other items of interest. To reduce training costs and annotation efforts needed to build a classifier for a specific label set, we present and evaluate several entailment-based zero-shot approaches to suggestion classification ...
Added: April 10, 2023
Selection of Pseudo-Annotated Data for Adverse Drug Reaction Classification Across Drug Groups
Alimova I., Tutubalina E., in: Analysis of Images, Social Networks and Texts. 10th International Conference, AIST 2021, Tbilisi, Georgia, December 16–18, 2021, Revised Selected Papers. Cham: Springer, 2022. P. 37–44.
Automatic monitoring of adverse drug events (ADEs) or reactions (ADRs) is currently receiving significant attention from the biomedical community. In recent years, user-generated data on social media has become a valuable resource for this task. Neural models have achieved impressive performance on automatic text classification for ADR detection. Yet, training and evaluation of these methods ...
Added: April 10, 2023
Использование BERT для классификации коротких научных текстов на русском языке
Кусакин И. К., Цурупа А. М., Алмакаев А. В. et al., В кн.: НТИ-2022. Научная информация в современном мире: глобальные вызовы и национальные приоритеты : материалы 10-ой научной конференции с международным участием, посвященной 70-летию ВИНИТИ РАН, Москва, 25–26 октября 2022 года. М.: ВИНИТИ РАН, 2022. С. 103–109.
This work studies approaches to training BERT-based classifiers of scientific articles, with the aim of building an application that adopts the best models for use in the infrastructure of VINITI RAS. For this purpose, the BERT linguistic model was trained on a specialized corpus of scientific texts for subsequent use ...
Added: January 31, 2023
Исследование методов машинного обучения для классификации научных текстов на русском языке
Кусакин И. К., Федорец О. В., Romanov A., Научно-техническая информация. Серия 2: Информационные процессы и системы 2022 Т. 12 С. 6–9
This paper discusses modern approaches to natural language processing and the application of artificial intelligence technologies to the task of classifying scientific texts in Russian. The report contains an analysis of implementations of text vectorization methods and a description of experiments with training various classifier models, from classical machine learning algorithms to neural network transformer architectures. ...
Added: January 31, 2023
Pulse of the Nation: Observable Subjective Well-Being in Russia Inferred from Social Network Odnoklassniki
Sergey Smetanin, Mathematics 2022 Vol. 10 No. 16 Article 2947
Policymakers and researchers worldwide are interested in measuring the subjective well-being (SWB) of populations. In recent years, new approaches to measuring SWB have begun to appear, using digital traces as the main source of information, and show potential to overcome the shortcomings of traditional survey-based methods. In this paper, we propose the formal model for ...
Added: August 15, 2022
Using a Homogeneous Semantic Network to Classify the Results of Genetic Analysis
Kharlamov A. A., Kulikov A., in: Neuroinformatics and Semantic Representations: Theory and Applications. Cambridge Scholars Publishing, 2020. P. 219–231.
The paper demonstrates the use of a mechanism for comparing the semantic networks of texts in the task of disease diagnosis using signaling networks. Determining the degree of overlap between the semantic networks of texts indicates the degree of their semantic similarity. A homogeneous semantic network, as a set of nodes connected by arcs, has numerical characteristics: the frequencies of occurrence of words, and of word pairs, in the text, which are renormalized using ...
Added: December 7, 2021
TextAnalyst Technology for Automatic Semantic Analysis of Text
Kharlamov A. A., in: Neuroinformatics and Semantic Representations: Theory and Applications. Cambridge Scholars Publishing, 2020. P. 156–167.
Based on ideas about information processing in the human brain [1], the TextAnalyst technology for automatic semantic text processing has been implemented. It identifies the key concepts of a text and their interrelations, and supports text summarization and semantic comparison (classification) of texts. Products that use this technology have been implemented: a personal product, TextAnalyst, and a library of COM modules, TextAnalyst SDK. ...
Added: December 7, 2021
Share of Toxic Comments among Different Topics: The Case of Russian Social Networks
Smetanin S., Komarov M. M., in: IEEE 23rd Conference on Business Informatics (CBI). IEEE Computer Society, 2021. P. 65–70.
With the widespread use of online social networks, it is becoming more and more difficult to monitor and analyse all the user-generated content. Toxic speech in online conversations should be treated as a matter with serious social gravity, since it may result in both negative impacts on mental health and violent actions in the physical ...
Added: September 14, 2021