

Automatic Privacy Detection in Scanned Document Images Based on Deep Neural Networks

P. 1–6.
Kopeykina Lyudmila, Savchenko A.

The authors consider the problem of automatically detecting private scanned documents using text recognition with deep neural networks. The paper proposes a two-stage approach. In the first stage, text is detected with the efficient EAST detector and recognized with the Tesseract OCR engine. In the second stage, the privacy of the scanned document is classified by deep neural networks applied to the extracted text. A special dataset was gathered to train these networks. The experiments show that using the OCR engine for both text detection and segmentation identifies private documents relatively poorly compared with preliminary text detection by the EAST method. Moreover, conventional keyword spotting with a list of sensitive words is less accurate than neural network-based methods. Finally, classifying a bag of the most frequent words outperforms traditional text classification techniques based on LSTM and convolutional networks.
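The two text-level approaches compared in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the text has already been extracted by OCR, and the function names, the tokenization rule, and the toy documents are all hypothetical. It shows only the keyword-spotting baseline and how a bag of the most frequent words could be built as input for a classifier; the neural classifier itself is omitted.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the OCR output and split it into alphabetic tokens (assumed rule)."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_spotting(text, sensitive_words):
    """Baseline: flag a document as private if it contains any listed word."""
    return any(token in sensitive_words for token in tokenize(text))


def build_vocabulary(texts, size):
    """Keep the `size` most frequent words across the training texts."""
    counts = Counter(token for text in texts for token in tokenize(text))
    return [word for word, _ in counts.most_common(size)]


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


# Toy, made-up documents standing in for OCR output.
train_texts = [
    "passport number and date of birth",
    "meeting agenda and minutes",
    "passport and visa application",
]
vocabulary = build_vocabulary(train_texts, size=2)
features = bag_of_words("passport and passport", vocabulary)
```

In this sketch, the resulting count vector would be passed to a classifier trained on labelled private and public documents, whereas keyword spotting needs only a hand-made word list.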

Language: English
Keywords: text recognition, deep neural networks, privacy detection, personal data detection
Publication based on the results of:
Efficient methods for multimedia data recognition for analysing the preferences of mobile device users (2019)

In book

2019 International Russian Automation Conference (RusAutoCon)
IEEE, 2019.
Similar publications
An ensemble of modern computer vision models for deepfake detection
Pikul A. S., Безопасность информационных технологий 2024 Vol. 31 No. 4 P. 116–127
This article explores the potential use of modern computer vision architectures for the task of deepfake detection. The following architectures are considered: EfficientNet, Vision Transformer (ViT), VisionLSTM (ViL), Vision KAN, and Mamba Vision. The novelty of the approach lies in the application and comparison of these architectures, as well as their combination into paired ensembles ...
Added: December 12, 2025
The Appliance of Deep Neural Networks in the Process of Managing Chemical Enterprises
Kulyasova E. V., Kulyasov N.S., Puchkov A. Y., in: Journal of Physics: Conference Series Volume 1260, 2019 Mechanical Science and Technology Update 23–24 April 2019, Omsk, Russian Federation.: IOP Publishing, 2019. Ch. 3 P. 032024–032024.
This article is introduced into the perspective tendencies of the digital transformation of chemical enterprises which allow to improve the process of managing enterprises of the branch. Presented the algorithms of managing and technological information processing based on deep neural network apparatus. New approaches to data processing known as video analytics are applied; it allows ...
Added: September 27, 2024
Latent Stochastic Differential Equations for Change Point Detection
Ryzhikov A., Hushchyn M., Derkach D., IEEE Access 2023 Vol. 11 P. 104700–104711
Automated analysis of complex systems based on multiple readouts remains a challenge. Change point detection algorithms are aimed to locating abrupt changes in the time series behaviour of a process. In this paper, we present a novel change point detection algorithm based on Latent Neural Stochastic Differential Equations (SDE). Our method learns a non-linear deep ...
Added: October 5, 2023
Data-Driven Short-Term Daily Operational Sea Ice Regional Forecasting
Grigoryev T., Verezemskaya P., Krinitskiy M. et al., Remote Sensing 2022 Vol. 14 No. 22 Article 5837
Global warming has made the Arctic increasingly available for marine operations and created a demand for reliable operational sea ice forecasts to increase safety. Because ocean-ice numerical models are highly computationally intensive, relatively lightweight ML-based methods may be more efficient for sea ice forecasting. Many studies have exploited different deep learning models alongside classical approaches ...
Added: June 19, 2023
Loss function dynamics and landscape for deep neural networks trained with quadratic loss
Nakhodnov M., Kodryan M., Lobacheva E. et al., in: Doklady Mathematics Vol. 106. Issue 1: Supplement.: Pleiades Publishing, Ltd., 2023. P. 43–62.
Knowledge of the loss landscape geometry makes it possible to successfully explain the behavior of neural networks, the dynamics of their training, and the relationship between resulting solutions and hyperparameters, such as the regularization method, neural network architecture, or learning rate schedule. In this paper, the dynamics of learning and the surface of the standard ...
Added: June 9, 2023
Using convolutional neural networks for person re-identification in urban environments
Сучков Е. П., Алексеенко Г. О., Налчаджи К. В., Интеллектуальные системы. Теория и приложения 2022 Vol. 26 No. 1 P. 250–254
Currently, video surveillance systems are becoming more widespread. One of the main goals of such systems is to control and track a person’s movement. The solution of this problem allows us to solve such applied problems as tracking the occupancy of various premises (whether shopping facilities or educational and cultural institutions), creating a motion heatmap or organizing control of access to ...
Added: January 31, 2023
Using convolutional neural networks for person re-identification in urban environments
Алексеенко Г., Налчаджи К., Интеллектуальные системы. Теория и приложения 2022 Vol. 26 No. 1 P. 250–254
Video recording systems are currently becoming more and more widespread. One of the main goals of such systems is to monitor and track a person. Solving this problem then makes it possible to address applied tasks such as monitoring the occupancy of various premises (whether retail facilities or educational and cultural institutions), building a heatmap of a person's movements, or organizing access control to one or another ...
Added: December 21, 2022
Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes
Kodryan M., Lobacheva E., Nakhodnov M. et al., in: Thirty-Sixth Conference on Neural Information Processing Systems : NeurIPS 2022.: Curran Associates, Inc., 2022. P. 14058–14070.
A fundamental property of deep learning normalization techniques, such as batch normalization, is making the pre-normalization parameters scale invariant. The intrinsic domain of such parameters is the unit sphere, and therefore their gradient optimization dynamics can be represented via spherical optimization with varying effective learning rate (ELR), which was studied previously. However, the varying ELR ...
Added: December 20, 2022
Recognition of the Bare Soil Using Deep Machine Learning Methods to Create Maps of Arable Soil Degradation Based on the Analysis of Multi-Temporal Remote Sensing Data
Rukhovich D., Koroleva P., Rukhovich D. et al., Remote Sensing 2022 Vol. 14 No. 9 Article 2224
The detection of degraded soil distribution areas is an urgent task. It is difficult and very time consuming to solve this problem using ground methods. The modeling of degradation processes based on digital elevation models makes it possible to construct maps of potential degradation, which may differ from the actual spatial distribution of degradation. The ...
Added: November 14, 2022
Comment on “Pushing the frontiers of density functionals by solving the fractional electron problem”
Gerasimov I., Losev T., Evgeny Yu. Epifanov et al., Science 2022 Vol. 377 No. 6606 Article eabq3385
Kirkpatrick et al. (Reports, 9 December 2021, p. 1385) trained a neural network–based DFT functional, DM21, on fractional-charge (FC) and fractional-spin (FS) systems, and they claim that it has outstanding accuracy for chemical systems exhibiting strong correlation. Here, we show that the ability of DM21 to generalize the behavior of such systems does not follow ...
Added: September 25, 2022
Deep learning for inferring distribution of time to the last common ancestor from a diploid genome
K. Arzymatov, E. Khomutov, V. Shchur, Lobachevskii Journal of Mathematics 2022 Vol. 43 No. 8 P. 2092–2098
Genomic data is a rich source of information about population history. In particular, for actively recombining species the time to the last common ancestor (LCA) between two chromosomes might be different in different chromosome loci. Estimating local LCA time is important for many problems: it can be used to infer genes under selection, or to ...
Added: September 19, 2022
Simultaneous approximation of a smooth function and its derivatives by deep neural networks with piecewise-polynomial activations
Belomestny D., Naumov A., Puchkin N. et al., Neural Networks 2023 Vol. 161 P. 242–253
This paper investigates the approximation properties of deep neural networks with piecewise-polynomial activation functions. We derive the required depth, width, and sparsity of a deep neural network to approximate any Hölder smooth function up to a given approximation error in Hölder norms in such a way that all weights of this neural network are bounded ...
Added: July 13, 2022
On the Periodic Behavior of Neural Network Training with Batch Normalization and Weight Decay
Lobacheva E., Kodryan M., Chirkova N. et al., in: Advances in Neural Information Processing Systems 34 (NeurIPS 2021).: Curran Associates, Inc., 2021. P. 21545–21556.
Added: December 29, 2021
Distributed Deep Learning In Open Collaborations
Diskin M., Bukhtiyarov A., Ryabinin M. et al., in: Advances in Neural Information Processing Systems 34 (NeurIPS 2021).: Curran Associates, Inc., 2021. P. 7879–7897.
Added: November 24, 2021
Gender domain adaptation for automatic speech recognition
Sokolov A., Savchenko A., in: 2021 IEEE 19th World Symposium on Applied Machine Intelligence and Informatics (SAMI).: IEEE, 2021. P. 413–418.
This paper is focused on the finetuning of acoustic models for speaker adaptation goals on a given gender. We pretrained the Transformer baseline model on Librispeech-960 and conducted experiments with finetuning on the gender-specific test subsets. The obtained word error rate (WER) relatively to the baseline is up to 5% and 3% lower on male ...
Added: September 26, 2021
On the generalization ability of data-driven models in the problem of total cloud cover retrieval
Krinitskiy M., Alexandrova M., Verezemskaya P. et al., Remote Sensing 2021 Vol. 13 No. 2 Article 326
Total Cloud Cover (TCC) retrieval from ground-based optical imagery is a problem that has been tackled by several generations of researchers. The number of human-designed algorithms for the estimation of TCC grows every year. However, there has been no considerable progress in terms of quality, mostly due to the lack of systematic approach to the ...
Added: September 24, 2021
Generating Sport Summaries: A Case Study for Russian
Malykh V., Porplenko D., Tutubalina E., in: Analysis of Images, Social Networks and Texts: 9th International Conference, AIST 2020, Skolkovo, Moscow, Russia, October 15–16, 2020, Revised Selected Papers Vol. 12602.: Springer, 2021. P. 149–161.
We present a novel dataset of sports broadcasts with 8,781 games. The dataset contains 700 thousand comments and 93 thousand related news documents in Russian. We run an extensive series of experiments of modern extractive and abstractive approaches. The results demonstrate that BERT-based models show modest performance, reaching up to 0.26 ROUGE-1F-measure. In addition, human evaluation ...
Added: May 10, 2021
Black-Box Optimization with Local Generative Surrogates
Belavin V., Ustyuzhanin A., Sergey Shirobokov et al., in: Advances in Neural Information Processing Systems 33 (NeurIPS 2020).: Curran Associates, Inc., 2020. P. 14650–14662.
Added: February 14, 2021
On the Impact of Word Error Rate on Acoustic-Linguistic Speech Emotion Recognition: An Update for the Deep Learning Era
Sokolov A., / Series Computer Science "arxiv.org". 2021.
Text encodings from automatic speech recognition (ASR) transcripts and audio representations have shown promise in speech emotion recognition (SER) ever since. Yet, it is challenging to explain the effect of each information stream on the SER systems. Further, more clarification is required for analysing the impact of ASR's word error rate (WER) on linguistic emotion ...
Added: November 17, 2020