
 



Text Detoxification using Large Pre-trained Neural Models

Ch. 629. P. 7979–7996.
Dale D., Voronov A., Dementieva D., Logacheva V. K., Kozlova O., Semenov N., Panchenko A.

We present two novel unsupervised methods for eliminating toxicity in text. Our first method combines two recent ideas: (1) guidance of the generation process with small style-conditional language models and (2) use of paraphrasing models to perform style transfer. We use a well-performing paraphraser guided by style-trained language models to keep the text content and remove toxicity. Our second method uses BERT to replace toxic words with their non-offensive synonyms. We make the method more flexible by enabling BERT to replace mask tokens with a variable number of words. Finally, we present the first large-scale comparative study of style transfer models on the task of toxicity removal. We compare our models with a number of methods for style transfer. The models are evaluated in a reference-free way using a combination of unsupervised style transfer metrics. Both methods we suggest yield new SOTA results.
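The first idea in the abstract, steering a generator with small style-conditional language models, can be illustrated with a toy sketch. It is not the authors' implementation: the function name, the token-level simplification, the equal class priors, and all probabilities below are assumptions made for illustration only. Each candidate token's probability from the base generator is multiplied by the posterior probability that the token belongs to the neutral style, and the result is renormalised.

```python
def guided_next_token_probs(base_probs, neutral_probs, toxic_probs, weight=1.0):
    """Reweight a base next-token distribution toward the neutral style.

    base_probs    -- hypothetical next-token probabilities from the paraphraser
    neutral_probs -- hypothetical probabilities from a neutral-style language model
    toxic_probs   -- hypothetical probabilities from a toxic-style language model
    weight        -- how strongly the style signal overrides the base model
    """
    scores = {}
    for token, p_base in base_probs.items():
        p_neutral = neutral_probs[token]
        p_toxic = toxic_probs[token]
        # Bayes rule with equal class priors (an assumption of this sketch):
        # probability that the token comes from the neutral style.
        posterior_neutral = p_neutral / (p_neutral + p_toxic)
        scores[token] = p_base * posterior_neutral ** weight
    total = sum(scores.values())
    # Renormalise so the reweighted scores form a probability distribution.
    return {token: score / total for token, score in scores.items()}


# Hand-written toy distributions over two candidate tokens.
base = {"idiot": 0.5, "person": 0.5}
neutral = {"idiot": 0.1, "person": 0.9}
toxic = {"idiot": 0.9, "person": 0.1}

guided = guided_next_token_probs(base, neutral, toxic)
```

In this toy case the base model is indifferent between the two tokens, while the guided distribution clearly prefers the neutral one. Raising `weight` makes the preference stronger.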

Language: English
Keywords: Natural Language Processing (NLP)

In book

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Association for Computational Linguistics, 2021.
Similar publications
A textual fingerprint learning model to detect fake information spreaders in social networks
Behzadidoost R., Neurocomputing 2025 Vol. 665 P. 1–21
While earlier research has focused on detecting misinformation content, identifying the users who spread it, referred to in this paper as fake information spreaders, remains a relatively new challenge. These users deliberately mix true and false information, making detection more difficult. This paper proposes a textual fingerprint learning model to detect fake information spreaders. The ...
Added: March 12, 2026
SynEL: A synthetic benchmark for entity linking
Karpov I., Kirillovich A., Goncharova E. et al., Plos One 2026 Vol. 21 No. 1 Article e0339468
Large language models (LLMs) offer significant potential for constructing commonsense knowledge graphs from text, demonstrating adaptability across diverse domains. However, their effectiveness varies significantly with domain-specific language, highlighting a critical need for specialized benchmarks to assess and optimize knowledge graph construction sub-tasks like named entity recognition, relation extraction, and entity linking. Currently, domain-specific benchmarks are ...
Added: January 15, 2026
Proceedings of the 39th Annual AAAI Conference on Artificial Intelligence
Washington, United States of America: AAAI Press, 2025.
AAAI-25 Technical Tracks 23 (Natural Language Processing II) collects peer-reviewed research papers that advance the state of natural language processing, with an emphasis on large language models, efficient inference, instruction following, retrieval augmentation, and multimodal language understanding. The papers address both theoretical and practical challenges, including model efficiency, interactive generation, grounding in external knowledge and ...
Added: December 18, 2025
Lacuna Inc. at SemEval-2025 Task 4: LoRA-Enhanced Influence-Based Unlearning for LLMs
Kudelya A., Shirnin A., in: Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025). Association for Computational Linguistics, 2025. P. 1528–1533.
This paper describes LIBU (LoRA enhanced influence-based unlearning), an algorithm to solve the task of unlearning - removing specific knowledge from a large language model without retraining from scratch and compromising its overall utility (SemEval-2025 Task 4: Unlearning sensitive content from Large Language Models). The algorithm combines classical influence functions to remove the influence of ...
Added: November 17, 2025
Empaths at SemEval-2025 Task 11: Retrieval-Augmented Approach to Perceived Emotions Prediction
Morozov L., Mogilevskii A., Shirnin A., in: Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025). Association for Computational Linguistics, 2025. P. 2000–2007.
Added: November 17, 2025
Findings of the Association for Computational Linguistics: EMNLP 2025
Association for Computational Linguistics, 2025.
The book contains this year’s edition of the Conference on Empirical Methods in Natural Language Processing! Importantly, it marks the 30th edition of EMNLP. With over 8,000 submissions, more than 3,000 accepted papers, and thousands of attendees, we have come a long way from that first workshop, which had 14 accepted papers. As the field ...
Added: November 16, 2025
Findings of the Association for Computational Linguistics: NAACL 2025
Association for Computational Linguistics, 2025.
Added: November 6, 2025
Explainable Document Classification via Concept Whitening and Stable Graph Patterns
Parakal E. G., Kuznetsov S., Makarov I. et al., IEEE Access 2025 Vol. 13 P. 149657–149678
This paper proposes a novel explainable document classification framework that integrates Concept Whitening (CW) with graph concepts that are derived from stable graph patterns, and extracted via methods based on Formal Concept Analysis (FCA) and pattern structures. Document graphs are constructed using Abstract Meaning Representation (AMR) graphs, from which graph concepts are extracted and aligned ...
Added: October 22, 2025
Building a Clean Bartangi Language Corpus and Training Word Embeddings for Low-Resource Language Modeling
Shumen: INCOMA Ltd, 2025.
This paper introduces a rule-based lemmatization and word embedding pipeline for the endangered Bartangi language, part of the Pamiri language group. The system combines a manually constructed lemma dictionary with morphological suffix rules to improve linguistic consistency in low-resource settings. The results demonstrate enhanced lemmatization accuracy and higher-quality embeddings for downstream NLP tasks. The work ...
Added: October 20, 2025
Well-Being Research Using Advanced Natural Language Processing (NLP) Methods: Prospects and Limitations
Voevodina E., Современная зарубежная психология 2025 Vol. 14 No. 3 P. 172–181
Context and relevance. Well-being research faces methodological limitations of conventional psychometric measures, criticized for poor ecological validity, limited information yield, and inadequate capture of multidimensional construct of well-being. Advanced natural language processing (NLP) technologies offer solutions to these constraints. Objective. To evaluate opportunities and challenges of transformer-based NLP for well-being research. Methods and materials. We conducted an analytical review of ...
Added: October 9, 2025
CLEF 2025 Working Notes
CEUR Workshop Proceedings, 2025.
Added: October 6, 2025
Developing a Classifier Architecture for Assessing the Condition of Infrastructure Objects Using Neural Networks
Moiseev N., Abramov I. A., Kamakin A. Yu., in: Parallel Computational Technologies: XIX All-Russian Conference with International Participation, PaVT'2025, Moscow, April 8–10, 2025. Short Papers and Poster Descriptions. Chelyabinsk: SUSU Publishing Center, 2025. P. 301–301.
In recent years, with the advancement of deep learning and neural network methods, their application in geospatial analysis tasks has become particularly relevant. A key challenge in this field is assessing the state of urban infrastructure, including the classification of buildings by their functional purpose (residential, commercial, governmental, industrial). The use of neural networks significantly ...
Added: September 17, 2025
Comparative Analysis of Encoder-Based NER and Large Language Models for Skill Extraction from Russian Job Vacancies
Matkin N., Smirnov A., Usanin M. et al., in: 12th International Conference, AIST 2024, Bishkek, Kyrgyzstan, October 17–19, 2024, Revised Selected Papers. Cham: Springer, 2025.
The labor market is undergoing rapid changes, with increasing demands on job seekers and a surge in job openings. Identifying essential skills and competencies from job descriptions is challenging due to varying employer requirements and the omission of key skills. This study addresses these challenges by comparing traditional Named Entity Recognition (NER) methods based on ...
Added: July 26, 2025
2024 6th International Conference on Control Systems, Mathematical Modeling, Automation and Energy Efficiency (SUMMA)
NY: IEEE, 2024.
The 6th International Conference on Control Systems, Mathematical Modeling, Automation and Energy Efficiency (SUMMA2024), which is traditionally jointly organized by the V.A. Trapeznikov Institute of Control Sciences RAS and the Institute of Computer Sciences of Lipetsk State Technical University, was held in Lipetsk on November 13–15, 2024. The SUMMA2024 program includes topics of interest that consist of, ...
Added: April 5, 2025
The relationships between RedditSI and BTC exchange characteristics: Do Reddit users still control the market?
Baklanova V., Eurasian Economic Review 2025 Vol. 15 P. 285–306
This study investigates the influence of the Reddit community on Bitcoin market performance by introducing the Reddit Sentiment Index (RedditSI) as a tool to measure sentiment among Reddit users. The index was built from Bitcoin-related subreddits and classified with the Flair NLP model. Statistical analysis, including correlation, cointegration and causality tests, revealed significant relationships ...
Added: March 14, 2025
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2024
Bangkok: Association for Computational Linguistics, 2024.
Originally named the Association for Machine Translation and Computational Linguistics (AMTCL), the Association for Computational Linguistics was founded in 1962 and renamed the ACL in 1968. The ACL is run by some 20 volunteers overseeing the administration of the Association (organising elections, deciding on new actions, adapting to the fast changing trends of our fields), ...
Added: February 21, 2025
Findings of the Association for Computational Linguistics: EACL 2024
Association for Computational Linguistics, 2024.
The 18th Conference of the European Chapter of the Association for Computational Linguistics. EACL is the flagship European conference dedicated to European and international researchers, covering a wide spectrum of research in Computational Linguistics and Natural Language Processing. ...
Added: February 17, 2025
Identifying Top-Performing Students via VKontakte Social Media Communities Using Advanced NLP Techniques
Gorshkov S., Ignatov D. I., Chernysheva A. et al., IEEE Access 2025 Vol. 13 P. 962–979
Identifying potentially high-performing students is crucial for universities aiming to enhance educational outcomes, for companies seeking to recruit top talents early, and for advertising platforms looking to optimize targeted marketing. This paper introduces an algorithm designed to identify students with exceptional academic performance by analyzing their subscriptions to communities on the social network VKontakte. The ...
Added: January 3, 2025
Spot the Bot: Coarse-Grained Partition of Semantic Paths for Bots and Humans
Gromov V., Kogan A., in: 10th International Conference, PReMI 2023, Kolkata, India, December 12–15, 2023, Proceedings. Pattern Recognition and Machine Intelligence. LNCS, volume 14301. Cham: Springer, 2023. P. 348–355.
Nowadays, technology is rapidly advancing: bots are writing comments, articles, and reviews. Due to this fact, it is crucial to know if the text was written by a human or by a bot. This paper focuses on comparing structures of the coarse-grained partitions of semantic paths for human-written and bot-generated texts. We compare the clusterizations ...
Added: December 13, 2024
Toolken+: Improving LLM Tool Usage with Reranking and a Reject Option
Yakovlev K., Nikolenko S., Bout A., in: Findings of the Association for Computational Linguistics: EMNLP 2024. Association for Computational Linguistics, 2024. P. 5967–5974.
The recently proposed ToolkenGPT tool learning paradigm demonstrates promising performance but suffers from two major issues: first, it cannot benefit from tool documentation, and second, it often makes mistakes in whether to use a tool at all. We introduce Toolken+ that mitigates the first problem by reranking top-k tools selected by ToolkenGPT and the second ...
Added: November 22, 2024
Spot the Bot: the Inverse Problems of NLP
Vasilii A. Gromov, Quynh Nhu Dang, Alexandra S. Kogan et al., PeerJ Computer Science 2024 Vol. 10 Article e2550
This paper concerns the problem of distinguishing human-written and bot-generated texts. In contrast to the classical problem formulation, in which the focus falls on one type of bot only, we consider the problem of distinguishing texts written by any person from those generated by any bot; this involves analysing the large-scale, coarse-grained structure of the ...
Added: November 14, 2024