Methods and Tools for Extracting Terms from Texts for Terminological Tasks

Программные продукты и системы (Software & Systems). 2025. Vol. 38. No. 1. P. 5–16.
Bolshakova E. I., Semak V. V.

This paper reviews the current state of automatic term extraction from specialized natural-language texts, including scientific and technical documents. Practical applications of term extraction methods and tools include building terminological dictionaries, thesauri, and glossaries for problem-oriented domains, as well as extracting keywords and constructing subject indexes for highly specialized documents.

The paper surveys approaches to automatically recognizing and extracting terminological words and phrases: traditional statistical methods and machine learning methods, the latter including learning from term features and learning with modern transformer-based neural language models. The approaches are compared, including by the reported quality of term recognition and term extraction, and the best-known software tools for automating term extraction with statistical methods and feature-based learning are listed.
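Quality comparisons of this kind are commonly reported as precision, recall, and F1 over the set of extracted terms. The abstract does not say which metrics the paper uses, so the sketch below is only an illustration under that assumption; the function name `term_extraction_scores` and the sample term lists are hypothetical.

```python
from typing import Iterable, Tuple


def term_extraction_scores(extracted: Iterable[str],
                           gold: Iterable[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for extracted terms against a gold list.

    Assumption: terms are compared as case-insensitive exact strings.
    Real evaluations often also normalize by lemma, which is omitted here.
    """
    extracted_set = {t.lower() for t in extracted}
    gold_set = {t.lower() for t in gold}
    true_positives = len(extracted_set & gold_set)

    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical data: three extracted candidates, four gold terms, two matches.
extracted_terms = ["binary tree", "sorted list", "the method"]
gold_terms = ["Binary tree", "sorted list", "hash table", "stack"]

precision, recall, f1 = term_extraction_scores(extracted_terms, gold_terms)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Comparing sets rather than occurrences measures term extraction (building a term list); scoring each occurrence in running text would instead measure term recognition.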

The authors' own studies of term recognition with neural language models, applied to Russian scientific texts on mathematics and programming, are also described. The dataset with terminological annotations created for training the term recognition models is briefly characterized; it covers data from seven related domains. The models are built on the pre-trained neural model BERT, fine-tuned in two ways: as a binary classifier of candidate terms previously extracted from the texts, and as a sequence labeling classifier that tags terminological words in the texts. The quality of term recognition is evaluated experimentally for the developed models and compared with a statistical method. The binary classification models perform best, significantly surpassing the other approaches considered. The experiments also show that the trained models are applicable to texts from a related scientific field.
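The two fine-tuning setups differ mainly in how training examples are derived from the same annotated text. The sketch below shows one plausible way to build both kinds of examples. It is not the authors' pipeline: the BIO tag names, the span format, the helper functions, and the English sample sentence are all assumptions made for illustration.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open token range [start, end)


def bio_labels(tokens: List[str], term_spans: List[Span]) -> List[str]:
    """Sequence labeling view: one tag per token (B-TERM, I-TERM, or O).

    Assumes the annotated term spans do not overlap.
    """
    labels = ["O"] * len(tokens)
    for start, end in term_spans:
        labels[start] = "B-TERM"
        for i in range(start + 1, end):
            labels[i] = "I-TERM"
    return labels


def candidate_labels(tokens: List[str], candidates: List[Span],
                     term_spans: List[Span]) -> List[Tuple[str, int]]:
    """Binary classification view: each pre-extracted candidate gets 1 or 0.

    A candidate is positive only if it exactly matches an annotated span.
    """
    gold = set(term_spans)
    return [(" ".join(tokens[s:e]), int((s, e) in gold)) for s, e in candidates]


# Hypothetical sentence with one annotated term, "binary search tree".
tokens = ["the", "binary", "search", "tree", "stores", "sorted", "keys"]
term_spans = [(1, 4)]
# Hypothetical candidates, e.g. produced by a noun-phrase pattern extractor.
candidates = [(1, 4), (2, 4), (5, 7)]

print(bio_labels(tokens, term_spans))
print(candidate_labels(tokens, candidates, term_spans))
```

In the first view the model predicts a tag for every token; in the second it sees only candidates produced by an earlier extraction step and decides whether each one is a term.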

Language: Russian
Keywords: Natural Language Processing; automatic term extraction; machine learning for term recognition