
Semantic Proximity Establishment in the Tasks of Knowledge Extraction and Named Entities Recognition

P. 339–344.
Kozerenko E. B., Kuznetsov K. I., Morozova Y. I., Romanov D. A.

The paper addresses the problem of identifying text segments that contain similar semantic units, for analytical text processing within a semantic technology platform. The methods and tools presented support the discovery of relevant content based on users' focused interests within a given domain. A hybrid approach combining linguistic rules and example-based learning techniques is employed. Legal and mass media texts are considered. The paper gives a brief history of the NER task, specifies the Pullenti-based engine, presents the two-step Semantic Expansion Algorithm, and discusses Distributional Semantics methods for domain term extraction, along with some technical challenges and prospective directions for further research and development.

Language: English
Keywords: knowledge extraction, named entities recognition, semantic clustering, semantic similarity

In book

PROCEEDINGS OF THE 2017 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE
American Council on Science & Education, 2017.
Similar publications
Aschern at CheckThat! 2021: Lambda-Calculus of Fact-Checked Claims
Chernyavskiy A., Ilvovsky D., Nakov P., in: CLEF 2021 Working Notes. CEUR Workshop Proceedings, 2021. P. 484–493.
We describe our system for the CLEF 2021 CheckThat! Lab Task 2 Subtask A on detecting previously fact-checked claims. We developed a pipeline using TF.IDF, sentence-BERT fine-tuned on the training data, and reranking using LambdaMART and the predicted similarity scores and positions in the ranked list as features. We examined the quality of each model ...
Added: May 9, 2024
Semantic Recommendation System for Bilingual Corpus of Academic Papers
Safaryan A., Petr Filchenkov, Yan W. et al., in: Recent Trends in Analysis of Images, Social Networks and Texts. 9th International Conference, AIST 2020, Skolkovo, Moscow, Russia, October 15–16, 2020, Revised Supplementary Proceedings. Vol. 12602. Springer, 2021. Ch. 3. P. 22–36.
We tested four methods of making document representations cross-lingual for the task of semantic search for the similar papers based on the corpus of papers from three Russian conferences on NLP: Dialogue, AIST and AINL. The pipeline consisted of three stages: preprocessing, word-by-word vectorisation using models obtained with various methods to map vectors from two ...
Added: September 18, 2023
Moving Other Way: Exploring Word Mover Distance Extensions
Smirnov I., Yamshchikov I. P., in: COMPLEXIS 2022. Proceedings of the 7th International Conference on Complexity, Future Information Systems and Risk. April 23-24, 2022. Science and Technology Publications, Lda, 2022. P. 92–97.
Added: September 8, 2022
Analyzing COVID-19 Medical Papers Using Artificial Intelligence: Insights for Researchers and Medical Professionals
Dmitry Soshnikov, Petrova T., Soshnikova V. et al., Big Data and Cognitive Computing 2022 Vol. 6 No. 1 Article 4
Since the beginning of the COVID-19 pandemic almost two years ago, there have been more than 700,000 scientific papers published on the subject. An individual researcher cannot possibly get acquainted with such a huge text corpus and, therefore, some help from artificial intelligence (AI) is highly needed. We propose the AI-based tool to help researchers ...
Added: February 22, 2022
Chekhov's Gun Recognition
Tikhonov A., Yamshchikov I. P. / Series Computer Science "arxiv.org". 2021.
Chekhov's gun is a dramatic principle stating that every element in a story must be necessary, and irrelevant elements should be removed. This paper presents a new natural language processing task — Chekhov's gun recognition or (CGR) — recognition of entities that are pivotal for the development of the plot. Though similar to classical Named Entity Recognition ...
Added: December 3, 2021
Rethinking Crowd Sourcing for Semantic Similarity
Solomon S., Cohn A., Rosenblum H. et al. / Series Computer Science "arxiv.org". 2021.
Estimation of semantic similarity is crucial for a variety of natural language processing (NLP) tasks. In the absence of a general theory of semantic information, many papers rely on human annotators as the source of ground truth for semantic similarity estimation. This paper investigates the ambiguities inherent in crowd-sourced semantic labeling. It shows that annotators ...
Added: December 3, 2021
Lexicographic Study of Synonymy: Clarifying Semantic Similarity between Words
Solovyev V., Гималетдинова Г., Халитова Л. et al., Computacion y Sistemas 2021 Vol. 25 No. 3 P. 667–675
The problem of determining semantic similarity between words affects the understanding of synonymy and creates obstacles to the work of lexicographers. The study was carried out as a part of a larger research project on expert assessment of synonymic rows in RuWordNet thesaurus (a WordNet–like thesaurus for the Russian language). The aim ...
Added: December 1, 2021
Representation of Different Types of Adjectival Polysemy in the Mental Lexicon
Apresyan V., Lopukhina A., Zarifyan M., Frontiers in Psychology 2021 Vol. 12 Article 742064
We studied mental representations of literal, metonymically different, and metaphorical senses in Russian adjectives. Previous studies suggested that in polysemous words, metonymic senses, being more sense-related, were stored together with literal senses, whereas more distant metaphorical senses had separate representations. We hypothesized that metonymy may be heterogeneous with respect to its mental storage. “Whole-part” metonymy ...
Added: October 29, 2021
Style-transfer and Paraphrase: Looking for a Sensible Semantic Similarity Metric
Yamshchikov I. P., Shibaev V., Khlebnikov N. et al., in: The Thirty-Fifth AAAI Conference on Artificial Intelligence. Technical Tracks 16. Vol. 35. Issue 16. AAAI Press, 2021. P. 14213–14220.
The rapid development of such natural language processing tasks as style transfer, paraphrase, and machine translation often calls for the use of semantic similarity metrics. In recent years a lot of methods to measure the semantic similarity of two short texts were developed. This paper provides a comprehensive analysis for more than a dozen of ...
Added: July 22, 2021
Script Information Extraction from Texts. Part 1. Problem Statement and Overview of Methods
Суворова М. И., Кобозева М. В., Toldova S. et al., Искусственный интеллект и принятие решений 2020 No. 1 P. 17–26
The article discusses the importance of automatic script analysis for understanding natural language texts. A broad overview of methods and approaches to describing and extracting scripts is given. Theoretical approaches to the formalization of scripts are considered. A list of tasks that use information about the script structure of a text is provided. Popular approaches to the automatic extraction of scripts from texts and methods for evaluating their ...
Added: April 22, 2020
The Entity Name Identification in Classification Algorithm: Testing the Advocacy Coalition Framework by Document Analysis (The Case of Russian Civil Society Policy)
Zaytsev D., Talovsky N., Kuskova V. et al., in: Analysis of Images, Social Networks and Texts. 8th International Conference, AIST 2019, Lecture Notes in Computer Science, Revised Selected Papers. Vol. 11832. Cham: Springer, 2019. P. 276–288.
This is an application of an advanced entity recognition algorithm to a large dataset. ...
Added: November 7, 2019
Network Analysis Methodology of Policy Actors Identification and Power Evaluation (the case of the Unified State Exam introduction in Russia)
Zaytsev D., Gregory Khvatsky, Talovsky N. et al., in: Network Algorithms, Data Mining, and Applications. Springer Proceedings in Mathematics & Statistics. Springer, 2020. P. 231–244.
This is an exploratory study of the effects of the Unified State Exam in Russia, using advanced network methodology. ...
Added: November 7, 2019
An Experimental Study of Hybrid Machine Learning Models for Extracting Named Entities
Lei J., Bolshakova E. I., in: Proceedings of Third Workshop "Computational linguistics and language science". Issue 4. Manchester: EasyChair, 2019. P. 50–60.
The paper describes two hybrid neural network models for named entity recognition (NER) in texts, namely Bi-LSTM-CRF and Gated-CNN-CRF, as well as results of experiments with them. ...
Added: November 3, 2019
Dark personalities on Facebook: Harmful online behaviors and language
Bogolyubova O., Panicheva P., Tikhonov R. et al., Computers in Human Behavior 2018 Vol. 78 P. 151–159
*The operation of the social network Facebook is prohibited in Russia on the grounds of carrying out extremist activity. The goal of this paper was to assess the connection between dark personality traits and engagement in harmful online behaviors in a sample of Russian Facebook users, and to describe the language they use in online communication. A total of 6724 individuals participated ...
Added: February 18, 2019
Semantic Processing of Unstructured Text Data Based on the PullEnti Linguistic Processor
Kozerenko E. B., Kuznetsov K. I., Romanov D. A., Информатика и ее применения 2018 Vol. 12 No. 3 P. 91–98
The paper presents a method for creating knowledge extraction systems based on the PullEnti software tool, which comprises algorithms for morphological and semantic-syntactic analysis and makes it possible to extract entities of certain types from natural language texts (persons, organizations, locations, and other target semantic objects). The PullEnti system uses ...
Added: December 19, 2018
Trend Monitoring for Linking Science and Strategy
Bakhtin P. D., Saritas O., Chulok A. et al., Scientometrics 2017 Vol. 111 No. 3 P. 2059–2075
Rapid changes in Science & Technology (S&T) along with breakthroughs in products and services concern a great deal of policy and strategy makers and lead to an ever increasing number of Foresight and other types of forward-looking work. At the outset, the purpose of these efforts is to investigate emerging S&T areas, set priorities and ...
Added: December 21, 2016
Unified External Data Access Implementation in Formal Concept Analysis Research Toolbox
Parinov A., Neznanov A., in: CLA 2016: Proceedings of the Thirteenth International Conference on Concept Lattices and Their Applications. CEUR Workshop Proceedings. Vol. 1624. M.: Higher School of Economics, National Research University, 2016. P. 285–296.
Formal Concept Analysis (FCA) provides mathematical models, methods and algorithms for data analysis. However, there is so far no readily available software system that gives data analysts unified, intelligible, and transparent access to various external data sources with large amounts of heterogeneous data for subsequent FCA-based knowledge discovery. The lack of such tools ...
Added: October 19, 2016
Full-text Search in Intermediate Data Storage of FCART
Neznanov A., Parinov A., in: RuZA 2015 Workshop. Proceedings of Russian and South African Workshop on Knowledge Discovery Techniques Based on Formal Concept Analysis (RuZA 2015). November 30 - December 5, 2015, Stellenbosch, South Africa. Vol. 1552. Aachen: CEUR Workshop Proceedings, 2015.
The speed of full-text search directly affects the process of text analysis. A search engine creates a text index, which is used for fast full-text search. Solr and ElasticSearch are two popular search engines. A text analysis system requires fast searching and indexing at the same time. This paper describes preprocessing workflow of the analysis ...
Added: June 14, 2016
Semantic Clustering of Russian Web Search Results: Possibilities and Problems
Kutuzov A. B., in: Information Retrieval. 9th Russian Summer School, RuSSIR 2015, Saint Petersburg, Russia, August 24-28, 2015, Revised Selected Papers. Vol. 573. Switzerland: Springer, 2016. Ch. 6. P. 320–331.
The present paper deals with word sense induction from lexical co-occurrence graphs. We construct such graphs on large Russian corpora and then apply the data to cluster the results of Mail.ru search according to meanings in the query. We compare different methods of performing such clustering and different source corpora. Models of applying distributional semantics ...
Added: December 25, 2015