Semantic Proximity Establishment in the Tasks of Knowledge Extraction and Named Entities Recognition

Kozerenko E. B., Kuznetsov K. I., Morozova Y. I., Romanov D. A.

P. 339–344.

The paper addresses the problem of identifying text segments that contain similar semantic units, for analytical text processing tasks within a semantic technology platform. The methods and instruments presented support the discovery of relevant content based on users' focused interests within a given domain. A hybrid approach combining linguistic rules with example-based learning techniques is employed. Legal and mass media texts are considered. The paper gives a brief history of the NER task, specifies the Pullenti-based engine, presents the two-step Semantic Expansion Algorithm, and discusses distributional semantics methods for domain term extraction, along with some technical challenges and prospective directions for further research and development.

Language: English

Keywords: knowledge extraction, named entities recognition, semantic clustering, semantic similarity

In book

PROCEEDINGS OF THE 2017 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE

American Council on Science & Education, 2017.

Aschern at CheckThat! 2021: Lambda-Calculus of Fact-Checked Claims

Chernyavskiy A., Ilvovsky D., Nakov P., in: CLEF 2021 Working Notes. CEUR Workshop Proceedings, 2021. P. 484–493.

We describe our system for the CLEF 2021 CheckThat! Lab Task 2 Subtask A on detecting previously fact-checked claims. We developed a pipeline using TF.IDF, sentence-BERT fine-tuned on the training data, and reranking using LambdaMART and the predicted similarity scores and positions in the ranked list as features. We examined the quality of each model ...

Added: May 9, 2024

Semantic Recommendation System for Bilingual Corpus of Academic Papers

Safaryan A., Filchenkov P., Yan W. et al., in: Recent Trends in Analysis of Images, Social Networks and Texts. 9th International Conference, AIST 2020, Skolkovo, Moscow, Russia, October 15–16, 2020, Revised Supplementary Proceedings. Vol. 12602. Springer, 2021. Ch. 3 P. 22–36.

We tested four methods of making document representations cross-lingual for the task of semantic search for similar papers, based on a corpus of papers from three Russian conferences on NLP: Dialogue, AIST and AINL. The pipeline consisted of three stages: preprocessing, word-by-word vectorisation using models obtained with various methods to map vectors from two ...

Added: September 18, 2023

Moving Other Way: Exploring Word Mover Distance Extensions

Smirnov I., Yamshchikov I. P., in: COMPLEXIS 2022. Proceedings of the 7th International Conference on Complexity, Future Information Systems and Risk. April 23-24, 2022. Science and Technology Publications, Lda, 2022. P. 92–97.

Added: September 8, 2022

Analyzing COVID-19 Medical Papers Using Artificial Intelligence: Insights for Researchers and Medical Professionals

Soshnikov D., Petrova T., Soshnikova V. et al., Big Data and Cognitive Computing 2022 Vol. 6 No. 1 Article 4

Since the beginning of the COVID-19 pandemic almost two years ago, there have been more than 700,000 scientific papers published on the subject. An individual researcher cannot possibly get acquainted with such a huge text corpus and, therefore, some help from artificial intelligence (AI) is highly needed. We propose the AI-based tool to help researchers ...

Added: February 22, 2022

Chekhov's Gun Recognition

Tikhonov A., Yamshchikov I. P. / Series Computer Science "arxiv.org". 2021.

Chekhov's gun is a dramatic principle stating that every element in a story must be necessary, and irrelevant elements should be removed. This paper presents a new natural language processing task, Chekhov's gun recognition (CGR): the recognition of entities that are pivotal for the development of the plot. Though similar to classical Named Entity Recognition ...

Added: December 3, 2021

Rethinking Crowd Sourcing for Semantic Similarity

Solomon S., Cohn A., Rosenblum H. et al. / Series Computer Science "arxiv.org". 2021.

Estimation of semantic similarity is crucial for a variety of natural language processing (NLP) tasks. In the absence of a general theory of semantic information, many papers rely on human annotators as the source of ground truth for semantic similarity estimation. This paper investigates the ambiguities inherent in crowd-sourced semantic labeling. It shows that annotators ...

Added: December 3, 2021

Lexicographic Study of Synonymy: Clarifying Semantic Similarity between Words

Solovyev V., Gimaletdinova G., Khalitova L. et al., Computacion y Sistemas 2021 Vol. 25 No. 3 P. 667–675

The problem of determining semantic similarity between words affects the understanding of synonymy and creates obstacles to the work of lexicographers. The study was carried out as a part of a larger research project on expert assessment of synonymic rows in RuWordNet thesaurus (a WordNet-like thesaurus for the Russian language). The aim ...

Added: December 1, 2021

Representation of Different Types of Adjectival Polysemy in the Mental Lexicon

Apresyan V., Lopukhina A., Zarifyan M., Frontiers in Psychology 2021 Vol. 12 Article 742064

We studied mental representations of literal, metonymically different, and metaphorical senses in Russian adjectives. Previous studies suggested that in polysemous words, metonymic senses, being more sense-related, were stored together with literal senses, whereas more distant metaphorical senses had separate representations. We hypothesized that metonymy may be heterogeneous with respect to its mental storage. “Whole-part” metonymy ...

Added: October 29, 2021

Style-transfer and Paraphrase: Looking for a Sensible Semantic Similarity Metric

Yamshchikov I. P., Shibaev V., Khlebnikov N. et al., in: The Thirty-Fifth AAAI Conference on Artificial Intelligence. Technical Tracks 16. Vol. 35. Issue 16. AAAI Press, 2021. P. 14213–14220.

The rapid development of natural language processing tasks such as style transfer, paraphrase, and machine translation often calls for the use of semantic similarity metrics. In recent years, many methods for measuring the semantic similarity of two short texts have been developed. This paper provides a comprehensive analysis for more than a dozen of ...

Added: July 22, 2021

Extraction of Script Information from Texts. Part 1. Problem Statement and Review of Methods

Suvorova M. I., Kobozeva M. V., Toldova S. et al., Искусственный интеллект и принятие решений (Artificial Intelligence and Decision Making) 2020 No. 1 P. 17–26

The article discusses the importance of automatic script analysis for understanding natural language texts. A broad review of methods and approaches to describing and extracting scripts is given. Theoretical approaches to formalizing scripts are considered. A list of tasks that make use of information about the script structure of a text is provided. Popular approaches to the automatic extraction of scripts from texts and methods for evaluating their ...

Added: April 22, 2020

The Entity Name Identification in Classification Algorithm: Testing the Advocacy Coalition Framework by Document Analysis (The Case of Russian Civil Society Policy)

Zaytsev D., Talovsky N., Kuskova V. et al., in: Analysis of Images, Social Networks and Texts. 8th International Conference, AIST 2019, Lecture Notes in Computer Science, Revised Selected Papers. Vol. 11832. Cham: Springer, 2019. P. 276–288.

This is an application of an advanced entity recognition algorithm to a large dataset. ...

Added: November 7, 2019

Network Analysis Methodology of Policy Actors Identification and Power Evaluation (the case of the Unified State Exam introduction in Russia)

Zaytsev D., Khvatsky G., Talovsky N. et al., in: Network Algorithms, Data Mining, and Applications. Springer Proceedings in Mathematics & Statistics. Springer, 2020. P. 231–244.

This is an exploratory study of the effects of the Unified State Exam in Russia, using advanced network methodology. ...

Added: November 7, 2019

An Experimental Study of Hybrid Machine Learning Models for Extracting Named Entities

Lei J., Bolshakova E. I., in: Proceedings of Third Workshop "Computational linguistics and language science". Issue 4. Manchester: EasyChair, 2019. P. 50–60.

The paper describes two hybrid neural network models for named entity recognition (NER) in texts, namely Bi-LSTM-CRF and Gated-CNN-CRF, as well as results of experiments with them. ...

Added: November 3, 2019

Dark personalities on Facebook: Harmful online behaviors and language

Bogolyubova O., Panicheva P., Tikhonov R. et al., Computers in Human Behavior 2018 Vol. 78 P. 151–159

*The operation of the social network Facebook is prohibited in Russia on the grounds of extremist activity. The goal of this paper was to assess the connection between dark personality traits and engagement in harmful online behaviors in a sample of Russian Facebook users, and to describe the language they use in online communication. A total of 6724 individuals participated ...

Added: February 18, 2019

SEMANTIC PROCESSING OF UNSTRUCTURED TEXT DATA BASED ON THE LINGUISTIC PROCESSOR PULLENTI

Kozerenko E. B., Kuznetsov K. I., Romanov D. A., Информатика и ее применения (Informatics and Applications) 2018 Vol. 12 No. 3 P. 91–98

The paper presents a method for building knowledge extraction systems based on the PullEnti software toolkit, which comprises algorithms for morphological and semantic-syntactic analysis and makes it possible to extract entities of certain types from natural language texts (persons, organizations, locations, and other target semantic objects). The PullEnti system uses ...

Added: December 19, 2018

Trend Monitoring for Linking Science and Strategy

Bakhtin P. D., Saritas O., Chulok A. et al., Scientometrics 2017 Vol. 111 No. 3 P. 2059–2075

Rapid changes in Science & Technology (S&T) along with breakthroughs in products and services concern a great deal of policy and strategy makers and lead to an ever increasing number of Foresight and other types of forward-looking work. At the outset, the purpose of these efforts is to investigate emerging S&T areas, set priorities and ...

Added: December 21, 2016

Unified External Data Access Implementation in Formal Concept Analysis Research Toolbox

Parinov A., Neznanov A., in: CLA 2016: Proceedings of the Thirteenth International Conference on Concept Lattices and Their Applications. CEUR Workshop Proceedings. Vol. 1624. M.: Higher School of Economics, National Research University, 2016. P. 285–296.

Formal Concept Analysis (FCA) provides mathematical models, methods and algorithms for data analysis. However, there is still no easily available software system that would provide data analysts with unified, intelligible and transparent access to various external data sources containing large amounts of heterogeneous data for subsequent FCA-based knowledge discovery. The lack of such tools ...

Added: October 19, 2016

Full-text Search in Intermediate Data Storage of FCART

Neznanov A., Parinov A., in: RuZA 2015 Workshop. Proceedings of Russian and South African Workshop on Knowledge Discovery Techniques Based on Formal Concept Analysis (RuZA 2015). November 30 - December 5, 2015, Stellenbosch, South Africa. Vol. 1552. Aachen: CEUR Workshop Proceedings, 2015.

The speed of full-text search directly affects the process of text analysis. A search engine creates a text index, which is used for fast full-text search. Solr and ElasticSearch are two popular search engines. A text analysis system requires fast searching and indexing at the same time. This paper describes preprocessing workflow of the analysis ...

Added: June 14, 2016

Semantic Clustering of Russian Web Search Results: Possibilities and Problems

Kutuzov A. B., in: Information Retrieval. 9th Russian Summer School, RuSSIR 2015, Saint Petersburg, Russia, August 24-28, 2015, Revised Selected Papers. Vol. 573. Switzerland: Springer, 2016. Ch. 6 P. 320–331.

The present paper deals with word sense induction from lexical co-occurrence graphs. We construct such graphs on large Russian corpora and then apply the data to cluster the results of Mail.ru search according to meanings in the query. We compare different methods of performing such clustering and different source corpora. Models of applying distributional semantics ...

Added: December 25, 2015