Semantic Clustering of Russian Web Search Results: Possibilities and Problems

A. B. Kutuzov

Ch. 6. P. 320–331.

This paper deals with word sense induction from lexical co-occurrence graphs. We construct such graphs from large Russian corpora and then use them to cluster Mail.ru search results according to the meanings of the query. We compare different clustering methods and different source corpora, and describe models for applying distributional semantics to big linguistic data.
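
As a rough illustration of the general idea in the abstract (not the method used in the paper), the sketch below builds a sentence-level co-occurrence graph from a toy corpus, treats the connected components among the neighbours of an ambiguous query word as its induced senses, and assigns search snippets to the sense with the largest word overlap. The function names, the toy corpus, and the use of connected components in place of a real graph clustering algorithm are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(sentences, min_count=1):
    """Build an undirected graph: words are nodes, and two words are linked
    if they occur together in at least `min_count` sentences."""
    counts = defaultdict(int)
    for tokens in sentences:
        for a, b in combinations(sorted(set(tokens)), 2):
            counts[(a, b)] += 1
    graph = defaultdict(set)
    for (a, b), n in counts.items():
        if n >= min_count:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def induce_senses(graph, target):
    """Return the connected components of the subgraph formed by the
    neighbours of `target` (with `target` itself left out).
    Each component is treated as one induced sense (a simplification)."""
    unseen = set(graph.get(target, set()))
    senses = []
    while unseen:
        start = unseen.pop()
        component = {start}
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in graph.get(node, ()):
                if nxt in unseen:  # only walk through neighbours of target
                    unseen.remove(nxt)
                    component.add(nxt)
                    stack.append(nxt)
        senses.append(component)
    return senses


def assign_to_sense(snippet_tokens, senses):
    """Return the index of the sense sharing the most words with the snippet,
    or None if no sense shares any word with it."""
    if not senses:
        return None
    words = set(snippet_tokens)
    overlaps = [len(sense & words) for sense in senses]
    best = max(range(len(senses)), key=overlaps.__getitem__)
    return best if overlaps[best] > 0 else None


# Hypothetical toy corpus with an ambiguous query word, "jaguar".
corpus = [
    ["jaguar", "cat", "jungle"],
    ["jaguar", "jungle", "prey"],
    ["jaguar", "car", "engine"],
    ["jaguar", "engine", "speed"],
]
graph = build_cooccurrence_graph(corpus)
senses = induce_senses(graph, "jaguar")

# Hypothetical search snippets to be grouped by induced sense.
snippets = [["jaguar", "engine", "price"], ["jaguar", "hunting", "prey"]]
for snippet in snippets:
    print(snippet, "->", assign_to_sense(snippet, senses))
```

On real corpora the neighbour subgraph is rarely split this cleanly, so a weighted graph with a proper community detection step would presumably replace the connected-components shortcut.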

Language: English

Keywords: graph analysis, distributional semantics, information retrieval, semantic clustering

In book

Information Retrieval. 9th Russian Summer School, RuSSIR 2015, Saint Petersburg, Russia, August 24-28, 2015, Revised Selected Papers

Vol. 573. Switzerland: Springer, 2016.

Dark personalities on Facebook: Harmful online behaviors and language

Bogolyubova O., Panicheva P., Tikhonov R. et al., Computers in Human Behavior 2018 Vol. 78 P. 151–159

*The operation of the social network Facebook is prohibited in Russia on the grounds of extremist activity. The goal of this paper was to assess the connection between dark personality traits and engagement in harmful online behaviors in a sample of Russian Facebook users, and to describe the language they use in online communication. A total of 6724 individuals participated ...

Added: February 18, 2019

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval

Association for Computing Machinery (ACM), 2022.

Added: July 8, 2022

HCI International 2023 Posters

Springer, 2023.

Added: October 21, 2023

Automatic construction of lexical typological questionnaires

Paperno D., Ryzhova D., in: Methodological Tools for Linguistic Description and Typology. Issue 16. University of Hawaii Press, 2019. Ch. 5. P. 45–61.

Questionnaires constitute a crucial tool in linguistic typology and language description. By nature, a Questionnaire is both an instrument and a result of typological work: its purpose is to help the study of a particular phenomenon cross-linguistically or in a particular language, but the creation of a Questionnaire is in its turn based on the ...

Added: August 30, 2019

9th Russian Summer School in Information Retrieval (RuSSIR 2015)

Braslavski P., Markov I., Pardalos P. M. et al., ACM SIGIR Forum 2016 Vol. 49 No. 2 P. 72–79

This paper provides the reader with a report on the 9th Russian Summer School in Information Retrieval (RuSSIR 2015). ...

Added: February 27, 2017

Semantic Proximity Establishment in the Tasks of Knowledge Extraction and Named Entities Recognition

Kozerenko E. B., Kuznetsov K. I., Morozova Y. I. et al., in: Proceedings of the 2017 International Conference on Artificial Intelligence. American Council on Science & Education, 2017. P. 339–344.

The paper deals with the problem of establishing text segments containing similar semantic units for the tasks of analytical text processing within the semantic technology platform. The methods and instruments presented in the paper enable the discovery of relevant content based on users' focused interests within a certain domain. The hybrid approach comprising linguistic ...

Added: February 23, 2018

Algorithms and methods for solving scheduling problems and other extremum problems on large-scale graphs

Chernyshev S. V., Cherepanov E. A., Pankratiev E. V. et al., Journal of Mathematical Sciences 2005 Vol. 128 No. 6 P. 3487–3495

Added: January 27, 2014

The MRMR Criterion and Feature Space Dimensionality Reduction in the Search Engine Spam Classification Problem

Belov A. V., Karbachinsky I. O., Quality. Innovation. Education 2014 No. 6 P. 24–32

Today web spam is one of the key problems of modern web search engines. In this paper we investigate the efficiency of various dimensionality reduction methods applied to the spam classifier of the go.mail.ru search system. Effective utilization of such techniques can significantly increase the number of features and the quality of the classifier without ...

Added: February 2, 2015

Formalization of Medical Records Using an Ontology: Patient Complaints

Klyshinskiy E., Gribova V., Shakhgeldyan C. et al., in: Analysis of Images, Social Networks and Texts. 8th International Conference, AIST 2019, Kazan, Russia, July 17–19, 2019, Revised Selected Papers. Communications in Computer and Information Science. Vol. 1086. Springer, 2020. P. 143–153.

Added: October 26, 2019

Experimental IR Meets Multilinguality, Multimodality, and Interaction

Springer, 2017.

Added: November 9, 2018

Experimental IR Meets Multilinguality, Multimodality, and Interaction

Springer, 2020.

Added: October 4, 2020

Unhappy in Their Own Way: How to Measure the Sentiment of a Literary Text?

Sherstinova T., Moskvina A., Kirina M. et al., in: Corpus Linguistics - 2023. [s.n.], 2023.

In the experimental study, the results of three different approaches to the evaluation of the tonality of literary texts are compared: dictionary-based, machine learning, and distributional semantics. The material for analysis was a selection of 210 stories by Russian writers from the first three decades of the 20th century. The research showed that the correlation ...

Added: December 9, 2023

An Algorithm for Detecting Communities in Social Networks

Kolomeychenko M. I., Chepovskiy A. A., Chepovskiy A. M., Journal of Mathematical Sciences 2015 Vol. 211 No. 3 P. 310–318

In this paper we propose an algorithm for finding subgraphs with adjusted properties in large social networks. A description of the computational experiment that confirms the effectiveness of the proposed algorithm is given. ...

Added: October 24, 2015

Architecture and Tools of a Software Suite for Graph Visualization and Analysis

Borisov T. N., Kolomeychenko M. I., Polyakov I. V. et al., in: SCVRT2013-14 Proceedings of the International Scientific Conference of the International Center for Nuclear Safety of the Institute of Physical and Technical Informatics. Protvino: IFTI Publishing, 2014. P. 32–37.

This paper describes a software suite for storing, analyzing, and visualizing social network graphs. It gives a comparative analysis of existing graph visualization software, describes the overall architecture of the application, and describes a purpose-built graph store. The existing functionality of the software is also described. ...

Added: November 21, 2014

Evaluating Distributional Semantic Models with Russian Noun-Adjective Compositions

Panicheva P., Protopopova E., Bukia G. et al., in: Analysis of Images, Social Networks and Texts. 5th International Conference, AIST 2016, Yekaterinburg, Russia, April 7-9, 2016, Revised Selected Papers. Communications in Computer and Information Science. Vol. 661. Switzerland: Springer, 2017. P. 236–247.

In this paper, vector-space semantic models based on the Word2Vec word embedding algorithm and on a count-based association-oriented algorithm are evaluated and compared by measuring association strength between Russian nouns and adjectives. A dataset of nouns and associated adjectives is used as the test set for a pseudo-disambiguation task. Models are trained on corpora of Russian fiction. A ...

Added: February 18, 2019

Experimental IR Meets Multilinguality, Multimodality, and Interaction: 12th International Conference of the CLEF Association, CLEF 2021, Virtual Event, September 21–24, 2021, Proceedings

Springer, 2021.

Added: September 28, 2021

Concept-based chatbot for interactive query refinement in product search

Goncharova E., Ilvovsky D., Galitsky B., in: Proceedings of the 9th International Workshop "What can FCA do for Artificial Intelligence?" (FCA4AI 2021). Vol. 2972. CEUR-WS, 2021. P. 51–58.

Added: October 28, 2021

Representation of Different Types of Adjectival Polysemy in the Mental Lexicon

Apresyan V., Lopukhina A., Zarifyan M., Frontiers in Psychology 2021 Vol. 12 Article 742064

We studied mental representations of literal, metonymically different, and metaphorical senses in Russian adjectives. Previous studies suggested that in polysemous words, metonymic senses, being more sense-related, were stored together with literal senses, whereas more distant metaphorical senses had separate representations. We hypothesized that metonymy may be heterogeneous with respect to its mental storage. “Whole-part” metonymy ...

Added: October 29, 2021

Automatic Graph Layout Based on the Method of Physical Analogies

Kolomeychenko M. I., Polyakov I. V., Chepovskiy A., in: Proceedings of the International Scientific Conference of the Moscow Institute of Physics and Technology (State University) and the Institute of Physical and Technical Informatics (SCVRT1516). Moscow, Protvino: Institute of Physical and Technical Informatics, 2016. P. 93–97.

This paper describes an automatic graph layout algorithm, “peacock’s tail”, which is based on a force-directed graph drawing algorithm. A modified, faster version called “fast peacock’s tail” is also presented. This approach proved efficient on big social network graphs. ...

Added: November 20, 2016

Exploring Semantic Concreteness and Abstractness for Metaphor Identification and Beyond

Badryzlova Y., in: Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference “Dialogue” (Moscow, June 17–20, 2020). Issue 19(26). Moscow: RSUH Publishing, 2020. P. 33–47.

The paper presents a method for computing indexes of semantic concreteness and abstractness in two languages (Russian and English). These indexes are used in metaphor identification experiments in both languages; the results are either comparable to or surpass previous work and the baselines. We analyze the obtained indexes of concreteness and abstractness to see how ...

Added: August 24, 2020

Using Annotated Suffix Trees for Fuzzy Full Text Search

Dmitry Frolov, in: Communications in Computer and Information Science. Information Retrieval. 10th Russian Summer School, RuSSIR 2016, Saratov, Russia, August 22-26, 2016, Revised Selected Papers. Springer, 2016.

A method for fuzzy full text search is proposed. The method follows a popular two-stage scheme with a novel second stage: a preliminary search stage using an n-gram inverted index and, at the second stage, relevance checking between the query and documents using frequency annotated suffix trees (ASTs). The ASTs are built for all documents of the ...

Added: December 13, 2016

Data Analytics and Management in Data Intensive Domains. 23rd International Conference, DAMDID/RCDL 2021, Moscow, Russia, October 26–29, 2021, Revised Selected Papers

Springer, 2022.

“Data Analytics and Management in Data Intensive Domains” conference (DAMDID) is planned as a multidisciplinary forum of researchers and practitioners from various domains of science and research promoting cooperation and exchange of ideas in the area of data analysis and management in data intensive domains. Approaches to data analysis and management being developed in specific data intensive domains of X-informatics (such as X = astro, bio, chemo, geo, medicine, neuro, physics, ...

Added: August 30, 2021

Data Buffering and Compression in Multigraph Storage

Polyakov I. V., Chepovskiy A., in: Proceedings of the International Scientific Conference of the Moscow Institute of Physics and Technology (State University) and the Institute of Physical and Technical Informatics (SCVRT1516). Moscow, Protvino: Institute of Physical and Technical Informatics, 2016. P. 76–78.

In this paper we propose an approach for compact storage of a certain type of graph. We use preprocessing algorithms that can significantly increase the data density on disk and reduce the number of disk accesses required to perform fundamental operations on the graph. ...

Added: November 20, 2016

Texterra: An Infrastructure for Text Analysis

Denis Turdakov, Astrakhantsev N. A., Nedumov Y. R. et al., Proceedings of the Institute for System Programming of the RAS 2014 Vol. 26 P. 421–438

The paper presents a framework for fast text analytics developed during the Texterra project. Texterra is a technology for multilingual text mining based on novel text processing methods that exploit knowledge extracted from user-generated content. It delivers a fast scalable solution for text mining without expensive customization. Depending on use-cases Texterra could be utilized ...

Added: November 6, 2017