Semantic Recommendation System for Bilingual Corpus of Academic Papers

Ch. 3. P. 22–36.
Safaryan A., Petr Filchenkov, Yan W., Kutuzov A. B., Irina Nikishina

We tested four methods of making document representations cross-lingual for the task of semantic search for similar papers, using a corpus of papers from three Russian NLP conferences: Dialogue, AIST and AINL. The pipeline consisted of three stages: preprocessing; word-by-word vectorisation using models obtained with various methods of mapping vectors from two independent vector spaces into a common one; and search for the most similar papers based on the cosine similarity of text vectors. The four methods fall into two approaches: 1) aligning two pretrained monolingual word embedding models ourselves with a bilingual dictionary (for example, with the VecMap algorithm), and 2) using pre-aligned cross-lingual word embedding models (MUSE). To find out which approach benefits the task more, we manually evaluated the results and calculated the average precision of recommendations for each method. MUSE had the highest search relevance, but the other methods produced more recommendations in a language other than that of the target paper.
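The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: as a stand-in for VecMap or MUSE it uses plain orthogonal Procrustes alignment over a bilingual dictionary, a simpler supervised technique, and it assumes that a document vector is the average of its word vectors. All function names, the toy two-dimensional embeddings, and the document names are hypothetical.

```python
import numpy as np

def procrustes_align(src, tgt):
    """Orthogonal matrix W minimising ||src @ W - tgt|| (Frobenius norm).

    Rows of src and tgt are vectors of dictionary word pairs.
    This is a simplified stand-in, not the VecMap algorithm itself.
    """
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

def doc_vector(tokens, emb, dim):
    """Average the vectors of known tokens (an assumed pooling choice)."""
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def most_similar(query, docs, top_k=2):
    """Rank document names by cosine similarity to the query vector."""
    scores = {name: cosine(query, vec) for name, vec in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy English space (hypothetical 2-d vectors).
en = {
    "search": np.array([1.0, 0.0]),
    "vector": np.array([0.0, 1.0]),
    "paper": np.array([1.0, 1.0]),
}
# Toy Russian space: the English space under an unknown rotation.
rotation = np.array([[0.0, 1.0], [-1.0, 0.0]])
pairs = [("поиск", "search"), ("вектор", "vector"), ("статья", "paper")]
ru = {r: en[e] @ rotation for r, e in pairs}

# Learn the mapping from the bilingual dictionary, then map Russian words.
ru_matrix = np.stack([ru[r] for r, _ in pairs])
en_matrix = np.stack([en[e] for _, e in pairs])
W = procrustes_align(ru_matrix, en_matrix)
ru_aligned = {word: vec @ W for word, vec in ru.items()}

# English "papers" and a Russian query paper, compared in the shared space.
docs = {
    "en_paper_search": doc_vector(["search", "paper"], en, 2),
    "en_paper_vector": doc_vector(["vector"], en, 2),
}
query = doc_vector(["поиск", "статья"], ru_aligned, 2)
ranked = most_similar(query, docs)
print(ranked)
```

With real data the dictionaries `en` and `ru` would be loaded from pretrained monolingual models, and the alignment would be learned on a bilingual dictionary that only partially covers the vocabulary.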

Language: English
Keywords: semantic search, semantic similarity, scientific literature search, cross-lingual models, cross-lingual representations, document representations

In book

Recent Trends in Analysis of Images, Social Networks and Texts. 9th International Conference, AIST 2020, Skolkovo, Moscow, Russia, October 15–16, 2020, Revised Supplementary Proceedings
Vol. 12602. Springer, 2021.
Similar publications
Aschern at CheckThat! 2021: Lambda-Calculus of Fact-Checked Claims
Chernyavskiy A., Ilvovsky D., Nakov P., in: CLEF 2021 Working Notes. CEUR Workshop Proceedings, 2021. P. 484–493.
We describe our system for the CLEF 2021 CheckThat! Lab Task 2 Subtask A on detecting previously fact-checked claims. We developed a pipeline using TF.IDF, sentence-BERT fine-tuned on the training data, and reranking using LambdaMART and the predicted similarity scores and positions in the ranked list as features. We examined the quality of each model ...
Added: May 9, 2024
Use of Text Skeleton Structures for the Development of Semantic Search Methods
A. V. Mylnikova, V. A. Trusov, L. A. Mylnikov, Automatic Documentation and Mathematical Linguistics 2023 Vol. 57 No. 5 P. 301–307
This paper considers the problem of the generation of descriptors to reduce data volumes, text data resources, and search times through the use of the new factors of authorship, region, emotive meaning, and popularity, as well as a text category without special marks that can be used to generate descriptors. This approach allows the use ...
Added: February 29, 2024
The Chekhov Digital Project: Tasks and Problems of Implementing Semantic Markup of Texts (on the Example of A. P. Chekhov's Story "Death of an Official")
Северина Е. М., Ларионова М. Ч., Litera 2023 No. 10 P. 211–222
The article considers a model of preparation of machine-readable (semantic) markup of texts for the Chekhov Digital project on the example of philological interpretation of individual significant elements of A. P. Chekhov's story "Death of an Official" and presentation of this information explicitly based on the standards of digital publication Text Encoding Initiative (TEI/XML). Based ...
Added: January 12, 2024
The Chekhov Digital Project: Developing a Digital Index for Semantic Search
Северина Е. М., in: Kompyuter lingvistikasi: muammo va yechimlar (Компьютерная лингвистика: проблемы и решения, Computational linguistics and solutions). Tashkent: [s.n.], 2021. P. 82–88.
The paper examines the specifics of developing a digital index of the names of real people and objects that are mentioned in the texts of A. P. Chekhov's works and letters and listed in the indexes of the academic edition. Such an index makes it possible to organise semantic search across the texts of the writer's works and the editorial and critical apparatus of the Chekhov Digital edition. ...
Added: November 8, 2023
Use of Text Skeleton Structures for the Development of Semantic Search Methods
Mylnikova A., Trusov V., Mylnikov L., Научно-техническая информация. Серия 2: Информационные процессы и системы 2023 No. 10 P. 16–23
This paper considers the problem of the generation of descriptors to reduce data volumes, text data resources, and search times through the use of the new factors of authorship, region, emotive meaning, and popularity, as well as a text category without special marks that can be used to generate descriptors. This approach allows the use of unique lexical-grammatical ...
Added: October 31, 2023
Moving Other Way: Exploring Word Mover Distance Extensions
Smirnov I., Yamshchikov I. P., in: COMPLEXIS 2022. Proceedings of the 7th International Conference on Complexity, Future Information Systems and Risk. April 23-24, 2022. Science and Technology Publications, Lda, 2022. P. 92–97.
Added: September 8, 2022
Rethinking Crowd Sourcing for Semantic Similarity
Solomon S., Cohn A., Rosenblum H. et al., Series Computer Science "arxiv.org". 2021.
Estimation of semantic similarity is crucial for a variety of natural language processing (NLP) tasks. In the absence of a general theory of semantic information, many papers rely on human annotators as the source of ground truth for semantic similarity estimation. This paper investigates the ambiguities inherent in crowd-sourced semantic labeling. It shows that annotators ...
Added: December 3, 2021
Lexicographic Study of Synonymy: Clarifying Semantic Similarity between Words
Solovyev V., Гималетдинова Г., Халитова Л. et al., Computacion y Sistemas 2021 Vol. 25 No. 3 P. 667–675
The problem of determining semantic similarity between words affects the understanding of synonymy and creates obstacles to the work of lexicographers. The study was carried out as a part of a larger research project on expert assessment of synonymic rows in RuWordNet thesaurus (a WordNet–like thesaurus for the Russian language). The aim ...
Added: December 1, 2021
Style-transfer and Paraphrase: Looking for a Sensible Semantic Similarity Metric
Yamshchikov I. P., Shibaev V., Khlebnikov N. et al., in: The Thirty-Fifth AAAI Conference on Artificial Intelligence. Technical Tracks 16. Vol. 35. Issue 16. AAAI Press, 2021. P. 14213–14220.
The rapid development of such natural language processing tasks as style transfer, paraphrase, and machine translation often calls for the use of semantic similarity metrics. In recent years a lot of methods to measure the semantic similarity of two short texts were developed. This paper provides a comprehensive analysis for more than a dozen of ...
Added: July 22, 2021
On a Method of Integrated Semantic, Statistical and Psycholinguistic Analysis of Polysemy
Apresyan V., Апресян Ю. Д., Dragoy O. et al., Русская речь 2019 No. 1 P. 8–17
The aim of the project is a multidisciplinary study of the polysemy of linguistic units using theoretical, experimental and statistical methods. Although a large number of works have been devoted to polysemy, the phenomenon has not previously been studied comprehensively. The study, which combined elements of lexicographic description, statistical analysis, surveys, and the analysis of electroencephalograms and eye movements, established the following: in the development of polysemy, a greater number of different semantic shifts are used, in addition to ...
Added: April 8, 2019
Semantic Processing of Unstructured Text Data Based on the PullEnti Linguistic Processor
Козеренко Е. Б., Кузнецов К. И., Romanov D. A., Информатика и ее применения 2018 Vol. 12 No. 3 P. 91–98
The paper presents the method for creation of knowledge extraction systems based on the approach employing the software tool system PullEnti comprising the algorithms for morphological and semantic-syntactical analysis which makes it possible to extract entities of certain types from natural language texts (persons, organizations, locations, and other target semantic objects). The PullEnti system uses ...
Added: December 19, 2018
Semantic Proximity Establishment in the Tasks of Knowledge Extraction and Named Entities Recognition
Kozerenko E. B., Kuznetsov K. I., Morozova Y. I. et al., in: Proceedings of the 2017 International Conference on Artificial Intelligence. American Council on Science & Education, 2017. P. 339–344.
The paper deals with the problem of establishing text segments containing the similar semantic units for the tasks of analytical text processing within the semantic technology platform. The methods and instruments presented in the paper provide the discovery of relevant content based on users' focused interests within a certain domain. The hybrid approach comprising linguistic ...
Added: February 23, 2018
Extracting social networks from literary text with word embedding tools
Wohlgenannt G., Artemova E., Ilvovsky D., in: Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH). Osaka: [s.n.], 2016. Ch. 4 P. 18–26.
In this paper a social network is extracted from a literary text. The social network shows how frequently the characters interact and how similar their social behavior is. Two types of similarity measures are used: the first applies co-occurrence statistics, while the second exploits cosine similarity on different types of word embedding vectors. The results ...
Added: March 6, 2017
Trend Monitoring for Linking Science and Strategy
Bakhtin P. D., Saritas O., Chulok A. et al., Scientometrics 2017 Vol. 111 No. 3 P. 2059–2075
Rapid changes in Science & Technology (S&T) along with breakthroughs in products and services concern a great deal of policy and strategy makers and lead to an ever increasing number of Foresight and other types of forward-looking work. At the outset, the purpose of these efforts is to investigate emerging S&T areas, set priorities and ...
Added: December 21, 2016
Welcome to the club: Designing the inventory of semantic roles for adjectives
Lyashevskaya O., Kashkin E., Компьютерная лингвистика и интеллектуальные технологии 2016 No. 15 P. 440–454
The argument constructions of adjectives have largely been out of the scope of research on semantic roles both in theoretical and IT fields. Before adding the roles of adjectival arguments to the network of semantic roles it is important to determine whether the adjectival roles form a separate list or whether they can be seen ...
Added: December 14, 2016
Improving Distributional Semantic Models Using Anaphora Resolution during Linguistic Preprocessing
Kutuzov A. B., Козлова О. С., in: Компьютерная лингвистика и интеллектуальные технологии: Proceedings of the annual international conference "Dialogue" (Moscow, July 1–4, 2016). Issue 15. Moscow: Изд-во РГГУ, 2016. P. 288–300.
In natural language processing, distributional semantic models are known as an efficient data driven approach to word and text representation, which allows computing meaning directly from large text corpora into word embeddings in a vector space. This paper addresses the role of linguistic preprocessing in enhancing performance of distributional models, and particularly studies pronominal anaphora ...
Added: November 12, 2016
A Method of Semantic Search for Specialists with a Defined Set of Competencies
Zakhlebin I. V., in: Электронный бизнес. Управление интернет-проектами. Инновации: Proceedings of the student research and practice conference, Moscow, March 12–14, 2013. Moscow: HSE University, 2014. P. 88–91.
The report deals with the methodology of building a system to perform search for specialists satisfying a defined set of competencies. The proposed search method is based on natural language texts analysis. ...
Added: July 11, 2015