Automatic Detection of Borrowings in Low-Resource Languages of the Caucasus: Andic branch

P. 34–41.
Zaitsev K., Minchenko A.

Linguistic borrowings occur in all languages. The Andic languages of the Caucasus have borrowings from several donor languages, such as Russian, Arabic, and Persian. To detect these borrowings automatically, we propose a logistic regression model. The model was trained on a dataset of words in IPA taken from dictionaries of Andic languages. To improve the model's quality, we compared TF-IDF and count vectorizers and chose the latter. We also added new features, extracted by analysing the vectorizer features and by using a language model. The model was evaluated with classification quality metrics (precision, recall, and F1-score). The best F1-score for words in IPA, averaged over all languages, was about 0.78. Experiments showed that the model performs well not only on words in IPA but also on words in Cyrillic.
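As a rough illustration of the kind of pipeline the abstract describes, the sketch below counts character n-grams per word, fits a logistic regression on those counts, and reports precision, recall, and F1. It is not the authors' implementation. The n-gram range, the training settings, the function names, and the word list are all assumptions made for the example, and the words are invented strings, not real Andic vocabulary. Only the Python standard library is used.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(word, n_min=1, n_max=2):
    """Count character n-grams of a word; '<' and '>' mark word boundaries."""
    padded = f"<{word}>"
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(words, labels, epochs=200, lr=0.1):
    """Fit logistic regression on n-gram counts with plain SGD (hypothetical settings)."""
    weights = defaultdict(float)
    bias = 0.0
    features = [char_ngrams(w) for w in words]
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = bias + sum(weights[g] * c for g, c in x.items())
            error = sigmoid(z) - y  # gradient of the log loss w.r.t. z
            bias -= lr * error
            for g, c in x.items():
                weights[g] -= lr * error * c
    return weights, bias


def predict(word, weights, bias):
    """Return 1 (treated as 'borrowing') or 0 (treated as 'native')."""
    z = bias + sum(weights.get(g, 0.0) * c for g, c in char_ngrams(word).items())
    return 1 if sigmoid(z) >= 0.5 else 0


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Synthetic toy data: invented strings, NOT real words from any language.
# Label 1 marks strings containing the made-up segment "zz".
TRAIN_WORDS = ["bazza", "kizzo", "mezzu", "tazzi", "bala", "kilo", "menu", "tari"]
TRAIN_LABELS = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = train(TRAIN_WORDS, TRAIN_LABELS)
predictions = [predict(w, weights, bias) for w in TRAIN_WORDS]
print(precision_recall_f1(TRAIN_LABELS, predictions))
```

The example scores the model on its own training words only to show how the pieces fit together. A real evaluation would use held-out words for each language, as the per-language averaging in the abstract implies.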

Language: English
Keywords: field linguistics; Natural Language Processing (NLP)

In book

Proceedings of the first workshop on NLP applications to field linguistics
Gyeongju: International Conference on Computational Linguistics, 2022.