Automatic Detection of Borrowings in Low-Resource Languages of the Caucasus: Andic branch

P. 34–41.

Zaitsev K., Minchenko A.

Linguistic borrowings occur in all languages. The Andic languages of the Caucasus have borrowings from donor languages such as Russian, Arabic, and Persian. To detect these borrowings automatically, we propose a logistic regression model. The model was trained on a dataset of words in IPA drawn from dictionaries of Andic languages. To improve the model's quality, we compared TF-IDF and count vectorizers and chose the latter. We also added new features, extracted by analyzing the vectorizer features and by using a language model. The model was evaluated with classification quality metrics (precision, recall, and F1-score). The best F1-score averaged over all languages for words in IPA was about 0.78. Experiments showed that the model achieves good results not only on words in IPA but also on words in Cyrillic.
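As a rough illustration of the kind of pipeline the abstract outlines (count-based features feeding a logistic regression, scored by precision, recall, and F1), the following standard-library Python sketch may help. It is not the authors' implementation: the character n-gram featurization, the SGD training loop, the hyperparameters, and the toy strings with their labels are all assumptions made for this example, and the additional features derived from a language model are omitted.

```python
import math
from collections import Counter


def char_ngrams(word, n_min=1, n_max=3):
    """Count character n-grams of a word padded with boundary markers."""
    padded = f"<{word}>"
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams


def score(feats, weights, bias):
    """Linear score of a sparse feature vector; unseen n-grams weigh 0."""
    return bias + sum(weights[g] * c for g, c in feats.items())


def train_logreg(samples, labels, epochs=200, lr=0.1):
    """Fit binary logistic regression on sparse counts with plain SGD."""
    weights, bias = Counter(), 0.0
    for _ in range(epochs):
        for feats, y in zip(samples, labels):
            z = max(-30.0, min(30.0, score(feats, weights, bias)))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted P(borrowed)
            err = p - y                     # gradient of log loss w.r.t. z
            bias -= lr * err
            for g, c in feats.items():
                weights[g] -= lr * err * c
    return weights, bias


def predict(word, weights, bias):
    """Return 1 (borrowed) if the linear score is positive, else 0."""
    return 1 if score(char_ngrams(word), weights, bias) > 0 else 0


def precision_recall_f1(gold, pred):
    """Precision, recall and F1 for the positive (borrowed) class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy data: invented strings with arbitrary labels
# (1 = borrowed, 0 = native). They are NOT taken from the paper's dataset.
train_words = ["stola", "stolu", "mastol", "stolik",
               "qʷani", "qʷaru", "miqʷa", "qʷabi"]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = train_logreg([char_ngrams(w) for w in train_words],
                             train_labels)
predictions = [predict(w, weights, bias) for w in train_words]
print(precision_recall_f1(train_labels, predictions))
```

Scoring on the training words only checks that the sketch fits its toy data. A meaningful evaluation would use held-out words for each language, which is presumably how an averaged F1-score such as the one reported above would be obtained.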

Language: English

In book

Proceedings of the first workshop on NLP applications to field linguistics

Gyeongju: International Conference on Computational Linguistics, 2022.

A textual fingerprint learning model to detect fake information spreaders in social networks

Behzadidoost R., Neurocomputing 2025 Vol. 665 P. 1–21

While earlier research has focused on detecting misinformation content, identifying the users who spread it, referred to in this paper as fake information spreaders, remains a relatively new challenge. These users deliberately mix true and false information, making detection more difficult. This paper proposes a textual fingerprint learning model to detect fake information spreaders. The ...

Added: March 12, 2026

SynEL: A synthetic benchmark for entity linking

Karpov I., Kirillovich A., Goncharova E. et al., Plos One 2026 Vol. 21 No. 1 Article e0339468

Large language models (LLMs) offer significant potential for constructing commonsense knowledge graphs from text, demonstrating adaptability across diverse domains. However, their effectiveness varies significantly with domain-specific language, highlighting a critical need for specialized benchmarks to assess and optimize knowledge graph construction sub-tasks like named entity recognition, relation extraction, and entity linking. Currently, domain-specific benchmarks are ...

Added: January 15, 2026

Proceedings of the 39th Annual AAAI Conference on Artificial Intelligence

Washington, United States of America: AAAI Press, 2025.

AAAI-25 Technical Tracks 23 (Natural Language Processing II) collects peer-reviewed research papers that advance the state of natural language processing, with an emphasis on large language models, efficient inference, instruction following, retrieval augmentation, and multimodal language understanding. The papers address both theoretical and practical challenges, including model efficiency, interactive generation, grounding in external knowledge and ...

Added: December 18, 2025

Lacuna Inc. at SemEval-2025 Task 4: LoRA-Enhanced Influence-Based Unlearning for LLMs

Kudelya A., Shirnin A., in: Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025). Association for Computational Linguistics, 2025. P. 1528–1533.

This paper describes LIBU (LoRA enhanced influence-based unlearning), an algorithm to solve the task of unlearning - removing specific knowledge from a large language model without retraining from scratch and compromising its overall utility (SemEval-2025 Task 4: Unlearning sensitive content from Large Language Models). The algorithm combines classical influence functions to remove the influence of ...

Added: November 17, 2025

Empaths at SemEval-2025 Task 11: Retrieval-Augmented Approach to Perceived Emotions Prediction

Morozov L., Mogilevskii A., Shirnin A., in: Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025). Association for Computational Linguistics, 2025. P. 2000–2007.

Added: November 17, 2025

Findings of the Association for Computational Linguistics: EMNLP 2025

Association for Computational Linguistics, 2025.

The book contains this year’s edition of the Conference on Empirical Methods in Natural Language Processing! Importantly, it marks the 30th edition of EMNLP. With over 8,000 submissions, more than 3,000 accepted papers, and thousands of attendees, we have come a long way from that first workshop, which had 14 accepted papers. As the field ...

Added: November 16, 2025

Findings of the Association for Computational Linguistics: NAACL 2025

Association for Computational Linguistics, 2025.

Added: November 6, 2025

Explainable Document Classification via Concept Whitening and Stable Graph Patterns

Parakal E. G., Kuznetsov S., Makarov I. et al., IEEE Access 2025 Vol. 13 P. 149657–149678

This paper proposes a novel explainable document classification framework that integrates Concept Whitening (CW) with graph concepts that are derived from stable graph patterns, and extracted via methods based on Formal Concept Analysis (FCA) and pattern structures. Document graphs are constructed using Abstract Meaning Representation (AMR) graphs, from which graph concepts are extracted and aligned ...

Added: October 22, 2025

Building a Clean Bartangi Language Corpus and Training Word Embeddings for Low-Resource Language Modeling

Shumen: INCOMA Ltd, 2025.

This paper introduces a rule-based lemmatization and word embedding pipeline for the endangered Bartangi language, part of the Pamiri language group. The system combines a manually constructed lemma dictionary with morphological suffix rules to improve linguistic consistency in low-resource settings. The results demonstrate enhanced lemmatization accuracy and higher-quality embeddings for downstream NLP tasks. The work ...

Added: October 20, 2025

Well-Being Research Using Advanced Natural Language Processing (NLP) Methods: Prospects and Limitations

Voevodina E., Современная зарубежная психология 2025 Vol. 14 No. 3 P. 172–181

Context and relevance. Well-being research faces methodological limitations of conventional psychometric measures, criticized for poor ecological validity, limited information yield, and inadequate capture of multidimensional construct of well-being. Advanced natural language processing (NLP) technologies offer solutions to these constraints. Objective. To evaluate opportunities and challenges of transformer-based NLP for well-being research. Methods and materials. We conducted an analytical review of ...

Added: October 9, 2025

CLEF 2025 Working Notes

CEUR Workshop Proceedings, 2025.

Added: October 6, 2025

Development of a Classifier Architecture for Assessing the Condition of Infrastructure Objects Using Neural Networks

Moiseev N., Abramov I. A., Kamakin A. Yu., in: Parallel Computational Technologies: XIX All-Russian Conference with International Participation, ПаВТ'2025, Moscow, April 8–10, 2025. Short papers and poster descriptions. Chelyabinsk: South Ural State University Publishing Center, 2025. P. 301–301.

In recent years, with the advancement of deep learning and neural network methods, their application in geospatial analysis tasks has become particularly relevant. A key challenge in this field is assessing the state of urban infrastructure, including the classification of buildings by their functional purpose (residential, commercial, governmental, industrial). The use of neural networks significantly ...

Added: September 17, 2025

Comparative Analysis of Encoder-Based NER and Large Language Models for Skill Extraction from Russian Job Vacancies

Matkin N., Smirnov A., Usanin M. et al., in: 12th International Conference, AIST 2024, Bishkek, Kyrgyzstan, October 17–19, 2024, Revised Selected Papers. Cham: Springer, 2025.

The labor market is undergoing rapid changes, with increasing demands on job seekers and a surge in job openings. Identifying essential skills and competencies from job descriptions is challenging due to varying employer requirements and the omission of key skills. This study addresses these challenges by comparing traditional Named Entity Recognition (NER) methods based on ...

Added: July 26, 2025

2024 6th International Conference on Control Systems, Mathematical Modeling, Automation and Energy Efficiency (SUMMA)

NY: IEEE, 2024.

The 6th International Conference on Control Systems, Mathematical Modeling, Automation and Energy Efficiency (SUMMA2024), which is traditionally jointly organized by the V.A. Trapeznikov Institute of Control Sciences RAS and the Institute of Computer Sciences of Lipetsk State Technical University, was held in Lipetsk on November 13-15, 2024. SUMMA2024 program includes topics of interest that consist of, ...

Added: April 5, 2025

The relationships between RedditSI and BTC exchange characteristics: Do Reddit users still control the market?

Baklanova V., Eurasian Economic Review 2025 Vol. 15 P. 285–306

This study investigates the influence of Reddit community on Bitcoin market performance by introducing the Reddit Sentiment Index (RedditSI) as a tool to measure sentiment among Reddit users. The index was crafted based on the Bitcoin-related subreddits and classified with the Flair NLP model. Statistical analysis, including correlation, cointegration and causality tests, revealed significant relationships ...

Added: March 14, 2025

The Church Militant: A Modern Western Aramaic Account

Phillip Yu. Burlakov, Cherkashina A., Häberl C. et al., in: Interconnected Traditions: Semitic Languages, Literatures, Cultures—A Festschrift for Geoffrey Khan. Vol. 2: The Medieval World, Judaeo-Arabic, and Neo-Aramaic. Cambridge: University of Cambridge, 2025. Ch. 23 P. 667–692.

The article presents a recorded conversation in Modern Western Aramaic (MWA), known as Siryōn, collected during a fieldwork expedition in Maaloula, Syria, in 2021. The discussion provides linguistic and sociocultural insights, capturing the experiences of two elderly speakers, including one who recounts his uncle’s role as a bishop during political tensions involving Bishop Hilarion Capucci. ...

Added: March 10, 2025

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2024

Bangkok: Association for Computational Linguistics, 2024.

Originally named the Association for Machine Translation and Computational Linguistics (AMTCL), the Association for Computational Linguistics was founded in 1962 and renamed the ACL in 1968. The ACL is run by some 20 volunteers overseeing the administration of the Association (organising elections, deciding on new actions, adapting to the fast changing trends of our fields), ...

Added: February 21, 2025

Findings of the Association for Computational Linguistics: EACL 2024

Association for Computational Linguistics, 2024.

The 18th Conference of the European Chapter of the Association for Computational Linguistics. EACL is the flagship European conference dedicated to European and international researchers, covering a wide spectrum of research in Computational Linguistics and Natural Language Processing. ...

Added: February 17, 2025

Identifying Top-Performing Students via VKontakte Social Media Communities Using Advanced NLP Techniques

Gorshkov S., Ignatov D. I., Chernysheva A. et al., IEEE Access 2025 Vol. 13 P. 962–979

Identifying potentially high-performing students is crucial for universities aiming to enhance educational outcomes, for companies seeking to recruit top talents early, and for advertising platforms looking to optimize targeted marketing. This paper introduces an algorithm designed to identify students with exceptional academic performance by analyzing their subscriptions to communities on the social network VKontakte. The ...

Added: January 3, 2025

Elicitation: On the Unserious

Lander Y., Журнал ОПЛинга 2024 No. 2 Article 3

The squib invites the reader to reflect on what the methodology of elicitation actually gives us in grammar description. ...

Added: December 29, 2024

Spot the Bot: Coarse-Grained Partition of Semantic Paths for Bots and Humans

Gromov V., Kogan A., in: 10th International Conference, PReMI 2023, Kolkata, India, December 12–15, 2023, Proceedings. Pattern Recognition and Machine Intelligence. LNCS, volume 14301. Cham: Springer, 2023. P. 348–355.

Nowadays, technology is rapidly advancing: bots are writing comments, articles, and reviews. Due to this fact, it is crucial to know if the text was written by a human or by a bot. This paper focuses on comparing structures of the coarse-grained partitions of semantic paths for human-written and bot-generated texts. We compare the clusterizations ...

Added: December 13, 2024