SynEL: A synthetic benchmark for entity linking

Karpov I., Kirillovich A., Goncharova E., Parinov A., Chernyavskiy A., Ilvovsky D., Semenova N., Sosedka A., Lisitsyna E., Belkin M.

doi:10.1371/journal.pone.0339468



Plos One. 2026. Vol. 21. No. 1. Article e0339468.


Large language models (LLMs) offer significant potential for constructing commonsense knowledge graphs from text and adapt well across diverse domains. However, their effectiveness varies considerably on domain-specific language, highlighting a critical need for specialized benchmarks to assess and optimize knowledge graph construction sub-tasks such as named entity recognition, relation extraction, and entity linking. Such domain-specific benchmarks are currently scarce. To address this gap, we introduce SynEL, a novel benchmark for evaluating text-based knowledge extraction methods, validated on customer support dialogues. We present a comprehensive methodology for benchmark construction, propose two distinct approaches for generating synthetic datasets, and evaluate accumulated hallucinations. Our experiments reveal that existing LLMs suffer a significant performance drop, with micro-F1 scores decreasing by up to 25 absolute points when extracting low-resource entities compared to high-resource entities from sources such as Wikipedia. Furthermore, incorporating synthetic datasets into training improved micro-F1 by up to 10 absolute points. We publicly release our benchmark and generation code to demonstrate its utility for fine-tuning and evaluating LLMs.

Research target: Computer Science

Keywords: Natural Language Processing (NLP)
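The SynEL abstract reports its results as micro-F1. As a reminder of how micro-averaging pools counts across all documents before computing precision and recall, here is a minimal sketch that scores predicted entity links against gold links. Representing each link as a hashable (document id, start, end, entity id) tuple and the name `micro_f1` are assumptions for illustration, not SynEL's released evaluation code.

```python
from typing import Hashable, Iterable, Tuple


def micro_f1(gold: Iterable[Hashable], predicted: Iterable[Hashable]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over pooled entity links.

    Each link is assumed to be a hashable tuple such as
    (doc_id, start, end, entity_id); a prediction is a true positive only
    if the identical tuple appears among the gold links.
    """
    gold_set = set(gold)
    pred_set = set(predicted)
    tp = len(gold_set & pred_set)   # correct links
    fp = len(pred_set - gold_set)   # predicted but wrong or spurious
    fn = len(gold_set - pred_set)   # gold links that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

With three gold links and three predictions of which one links the right mention to the wrong entity, two tuples match, so precision, recall and F1 all equal 2/3. Note that a drop of 25 absolute points means, for example, F1 falling from 0.80 to 0.55, not a 25 % relative decrease.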

ML-based Fast Simulation of FARICH Responses

Shipilov F., Barnyakov A., Ivanov A. et al., Series Physics "arxiv.org" 2026

A fast simulation of the detector response is a vital task in high-energy physics (HEP). Traditional Monte-Carlo methods form the backbone of modern particle physics simulation software but are computationally expensive. We present a machine-learning-based approach to fast simulation of the Focusing Aerogel Ring Imaging Cherenkov (FARICH) detector response. Given a particle track and momentum, ...

Added: May 19, 2026
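The FARICH abstract is cut off before the method, but the physics that any simulated Cherenkov detector response has to respect is standard: a charged particle emits Cherenkov light at an angle with cos θc = 1/(nβ) when nβ > 1, where β = p/√(p² + m²) with c = 1. A minimal sketch of that relation follows. It is not the paper's ML model, and the refractive index n = 1.05 used below is only an illustrative aerogel-like value, not a FARICH parameter.

```python
import math
from typing import Optional


def cherenkov_angle(p_gev: float, mass_gev: float, n: float) -> Optional[float]:
    """Cherenkov emission angle in radians, or None below threshold.

    p_gev: momentum in GeV/c; mass_gev: rest mass in GeV/c^2;
    n: refractive index of the radiator (illustrative input).
    """
    # beta = p / E with E = sqrt(p^2 + m^2) in natural units (c = 1)
    beta = p_gev / math.hypot(p_gev, mass_gev)
    if n * beta <= 1.0:
        return None  # no Cherenkov light below the threshold beta = 1/n
    return math.acos(1.0 / (n * beta))
```

At p = 1 GeV/c with n = 1.05, a pion (m ≈ 0.1396 GeV/c²) radiates while a kaon (m ≈ 0.4937 GeV/c²) is below threshold. This dependence of the ring on momentum and mass is what a fast simulation conditioned on track and momentum must reproduce.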

Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 3: System Demonstrations)

Association for Computational Linguistics, 2026.

Added: May 19, 2026

Dataset of solubility values for organic compounds in binary mixtures of solvents at various temperatures

Bezzubov S., Malikov D., Krasnov L. et al., Scientific data 2026 Vol. 13 Article 727

Solubility is a crucial property of organic compounds, impacting their potential applications in synthetic chemistry, materials science and drug design. Moreover, in technological processes mixtures of solvents are often utilized, making the solubility assessment more complicated. Predicting solubility values in mixtures of solvents from a molecular structure can help to address this issue, although a ...

Added: May 19, 2026

Aerokinesis: An IoT-Based Vision-Driven Gesture Control System for Quadcopter Navigation Using Deep Learning and ROS2

Pikalov V., Meshcheryakov V., Kondratev S. et al., Technologies 2026 Vol. 14 No. 1 P. 1–27

This paper presents Aerokinesis, an IoT-based software–hardware system for intuitive gesture-driven control of quadcopter unmanned aerial vehicles (UAVs), developed within the Robot Operating System 2 (ROS2) framework. The proposed system addresses the challenge of providing an accessible human–drone interaction interface for operators in scenarios where traditional remote controllers are impractical or unavailable. The architecture comprises ...

Added: May 19, 2026
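The Aerokinesis abstract describes gesture-driven quadcopter control but is truncated before the architecture. One piece any such pipeline needs is a mapping from a recognized gesture to a motion setpoint. The sketch below is hypothetical throughout: the gesture labels, speeds and confidence threshold are invented for illustration, and the axes follow the ROS REP 103 body-frame convention (x forward, y left, z up). It does not use ROS2 itself.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class VelocityCommand:
    """Body-frame setpoint: linear velocities in m/s, yaw rate in rad/s (hypothetical limits)."""
    vx: float = 0.0
    vy: float = 0.0
    vz: float = 0.0
    yaw_rate: float = 0.0


HOVER = VelocityCommand()

# Hypothetical gesture vocabulary; the actual Aerokinesis gesture set is not given in the abstract.
GESTURE_COMMANDS: Dict[str, VelocityCommand] = {
    "palm_forward": VelocityCommand(vx=0.5),
    "palm_back": VelocityCommand(vx=-0.5),
    "point_left": VelocityCommand(vy=0.5),    # +y is left in REP 103
    "point_right": VelocityCommand(vy=-0.5),
    "thumb_up": VelocityCommand(vz=0.3),
    "thumb_down": VelocityCommand(vz=-0.3),
    "fist": HOVER,
}


def gesture_to_command(label: str, confidence: float, threshold: float = 0.8) -> VelocityCommand:
    """Map a gesture classifier output to a setpoint, hovering on unknown or uncertain input."""
    if confidence < threshold:
        return HOVER
    return GESTURE_COMMANDS.get(label, HOVER)
```

Defaulting to hover on low confidence or unknown labels is a fail-safe design choice. In a ROS2 node, the resulting fields could be copied into a `geometry_msgs/msg/Twist` message before publishing.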

Aerokinesis: An IoT-Based Vision-Driven Gesture Control System for Quadcopter Navigation Using Deep Learning and ROS2

Кондратьев С., Никитин Г. Э., Дырченкова Ю. А. et al., Technologies 2026 Vol. 14 No. 1 P. 1–27

Added: May 19, 2026

Parallel Computational Technologies. PCT 2025

Springer, 2025.

This book constitutes the refereed proceedings of the 19th International Conference on Parallel Computational Technologies, PCT 2025, held in Moscow, Russia, during April 8–10, 2025. The 31 full papers included in this volume were carefully reviewed and selected from 122 submissions. These papers were organized under the following topical sections: High Performance Architectures, Tools and Technologies; ...

Added: May 18, 2026

KMHCR: A Key-Controlled Signal-Domain Transformation for 5G IoT Security

Ronglin Z., Wei L., Jiahong C. et al., Journal of Signal Processing Systems 2026 Vol. 98 P. 1–15

To address the need for lightweight and low-latency protection in massive resource-constrained 5G Internet of Things (IoT) systems, this paper proposes Key-Controlled Modulation Hopping and Constellation Rotation (KMHCR). KMHCR is designed as a physical-layer confidentiality-enhancement mechanism that avoids bit-wise full-payload encryption in the protection pipeline. It uses a shared key derived from channel-reciprocity secret key ...

Added: May 16, 2026
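KMHCR is described as key-controlled modulation hopping plus constellation rotation. The sketch below illustrates that general idea under loudly stated assumptions: the shared key is taken as already agreed (channel-reciprocity key generation is not modeled), HMAC-SHA256 serves as the keyed pseudo-random function, the hopping set {BPSK, QPSK, 8-PSK} is hypothetical, and the channel is ideal. It is not the paper's scheme and says nothing about its security.

```python
import cmath
import hashlib
import hmac
import math
from typing import List, Tuple

PSK_ORDERS = (2, 4, 8)  # hypothetical hopping set: BPSK, QPSK, 8-PSK


def _symbol_params(key: bytes, index: int) -> Tuple[int, float]:
    """Derive (PSK order, rotation angle) for symbol `index` from the shared key."""
    digest = hmac.new(key, index.to_bytes(8, "big"), hashlib.sha256).digest()
    order = PSK_ORDERS[digest[0] % len(PSK_ORDERS)]
    # 24-bit fraction of a full turn used as the per-symbol constellation rotation
    phase = 2 * math.pi * int.from_bytes(digest[1:4], "big") / 2**24
    return order, phase


def modulate(bits: List[int], key: bytes) -> List[complex]:
    symbols: List[complex] = []
    pos, index = 0, 0
    while pos < len(bits):
        order, phase = _symbol_params(key, index)
        k = order.bit_length() - 1  # bits per symbol: 1, 2 or 3
        chunk = bits[pos:pos + k] + [0] * max(0, pos + k - len(bits))  # zero-pad the tail
        m = 0
        for b in chunk:
            m = (m << 1) | b
        symbols.append(cmath.exp(1j * (2 * math.pi * m / order + phase)))
        pos += k
        index += 1
    return symbols


def demodulate(symbols: List[complex], key: bytes, n_bits: int) -> List[int]:
    bits: List[int] = []
    for index, s in enumerate(symbols):
        order, phase = _symbol_params(key, index)
        k = order.bit_length() - 1
        # Undo the key-dependent rotation, then pick the nearest PSK point.
        m = round((cmath.phase(s) - phase) / (2 * math.pi / order)) % order
        bits.extend((m >> (k - 1 - i)) & 1 for i in range(k))
    return bits[:n_bits]  # drop tail padding
```

Only a receiver holding the same key can reproduce the per-symbol modulation order and rotation, so the bitstream is recovered without any bit-wise payload encryption, which matches the motivation stated in the abstract.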

DPN Verifier: A Toolkit for Faster Soundness Verification and Repair of Process Models with Data

Suvorov N. M., Proceedings of the Institute for System Programming of the RAS 2026 Vol. 38 No. 3(2) P. 49–66

Data Petri Nets (DPNs) extend classical Petri nets to model processes in which data directly influences control flow, enabling a comprehensive view of system behavior and making it possible to detect failure points that could otherwise remain hidden. Soundness is a correctness criterion that captures failure points such as deadlocks and livelocks, as well as model boundedness and absence ...

Added: May 16, 2026
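To make the soundness notions in the DPN Verifier abstract concrete, here is a minimal explicit-state check for classical Petri nets, which DPNs extend. Data variables and guards are deliberately ignored, the encoding of markings and transitions as token-count tuples is an assumption, and this is not the toolkit's algorithm. It detects unboundedness with a Karp-Miller-style covering test, then checks option to complete, proper completion and dead transitions.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Marking = Tuple[int, ...]
# A transition is (consume, produce): token counts per place, aligned with the marking.
Transition = Tuple[Marking, Marking]


def _enabled(m: Marking, t: Transition) -> bool:
    return all(mi >= ci for mi, ci in zip(m, t[0]))


def _fire(m: Marking, t: Transition) -> Marking:
    return tuple(mi - ci + pi for mi, ci, pi in zip(m, t[0], t[1]))


def _strictly_covers(a: Marking, b: Marking) -> bool:
    return a != b and all(x >= y for x, y in zip(a, b))


def check_soundness(transitions: Dict[str, Transition], initial: Marking, final: Marking) -> dict:
    parent: Dict[Marking, Optional[Marking]] = {initial: None}
    succ: Dict[Marking, List[Marking]] = {}
    fired = set()
    queue = deque([initial])
    while queue:  # breadth-first reachability graph construction
        m = queue.popleft()
        succ[m] = []
        for name, t in transitions.items():
            if not _enabled(m, t):
                continue
            m2 = _fire(m, t)
            fired.add(name)
            succ[m].append(m2)
            if m2 in parent:
                continue
            # If m2 strictly covers an ancestor, the firing sequence between them
            # can repeat forever, so some place is unbounded.
            a: Optional[Marking] = m
            while a is not None:
                if _strictly_covers(m2, a):
                    return {"bounded": False, "sound": False}
                a = parent[a]
            parent[m2] = m
            queue.append(m2)

    # Option to complete: walk the reachability graph backwards from the final marking.
    pred: Dict[Marking, List[Marking]] = {m: [] for m in succ}
    for m, nexts in succ.items():
        for m2 in nexts:
            pred[m2].append(m)
    can_finish = {final} if final in pred else set()
    stack = list(can_finish)
    while stack:
        for p in pred[stack.pop()]:
            if p not in can_finish:
                can_finish.add(p)
                stack.append(p)

    cannot_complete = [m for m in succ if m not in can_finish]
    deadlocks = [m for m in cannot_complete if not succ[m]]  # nothing enabled
    improper = [m for m in succ if _strictly_covers(m, final)]  # end reached with leftover tokens
    dead_transitions = sorted(set(transitions) - fired)
    return {
        "bounded": True,
        "cannot_complete": cannot_complete,
        "deadlocks": deadlocks,
        "improper_completion": improper,
        "dead_transitions": dead_transitions,
        "sound": not (cannot_complete or improper or dead_transitions),
    }
```

Markings in `cannot_complete` that still have enabled transitions indicate livelocks or paths that lead into a deadlock. Adding data, as DPNs do, makes the state space infinite in general, which is why dedicated techniques are needed there.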

QGKM: A Quantum Fidelity-Based Graph Clustering Framework for Robust Data Pattern Recognition in Education Social Networks

Neal N. X., Weiqing L., Dacheng H. et al., Algorithms 2026 Vol. 19 No. 5 P. 1–22

In the era of data-driven education, educational social networks generate large volumes of high-dimensional and complex-structured data through learner interactions, collaborative activities, and resource-sharing behaviors, posing significant challenges to traditional unsupervised learning methods. Such data often exhibit non-convex distributions, heterogeneity, and noise sensitivity, making conventional clustering approaches insufficient for capturing their intrinsic structural relationships. To ...

Added: May 13, 2026
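QGKM builds on quantum fidelity as a similarity measure. A minimal classical sketch follows, assuming that feature vectors are amplitude-encoded as normalized pure states, for which fidelity is |⟨ψ|φ⟩|². The thresholded similarity graph with connected components is a simple stand-in chosen for illustration, not QGKM's clustering procedure.

```python
import math
from typing import List, Sequence


def amplitude_state(x: Sequence[float]) -> List[float]:
    """Amplitude-encode a real feature vector as a normalized pure state."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0:
        raise ValueError("cannot encode the zero vector")
    return [v / norm for v in x]


def fidelity(x: Sequence[float], y: Sequence[float]) -> float:
    """Fidelity |<psi|phi>|^2 between two amplitude-encoded (real) pure states."""
    psi, phi = amplitude_state(x), amplitude_state(y)
    return sum(a * b for a, b in zip(psi, phi)) ** 2


def fidelity_graph_clusters(points: List[Sequence[float]], threshold: float = 0.9) -> List[List[int]]:
    """Link points whose fidelity reaches `threshold`; return connected components as clusters."""
    n = len(points)
    adj = [[j for j in range(n) if j != i and fidelity(points[i], points[j]) >= threshold]
           for i in range(n)]
    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        component, stack = [], [start]
        seen.add(start)
        while stack:  # depth-first traversal of one component
            i = stack.pop()
            component.append(i)
            for j in adj[i]:
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        clusters.append(sorted(component))
    return clusters
```

For real amplitudes, fidelity reduces to squared cosine similarity, so it is invariant to feature scaling and insensitive to sign flips.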

Proceedings of the 9th Student Research Workshop associated with the International Conference Recent Advances in Natural Language Processing

Velichkov B., Nikolova-Koleva I., Slavcheva M., Shumen: INCOMA Ltd, 2025.

The RANLP 2025 Student Research Workshop (RANLPStud’2025) is a special track of the established international conference Recent Advances in Natural Language Processing (RANLP’2025). The RANLPStud is being organised for the 9th time and this year is running in parallel with the other tracks of the main RANLP 2025 conference. The target of RANLPStud’25 is to be a ...

Added: May 12, 2026

Integrated simulation environment for verification and validation of control programs for connected and highly automated vehicles

Stepanyants V., Долгов И. М., Хорошилов Г. С. et al., Proceedings of the Institute for System Programming of the RAS 2026 Vol. 38 No. 3 P. 95–110

Highly automated and connected vehicles are gradually entering the market. Currently, solutions are being proposed that allow these technologies to be used for cooperative driving automation, which can significantly improve traffic safety. Such technologies and their software should be tested to ensure safety before being implemented in real systems. Verification and validation of vehicular control ...

Added: May 12, 2026

Connected and Automated Vehicle Scenario Manager Graphical User Interface

Tikhonov R., Efendiev M. T., Fedotenkov A. A., 2026 International Russian Smart Industry Conference (SmartIndustryCon) 2026 P. 542–547

High-fidelity simulation environments like CARLA and ROS are essential for connected and automated vehicle research. They allow researchers to verify and validate new software and technology without the time, financial, and safety overheads of real-world testing. However, their operation requires considerable expertise for creating platform-specific scenario configuration files, which complicates the research workflow. This paper ...

Added: May 11, 2026

Proceedings of the 2026 IEEE 11th International Conference on Smart Cloud (SmartCloud 2026), 8–10 May 2026

Los Alamitos: IEEE Computer Society, 2026.

On behalf of the conference committees, it is a great pleasure to welcome you to the 11th IEEE International Conference on Smart Cloud (IEEE SmartCloud 2026). We are glad to hold this international conference in New York City, USA. Please allow us to introduce the IEEE SmartCloud 2026 conference. The ...

Added: May 10, 2026

From the unknown to transparency: a review of explainable AI (XAI) technologies

Avdoshin S. M., Pesotskaya E. Y., Information Technologies 2026 Vol. 32 No. 4 P. 185–194

With the rapid advancement of artificial intelligence, and deep learning in particular, models have emerged that are capable of delivering highly accurate predictions. However, the internal logic of such models remains difficult to interpret—an issue of critical importance, especially in domains where the correctness of an algorithm directly affects high-stakes decision-making. One promising avenue for ...

Added: May 8, 2026
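The XAI review above is truncated before it lists specific techniques. As one concrete example of a widely used model-agnostic explanation method (whether the review covers it is not stated in the abstract), here is a pure-Python sketch of permutation feature importance: shuffle one feature column, and measure how much the model's error grows. scikit-learn provides a production version as `sklearn.inspection.permutation_importance`; the function below is a simplified illustration, not that API.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]


def _mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict: Callable[[Row], float], X: List[Row], y: Sequence[float],
                           n_repeats: int = 5, seed: int = 0) -> List[float]:
    """Mean increase in MSE when each feature column is shuffled, breaking its link to the target."""
    rng = random.Random(seed)
    baseline = _mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            increases.append(_mse(y, [predict(r) for r in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances
```

A feature the model never reads gets an importance of exactly zero, which makes the method a quick sanity check on what a black-box model actually relies on.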

Explainable AI for Industry 5.0: Shedding light on the black box

Avdoshin S. M., Pesotskaya E. Y., Business Informatics 2026 Vol. 20 No. 1 P. 7–28

The rapid development of artificial intelligence (AI) is accompanied by increasing computational complexity and decreasing model transparency, which significantly limits its adoption in critical domains that require a high level of trust, interpretability, and justification of decisions. Under these conditions, the field of Explainable Artificial Intelligence (XAI) has gained particular importance as it focuses on approaches and technologies that ...

Added: May 8, 2026

Comparative Analysis of Students’ Perceptions of Programming Puzzles: Parson’s and Wordle-Like

Varnavsky A., IEEE Access 2026 Vol. 14 P. 37487–37508

Puzzles are an excellent tool for learning computer science and programming, fostering increased interest, engagement, and motivation among students, as well as developing logical, critical, and computational thinking. Among beginner programmers, Parson's Programming Puzzles are quite popular, aimed at mastering the basic syntactic and logical constructs of programming languages. However, as students' skills grow, their ...

Added: May 7, 2026

Towards performance analysis of GPU-aware MPI over Angara interconnect

Ismagilov T., Mukosey A., Smirnov F. et al., International Journal of High Performance Computing Applications 2026 Vol. 40 No. 2 P. 240–253

One of the most important aspects of supercomputer development in the post-Moore era is the interconnect technologies that allow one to unite a multitude of processing elements into a well-synchronized computing system. Novel types of supercomputer interconnect require careful benchmarking and compliance with the requirements of modern hardware trends. GPU-based heterogeneous computing is one of ...

Added: May 7, 2026

Software tools for developing measures to reduce defects in serial production

Yasnitsky L., Голдобин М. А., Мезенцев А. С., Applied Mathematics and Control Sciences 2025 No. 2 P. 99–116

The paper reviews modern methods, and the software tools based on them, used for mathematical modeling of serial production processes in order to reduce defects and improve the quality of manufactured products. It lists groups of works aimed at detecting and classifying defects; works addressing the prediction of defect formation and the determination of parameter significance; works aimed at finding the optimal combination of technological parameters for manufacturing products, ...

Added: May 5, 2026

Modeling and estimation of the resource costs of routing algorithms in networks-on-chip with a two-dimensional circulant topology

Монахова Э. А., Монахов О. Г., Rzaev E. et al., Prikladnaya Diskretnaya Matematika (Applied Discrete Mathematics) 2026 Vol. 71 P. 112–127

This paper studies the joint design of topologies for families of diameter-optimal circulant networks $C(N; \pm 1, \pm s_2)$ and of the optimal routing algorithms of complexity $O(1)$ implemented for them. The proposed routing algorithm is based on scalable parameters of $L$-shaped templates for the dense packing of graphs in the plane for the families of optimal networks. Analytical formulas are derived expressing these parameters as functions of the diameter of the graphs in the families ...

Added: May 4, 2026
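To make the circulant topology $C(N; \pm 1, \pm s_2)$ from the entry above concrete, the sketch below builds the graph implicitly and measures its diameter with breadth-first search. BFS is only a reference baseline here; it is exactly the per-query graph search that an $O(1)$ routing algorithm avoids, and the paper's $L$-shaped template algorithm is not reproduced. The family used in the usage check, $N = 2d^2 + 2d + 1$ with $s_2 = 2d + 1$, is a classical diameter-optimal example chosen for illustration and is not necessarily one of the families studied in the paper.

```python
from collections import deque
from typing import Dict, List


def circulant_neighbors(n: int, s2: int, node: int) -> List[int]:
    """Neighbours of `node` in the circulant C(N; ±1, ±s2)."""
    return [(node + 1) % n, (node - 1) % n, (node + s2) % n, (node - s2) % n]


def bfs_distances(n: int, s2: int, source: int = 0) -> Dict[int, int]:
    """Hop distances from `source` to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in circulant_neighbors(n, s2, u):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def circulant_diameter(n: int, s2: int) -> int:
    # Circulants are vertex-transitive, so the eccentricity of node 0 equals
    # the diameter; the ±1 chords alone guarantee the graph is connected.
    return max(bfs_distances(n, s2, 0).values())
```

Every node within $d$ hops corresponds to a point $(a, b)$ with $|a| + |b| \le d$ (reaching node $a + b s_2 \bmod N$), and there are $2d^2 + 2d + 1$ such points, so no 4-regular circulant of diameter $d$ can have more nodes. The family above meets that bound, which is what "diameter-optimal" means for these networks.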

AlphaDent: A dataset for automated tooth pathology detection

Sosnin E. I., Vasil’ev Y. L., Solovyev R. A. et al., Computer Optics 2025 Vol. 49 No. 6 P. 1129–1137

In this article, we present a new unique dataset for dental research – AlphaDent. This dataset is based on the DSLR camera photographs of the teeth of 295 patients and contains over 1200 images. The dataset is labeled for solving the instance segmentation problem and is divided into 9 classes. The article provides a detailed ...

Added: May 4, 2026

Multimodal models in medical diagnostics as a universal tool

Назаренко А. Г., Федоров М. В., Moshkin A. et al., Vestnik Roszdravnadzora 2026 No. 1 P. 14–29

Multimodal foundation models and medical multimodal large language models are establishing a new class of diagnostic clinical decision support systems capable of operating on heterogeneous data sources, including medical imaging (X-ray, CT, MRI, ultrasound, histopathology), physiological signals (ECG, EEG), clinical text (electronic health records, reports, discharge summaries), laboratory measurements, molecular profiling data, and related modalities. ...

Added: May 4, 2026

A textual fingerprint learning model to detect fake information spreaders in social networks

Behzadidoost R., Neurocomputing 2025 Vol. 665 P. 1–21

While earlier research has focused on detecting misinformation content, identifying the users who spread it, referred to in this paper as fake information spreaders, remains a relatively new challenge. These users deliberately mix true and false information, making detection more difficult. This paper proposes a textual fingerprint learning model to detect fake information spreaders. The ...

Added: March 12, 2026

Proceedings of the 39th Annual AAAI Conference on Artificial Intelligence

Washington, United States of America: AAAI Press, 2025.

AAAI-25 Technical Tracks 23 (Natural Language Processing II) collects peer-reviewed research papers that advance the state of natural language processing, with an emphasis on large language models, efficient inference, instruction following, retrieval augmentation, and multimodal language understanding. The papers address both theoretical and practical challenges, including model efficiency, interactive generation, grounding in external knowledge and ...

Added: December 18, 2025

Lacuna Inc. at SemEval-2025 Task 4: LoRA-Enhanced Influence-Based Unlearning for LLMs

Kudelya A., Shirnin A., in: Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025). Association for Computational Linguistics, 2025. P. 1528–1533.

This paper describes LIBU (LoRA-enhanced influence-based unlearning), an algorithm for unlearning, i.e., removing specific knowledge from a large language model without retraining it from scratch or compromising its overall utility (SemEval-2025 Task 4: Unlearning sensitive content from Large Language Models). The algorithm combines classical influence functions to remove the influence of ...

Added: November 17, 2025
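LIBU builds on influence functions, which estimate how model parameters would change if a training example were removed. The sketch below shows that idea on a toy ridge regression, where the result can be checked against retraining; it assumes NumPy and does not model LoRA, LLM scale, or any detail of LIBU itself. Because the ridge objective is quadratic, a single Newton step using the Hessian of the remaining data removes the example exactly. Classical influence-function approximations reuse the full-data Hessian instead, which is cheaper but only approximate.

```python
import numpy as np


def fit_ridge(X: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Minimize 0.5*||X w - y||^2 + 0.5*lam*||w||^2 in closed form."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def unlearn_point(w: np.ndarray, X: np.ndarray, lam: float,
                  x_z: np.ndarray, y_z: float) -> np.ndarray:
    """Remove one training example (x_z, y_z) from a fitted ridge model.

    At the old optimum the remaining objective has gradient -grad_z, so one
    Newton step with the remaining Hessian moves w to the new optimum.
    """
    H_full = X.T @ X + lam * np.eye(X.shape[1])
    H_remaining = H_full - np.outer(x_z, x_z)   # Hessian without the removed example
    grad_z = x_z * (x_z @ w - y_z)              # gradient of 0.5*(x_z.w - y_z)^2 at w
    return w + np.linalg.solve(H_remaining, grad_z)
```

For non-quadratic losses such as those of an LLM, the same update is only a first-order approximation, which is one reason practical methods combine it with additional machinery.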