Annotated suffix tree as a way of text representation for information retrieval in text collections

Dmitry S. Frolov

doi:10.17323/1998-0663.2015.4.63.70

Business Informatics. 2015. No. 4. P. 63–70.

A method for information retrieval based on annotated suffix trees (AST) is presented. The method relies on a string-to-document relevance score computed with an AST, together with reverse indexing of text fragments to improve performance. We developed a search engine based on the method and compared it with two popular text aggregation techniques: probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA). The computational experiments used real data: an online store’s XML catalogs and collections of web pages (both in Russian), and real user queries from the Yandex.Wordstat service. Quality was assessed with point quality estimates and graphical representations. The AST-based method generally yields results similar to those of the other methods, and its results are superior for inaccurate queries. The AST-based method is slightly slower than the PLSA/LDA-based methods. Because the average query time correlates with the lengths of the strings used at the AST construction phase, the algorithm can be sped up by dividing the texts into smaller fragments at the preprocessing stage; however, search quality may suffer if the fragments are too short. Thus, annotated suffix tree techniques are shown to be applicable to text retrieval problems, and the AST-based method has significant advantages for fuzzy search.
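The scoring idea in the abstract can be illustrated with a minimal sketch. It assumes one common formulation from the AST literature, which may differ from the formula used in the paper: every node of the tree stores how often its substring occurs in the document's fragments, each query suffix is matched as far as possible from the root, and the score averages the conditional frequencies along the matched path. The names `Node`, `split_fragments`, `build_ast`, `suffix_score`, and `relevance` are hypothetical, the tree is an uncompressed trie kept simple for readability, and the fragment reverse index is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Trie node annotated with the occurrence count of its substring."""
    count: int = 0
    children: dict = field(default_factory=dict)


def split_fragments(text, words_per_fragment=3):
    """Split a text into short word fragments (assumed preprocessing step)."""
    words = text.lower().split()
    return [" ".join(words[i:i + words_per_fragment])
            for i in range(0, len(words), words_per_fragment)]


def build_ast(fragments):
    """Insert every suffix of every fragment, incrementing node counts."""
    root = Node()
    for fragment in fragments:
        for start in range(len(fragment)):
            node = root
            for ch in fragment[start:]:
                node = node.children.setdefault(ch, Node())
                node.count += 1
    # The root's annotation is the sum of its children's counts.
    root.count = sum(child.count for child in root.children.values())
    return root


def suffix_score(root, suffix):
    """Average conditional frequency along the longest matching path."""
    node, total, depth = root, 0.0, 0
    for ch in suffix:
        child = node.children.get(ch)
        if child is None:
            break
        total += child.count / node.count
        node, depth = child, depth + 1
    return total / depth if depth else 0.0


def relevance(root, query):
    """Mean suffix score over all suffixes of the query string."""
    if not query:
        return 0.0
    scores = [suffix_score(root, query[i:]) for i in range(len(query))]
    return sum(scores) / len(query)


if __name__ == "__main__":
    doc = build_ast(split_fragments("Annotated suffix tree for text retrieval"))
    other = build_ast(split_fragments("Book shop"))
    # A misspelled query still overlaps with substrings of the first text.
    print(relevance(doc, "sufix tre"), relevance(other, "sufix tre"))
```

Because partial matches of every query suffix contribute to the score, a misspelled query such as "sufix tre" still scores higher against a text about suffix trees than against an unrelated one, which is the kind of behavior the abstract describes for inaccurate queries. Shorter fragments keep the tree shallow, at the cost of losing matches that span fragment boundaries.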

Research target: Computer Science

Language: English

Keywords: information retrieval

Prediction of protein-protein interactions using point transformer and spherical Convex Hull graphs

Arteaga Moreano B. D., Poptsova M., Computational and Structural Biotechnology Journal 2025 P. 82–93

Accurate predictions and large-scale identification of protein-protein interactions (PPIs) are crucial for understanding their inherent biological mechanisms and protein functions in virtually all biological processes. Nowadays, graph-based deep learning models have made significant contributions in modeling proteins with physicochemical and geometric features. However, most of these models rely on conventional graph construction methods, such as ...

Added: December 22, 2025

Novel Activation Sparsification Approach for Large Language Models

Demidovskij A., Бурмистрова Е. О., Жариков Е. И., Optical Memory and Neural Networks (Information Optics) 2025 Vol. 34 P. 166–174

Large Language Models (LLMs) require a lot of computational resources for inference. That is why the latest advancements in hardware design may offer many possibilities for speeding up LLMs. For example, TPUs optimize calculations on data transformed into the Coordinate sparse tensor format. The SparseCore processing unit that performs the calculations is heavily tailored ...

Added: December 22, 2025

Performance Study of Modern Zeroth-Order Optimization Methods for LLM Fine-Tuning

Demidovskij A., Трутнев А. И., Optical Memory and Neural Networks (Information Optics) 2025 Vol. 34 P. 16–29

Large Language Models (LLMs) are widely employed across a broad range of applications due to their versatility and state-of-the-art performance. However, as usage scenarios grow, there is a pressing demand for task-specific adaptation of LLMs through fine-tuning. While full fine-tuning (FT) remains the most preferred in terms of quality, its high memory and computation requirements ...

Added: December 22, 2025

2025 IEEE XVII International Scientific and Technical Conference on Actual Problems of Electronic Instrument Engineering (APEIE)

IEEE, 2025.

2025 IEEE XVII International Scientific and Technical Conference on Actual Problems of Electronic Instrument Engineering (APEIE) ...

Added: December 19, 2025

Flexible Stock Market Algorithm

Rubchinskiy A., Chubarova D., Technology and Investment 2025 Vol. 16 No. 4 P. 211–240

The article considers one of the most famous examples of socio-economic systems characterized by significant uncertainty: the S&P 500 stock market, where shares of the 500 largest US companies are traded. A flexible algorithm for daily trading has been developed. It is based on known fixed data about share prices in previous days as well as on ...

Added: December 19, 2025

Contemporary Problems of Science

Данилевич Т. В., Yasnitsky L., М.: Юрайт, 2025.

This course examines both the historical aspects of science and current philosophical issues related to its contemporary development and impact on society. It describes the emergence of science and its progressive advancement, its adoption by society, and the introduction and dominance of scientific achievements in all areas of human activity. Particular attention is paid to ...

Added: December 19, 2025

Modeling Pruning as a Phase Transition: A Thermodynamic Analysis of Neural Activations

- Р. М., Koltcov Sergei, Surkov A. et al., Computers, Materials and Continua 2025 P. 1–24

Activation pruning reduces neural network complexity by eliminating low-importance neuron activations, yet identifying the critical pruning threshold—beyond which accuracy rapidly deteriorates—remains computationally expensive and typically requires exhaustive search. We introduce a thermodynamics-inspired framework that treats activation distributions as energy-filtered physical systems and employs the free energy of activations as a principled evaluation metric. Phase-transition–like phenomena ...

Added: December 19, 2025

Distributed Computer and Communication Networks: Control, Computation, Communications (DCCN-2023)

-, 2023.

This scientific electronic publication presents the proceedings of the XXVI International Scientific Conference "Distributed Computer and Communication Networks: Control, Computation, Communications" in the following areas: - Algorithms and protocols of telecommunication networks - Control in computer and infocommunication systems - Performance analysis, QoS / QoE evaluation, and network efficiency - Analytical and simulation modeling of next-generation communication systems - Evolution of wireless networks toward 5G; - Technologies of the centimeter and millimeter ...

Added: December 18, 2025

Security of Russia. Legal, Socio-Economic, and Scientific-Technical Aspects. Strategic Priorities of National Security. Protection of Traditional Russian Spiritual and Moral Values, Culture, and Historical Memory as the Eighth Strategic Priority of National Security

Istratov A., Махутов Н. А., Звягин А. А. et al., Тула: Аквариус, 2025.

The book examines the formation and implementation of concepts, strategies, and the methodological basis of national security, as well as the scientific and methodological support for protecting traditional Russian spiritual and moral values and cultural and historical memory, with a view to achieving the goals and objectives of the eighth strategic priority of national security ...

Added: December 18, 2025

Nonlinear Structure of Super‐Thin Current Sheets With Guide Field: Equilibrium or Dynamic?

Tsareva O., Leonenko M., Grigorenko E. et al., JOURNAL OF GEOPHYSICAL RESEARCH-SPACE PHYSICS 2025 Vol. 130 P. 1–15

A 1D self‐consistent model of a super‐thin current sheet (STCS), based on a quasi‐adiabatic approach for both the demagnetized proton and electron motion, is generalized to the case of a configuration with a nonzero guide field. Part of the electron population is assumed to be magnetized (described via the guiding center approximation). The magnetic field configuration includes three components: self‐consistent Bx(z) and By(z) components ...

Added: December 18, 2025

Efficiency of the Artificial Intelligence Market: Expectations and Reality

Kuzminov Y., Kruchinskaia E., Форсайт 2025 Vol. 19 No. 4 P. 6–16

The development of Artificial Intelligence (AI) is significantly impacting the global economy, transforming corporate strategies and enhancing operational efficiency. This study aims to analyze the relative efficiency of the Generative AI (GenAI) market, considering the market size of chips, servers, and data center infrastructure required for its operation, and comparing these market sizes with the ...

Added: December 17, 2025

Crossroads of Computability and Logic: Insights, Inspirations, and Innovations, 21st Conference on Computability in Europe, CiE 2025, Lisbon, Portugal, July 14–18, 2025, Proceedings

Cham: Springer, 2025.

This book constitutes the refereed proceedings of the 21st Conference on Computability and Logic, CiE 2025, held in Lisbon, Portugal, during July 14–18, 2025. The 27 full papers included in this book were carefully reviewed and selected from 49 submissions. They focus on computability-related science, ranging over mathematics, computer science and applications in various natural and engineering ...

Added: December 17, 2025

A Deep Neural Network with Graph Attention for Detecting Fake Face Images

Pikul A. S., Лепендин А. А., Труды молодых ученых Алтайского государственного университета 2024 No. 20 P. 190–193

A new approach to detecting presentation attacks on face recognition systems is presented. It is based on a graph attention mechanism applied to intermediate feature maps of face images computed by the ResNet18 convolutional network. The proposed approach is shown to achieve high-quality detection of fake images in facial biometric verification, comparable to currently available alternative solutions. ...

Added: December 12, 2025

An Ensemble of Modern Computer Vision Models for Deepfake Detection

Pikul A. S., Безопасность информационных технологий 2024 Vol. 31 No. 4 P. 116–127

This article explores the potential use of modern computer vision architectures for the task of deepfake detection. The following architectures are considered: EfficientNet, Vision Transformer (ViT), VisionLSTM (ViL), Vision KAN, and Mamba Vision. The novelty of the approach lies in the application and comparison of these architectures, as well as their combination into paired ensembles ...

Added: December 12, 2025

Enhancing explainability in deepfake detection with graph attention networks

Pikul A. S., Popov I. Y., Безопасность информационных технологий 2025 Vol. 32 P. 73–82

Understanding how artificial intelligence models make decisions is important, especially for difficult tasks like detecting deepfakes, where it is not enough to just get a result: one also needs to know why the model made that choice. Many current methods, like Shapley additive explanations (SHAP) and Gradient-weighted Class Activation Mapping (Grad-CAM), help explain these decisions, but ...

Added: December 12, 2025

The Russian Model of AI Use in Digital Ecosystems of the Media Communication Industry

Vartanov S., Tyshetskaya A., Вестник Московского университета. Серия 10: Журналистика 2025 No. 5 P. 23–53

Media has been at the forefront of digital transformation in recent years: not only have the methods of creating, selling, storing, and consuming media content and media services changed, but also the structure of the media communication industry (MCI) itself. Considering its new structure and subjectivity, one cannot help but pay attention to artificial intelligence (AI) technologies that manifest ...

Added: December 11, 2025

ComputAgeBench: Epigenetic Aging Clocks Benchmark

Kriukov D., Efimov E., Kuzmina E. et al., ACM Transactions on Knowledge Discovery from Data 2025 P. 5560–5570

The success of clinical trials of longevity drugs relies heavily on identifying integrative health and aging biomarkers, such as biological age. Epigenetic aging clocks predict the biological age of individuals using their DNA methylation profiles, commonly retrieved from blood samples. However, there is no standardized methodology to validate and compare epigenetic clock models. We propose ComputAgeBench, ...

Added: December 10, 2025

On Basic Mathematical Definitions of Digital Technologies and Artificial Intelligence

Semenov A., Доклады Российской академии наук. Математика, информатика, процессы управления (ранее - Доклады Академии Наук. Математика) 2025 Vol. 527 No. S P. 7–12

The paper proposes a system of definitions for the basic concepts of computability theory that underlie the mathematics of the digital world: algorithm, computability, calculus, and object complexity, close to the modern understanding. Hierarchies of the finite and the problem of consistency are considered. ...

Added: December 6, 2025

2025 Systems of Signal Synchronization, Generating and Processing in Telecommunications (SYNCHROINFO)

IEEE, 2025.

The international scientific and engineering conference “Systems of Signal Synchronization, Generating and Processing in Telecommunications” has been held since 1974. For 50 years of work the conference has become a widely known forum for specialists of the field. The papers which are discussed at the conference can be divided into the following chapters: 1) Synchronization Systems and Devices 2) Signal ...

Added: December 6, 2025

Comparative Analysis of Requirements Prioritization Methods for Personalized Nutrition Web Applications

Mozhegova A. S., V.V. Lanin, Proceedings of the Institute for System Programming of the RAS 2025 Vol. 37 No. 5 P. 225–240

This study investigates the application of five requirements prioritization methods – MoSCoW, Kano Model, Weighted Scoring, RICE, and Cost of Delay (CoD) – in the development of a web application for personalized nutrition. The research addresses the challenge of managing limited resources (time, financial, and human) while maximizing user value and ensuring safety in a ...

Added: December 4, 2025

CIKM '25: Proceedings of the 34th ACM International Conference on Information and Knowledge Management

ACM, 2025.

It is our great honor and pleasure to welcome you to the 2025 ACM International Conference on Information and Knowledge Management (CIKM 2025). CIKM has long served as a premier annual forum for researchers and practitioners worldwide, rotating across different locations each year. We are delighted that, for the very first time, CIKM will take ...

Added: November 16, 2025

SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval

Association for Computing Machinery (ACM), 2024.

Welcome to the 47th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2024), taking place in Washington D.C., USA, from July 14 to 18, 2024. SIGIR serves as the foremost international forum for the presentation of groundbreaking research findings, the demonstration of innovative systems and techniques, and the exploration of forward-thinking ...

Added: May 9, 2024

HCI International 2023 Posters

Springer, 2023.

Added: October 21, 2023

Knowledge Discovery, Knowledge Engineering and Knowledge Management: 13th International Joint Conference, IC3K 2021, Virtual Event, October 25–27, 2021, Revised Selected Papers

Springer, 2023.

This book constitutes the extended and revised versions of a set of selected papers from the 13th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IC3K 2021, on October 25–27, 2021. The conference was held virtually due to the COVID-19 crisis. The 9 full papers included in this book were carefully reviewed and ...

Added: July 8, 2023