Using Annotated Suffix Trees for Fuzzy Full Text Search

Dmitry Frolov

In press

A method for fuzzy full text search is proposed. The method follows a popular two-stage scheme with a novel second stage: a preliminary search using an n-gram inverted index, followed by relevance checking between the query and the documents using frequency-annotated suffix trees (ASTs). The ASTs are built off-line for all documents of the collection. The method is compared with two popular fuzzy text retrieval techniques: one using n-gram inverted indexing with Levenshtein distance checking and signature hashing, and the other being Lemur, a popular toolkit for language modelling and information retrieval. For computational experiments we use the "Reuters 21578" text collection and a collection of USPTO patents. Our AST-based method generally yields accuracy scores similar to those of the winner, the Levenshtein distance-based method, while significantly outperforming that method in speed. Therefore, when accuracy and speed are considered together, the AST-based method shows significant advantages.
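
The two-stage scheme described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names, the trigram size, the `min_shared` and `threshold` parameters, the depth-limited trie standing in for a full suffix tree, and the scoring rule (the average, over query suffixes, of the mean conditional probability along the matched path). The paper's actual index layout and relevance formula may differ.

```python
from collections import defaultdict


def ngrams(text, n=3):
    """Return the set of character n-grams of text (lowercased)."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_ngram_index(docs, n=3):
    """Map each n-gram to the ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in ngrams(text, n):
            index[gram].add(doc_id)
    return index


def candidates(index, query, n=3, min_shared=2):
    """Stage 1: documents sharing at least min_shared n-grams with the query."""
    hits = defaultdict(int)
    for gram in ngrams(query, n):
        for doc_id in index.get(gram, ()):
            hits[doc_id] += 1
    return {doc_id for doc_id, count in hits.items() if count >= min_shared}


def build_ast(text, max_depth=8):
    """Frequency-annotated suffix trie (assumed simplification of an AST).

    Each node is a list [count, children]; count is the number of
    suffixes of the text that pass through the node.
    """
    root = [0, {}]
    text = text.lower()
    for start in range(len(text)):
        root[0] += 1
        node = root
        for ch in text[start:start + max_depth]:
            node = node[1].setdefault(ch, [0, {}])
            node[0] += 1
    return root


def ast_score(root, query):
    """Stage 2 (assumed scoring): average over query suffixes of the mean
    conditional probability along the longest matching path."""
    query = query.lower()
    total = 0.0
    for start in range(len(query)):
        node, probs = root, []
        for ch in query[start:]:
            child = node[1].get(ch)
            if child is None:
                break
            probs.append(child[0] / node[0])
            node = child
        if probs:
            total += sum(probs) / len(probs)
    return total / len(query) if query else 0.0


def search(index, trees, query, threshold=0.2):
    """Run both stages; return (doc_id, score) pairs, best first."""
    scored = [(doc_id, ast_score(trees[doc_id], query))
              for doc_id in candidates(index, query)]
    return sorted((p for p in scored if p[1] >= threshold),
                  key=lambda p: p[1], reverse=True)
```

In this sketch the index and the per-document trees would be built once, off-line, and only `search` would run per query, which mirrors the off-line AST construction mentioned in the abstract.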

Language: English

Keywords: information retrieval

In book

Communications in Computer and Information Science. Information Retrieval. 10th Russian Summer School, RuSSIR 2016, Saratov, Russia, August 22-26, 2016, Revised Selected Papers

Springer, 2016.

CIKM '25: Proceedings of the 34th ACM International Conference on Information and Knowledge Management

ACM, 2025.

It is our great honor and pleasure to welcome you to the 2025 ACM International Conference on Information and Knowledge Management (CIKM 2025). CIKM has long served as a premier annual forum for researchers and practitioners worldwide, rotating across different locations each year. We are delighted that, for the very first time, CIKM will take ...

Added: November 16, 2025

SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval

Association for Computing Machinery (ACM), 2024.

Welcome to the 47th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2024), taking place in Washington D.C., USA, from July 14 to 18, 2024. SIGIR serves as the foremost international forum for the presentation of groundbreaking research findings, the demonstration of innovative systems and techniques, and the exploration of forward-thinking ...

Added: May 9, 2024

HCI International 2023 Posters

Springer, 2023.

Added: October 21, 2023

Knowledge Discovery, Knowledge Engineering and Knowledge Management: 13th International Joint Conference, IC3K 2021, Virtual Event, October 25–27, 2021, Revised Selected Papers

Springer, 2023.

This book constitutes the extended and revised versions of a set of selected papers from the 13th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IC3K 2021, on October 25–27, 2021. The conference was held virtually due to the COVID-19 crisis. The 9 full papers included in this book were carefully reviewed and ...

Added: July 8, 2023

Advances in Information Retrieval. 45th European Conference on Information Retrieval, ECIR 2023, Dublin, Ireland, April 2–6, 2023, Proceedings, Part II

Springer, 2023.

Added: March 22, 2023

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval

Association for Computing Machinery (ACM), 2022.

Added: July 8, 2022

Pattern Structures for Knowledge Processing and Information Retrieval

Kuznetsov S., Goncharova E., in: Proceedings of the Fifth International Scientific Conference "Intelligent Information Technologies for Industry" (IITI'21). Vol. 330. Springer, 2022. P. 410–420.

Added: October 28, 2021

Concept-based chatbot for interactive query refinement in product search

Goncharova E., Ilvovsky D., Galitsky B., in: Proceedings of the 9th International Workshop "What can FCA do for Artificial Intelligence?" (FCA4AI 2021). Vol. 2972. CEUR-WS, 2021. P. 51–58.

Added: October 28, 2021

Experimental IR Meets Multilinguality, Multimodality, and Interaction: 12th International Conference of the CLEF Association, CLEF 2021, Virtual Event, September 21–24, 2021, Proceedings

Springer, 2021.

Added: September 28, 2021

Data Analytics and Management in Data Intensive Domains. 23rd International Conference, DAMDID/RCDL 2021, Moscow, Russia, October 26–29, 2021, Revised Selected Papers

Springer, 2022.

“Data Analytics and Management in Data Intensive Domains” conference (DAMDID) is planned as a multidisciplinary forum of researchers and practitioners from various domains of science and research promoting cooperation and exchange of ideas in the area of data analysis and management in data intensive domains. Approaches to data analysis and management being developed in specific data intensive domains of X-informatics (such as X = astro, bio, chemo, geo, medicine, neuro, physics, ...

Added: August 30, 2021

A Systematic Evaluation of Transfer Learning and Pseudo-labeling with BERT-based Ranking Models

Mokrii I., Boytsov L., Braslavski P., in: SIGIR '21: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2021. P. 2081–2085.

Due to high annotation costs making the best use of existing human-created training data is an important research direction. We, therefore, carry out a systematic evaluation of transferability of BERT-based neural ranking models across five English datasets. Previous studies focused primarily on zero-shot and few-shot transfer from a large dataset to a dataset with a ...

Added: August 11, 2021

SIGIR '21: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval

ACM, 2021.

Added: August 11, 2021

Advances in Information Retrieval. 43rd European Conference on IR Research

Springer, 2021.

Added: July 23, 2021

Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’19)

NY: Association for Computing Machinery (ACM), 2019.

Welcome to SIGIR 2019, the 42nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, the premier scientific conference in the broad area of information retrieval. We are delighted to welcome you to the Museum of Science and Industry located in the north-east of Paris. The conference is supported by the French Association for Information Retrieval and Applications, which organizes the yearly French IR conference. Its members actively participated in the organization of this conference. We received good quality submissions in all tracks and events: full papers, short papers, demos, industry papers, tutorials, workshops, and the doctoral consortium. We would like to thank everyone who contributed to the paper selection process, including 100 Senior Program Committee (SPC) members, 317 Program Committee (PC) members, and 80 additional reviewers for their contributions to paper selection. ...

Added: October 29, 2020

Foundations of Intelligent Systems. 25th International Symposium on Methodologies for Intelligent Systems: ISMIS 2020

Springer, 2020.

This book constitutes the proceedings of the 25th International Symposium on Foundations of Intelligent Systems, ISMIS 2020, held in Graz, Austria, in October 2020. The conference was held virtually due to the COVID-19 pandemic. The 35 full and 8 short papers presented in this volume were carefully reviewed and selected from 79 submissions. Included is also ...

Added: October 4, 2020

Experimental IR Meets Multilinguality, Multimodality, and Interaction

Springer, 2020.

Added: October 4, 2020

FCA-based Approach for Interactive Query Refinement with IR-chatbots

Makhalova T., Ilvovsky D., Galitsky B. et al., in: RAAI 2020 Russian Advances in Artificial Intelligence 2020 Selected Contributions of the "Russian Advances in Artificial Intelligence" Track at RCAI 2020 co-located with 18th Russian Conference on Artificial Intelligence (RCAI 2020). Vol. 2648. CEUR-WS, 2020. P. 144–156.

Information retrieval (IR) chatbot is a special class of virtual assistants, which is widely used nowadays in customer support services. However, the work of modern IR retrieval systems is limited by simple queries to the database, which does not utilize all the potential of interaction with the user. In this paper we implement an FCA-based ...

Added: September 15, 2020

Digital Transformation and Global Society, 4th International Conference, DTGS 2019

Springer, 2019.

This volume constitutes the refereed proceedings of the 4th International Conference on Digital Transformation and Global Society, DTGS 2019, held in St. Petersburg, Russia, in June 2019. The 56 revised full papers and 9 short papers presented in the volume were carefully reviewed and selected from 194 submissions. The papers are organized in topical sections on ...

Added: February 22, 2020

AIST: International Conference on Analysis of Images, Social Networks and Texts. 8th International Conference, AIST 2019, Kazan, Russia, July 17–19, 2019, Revised Selected Papers

Springer, 2020.

This book constitutes the proceedings of the 8th International Conference on Analysis of Images, Social Networks and Texts, AIST 2019, held in Kazan, Russia, in July 2019. The 24 full papers and 10 short papers were carefully reviewed and selected from 134 submissions (of which 21 papers were rejected without being reviewed). The papers are organized ...

Added: February 9, 2020

Proceedings of the 27th ACM International Conference on Information and Knowledge Management

Association for Computing Machinery (ACM), 2018.

Added: December 27, 2019