### ?

## Pattern Matching on Sparse Suffix Trees

P. 92-97.

Kolpakov R. M., Kucherov G., Starikovskaya T.

We consider a compact text index based on evenly spaced sparse suffix trees of a text \cite{KU-96}. Such a tree is defined by partitioning the text into blocks of equal size and constructing the suffix tree only for those suffixes that start at block boundaries. We propose a new pattern matching algorithm on this structure. The algorithm is based on a notion of suffix links different from that of~\cite{KU-96} and on the packing of several letters into one computer word.

### In book

Proceedings of the First International Conference on Data Compression, Communications and Processing

NY : IEEE Computer Society, 2013

Kopelowitz T., Kucherov G., Nekrich Y. et al., Journal of Discrete Algorithms 2013

We study a new variant of the pattern matching problem called cross-document pattern matching, which is the problem of indexing a collection of documents to support an efficient search for a pattern in a selected document, where the pattern itself is a substring of another document. Several variants of this problem are considered, and efficient linear ...

Added: October 30, 2013

Kucherov G., Nekrich Y., Starikovskaya T., , in : Lecture Notes in Computer Science. Vol. 7354: Proceedings of the 23rd Symposium on Combinatorial Pattern Matching.: Berlin : Springer, 2012. P. 196-207.

We study a new variant of the string matching problem called {\em cross-document string matching}, which is the problem of indexing a collection of documents to support an efficient search for a pattern in a selected document, where the pattern itself is a substring of another document. Several variants of this problem are considered, ...

Added: October 30, 2013

Babenko M. A., Starikovskaya T., Проблемы передачи информации 2011 Т. 47 № 1 С. 28-33

Описан алгоритм, решающий задачу нахождения приближенной максимальной общей подстроки двух строк $\alpha_1$ и $\alpha_2$ за время $O(\abs{\alpha_1} \abs{\alpha_2})$ с использованием $O(\abs{\alpha_1})$ дополнительной памяти. При обращении к строке $\alpha_2$ алгоритм читает ее только \emph{слева направо, начиная с первого символа}. Используется RAM-модель вычислений. ...

Added: October 30, 2013

Babenko M. A., Starikovskaya T., , in : Lecture Notes in Computer Science. Vol. 5010: Proceedings of the Third International Computer Science Symposium in Russia.: Berlin : Springer, 2008. P. 64-75.

Given a set of $N$ strings $A = \set{\alpha_1, \ldots, \alpha_N}$ of total length $n$ over alphabet~$\Sigma$ one may ask to find, for a fixed integer $K$, $2 \le K \le N$, the longest substring $\beta$ that appears in at least $K$ strings in $A$. It is known that this problem can be solved in ...

Added: October 30, 2013

Kucherov G., Nekrich Y., Gawrychowski P. et al., , in : Lecture Notes in Computer Science. Vol. 8214: Proceedings of the 20th Symposium on String Processing and Information Retrieval.: Berlin : Springer, 2013. P. 129-140.

We revisit two variants of the problem of computing minimal discriminating words studied in [5]. Given a pattern P and a threshold d, we want to report (i) all shortest extensions of P which occur in less than d documents, and (ii) all shortest extensions of P which occur only in d selected documents. For ...

Added: October 30, 2013

Babenko M., Kolesnichenko I., Starikovskaya T., , in : Lecture Notes in Computer Science. Vol. 7922: Proceedings of the 24th Symposium on Combinatorial Pattern Matching.: Berlin : Springer, 2013. P. 28-37.

Lexicographically minimal and lexicographically maximal suffixes of a string are fundamental notions of stringology. It is well known that the lexicographically minimal and maximal suffixes of a given string S can be computed in linear time and space by constructing a suffix tree or a suffix array of S. Here we consider the case when ...

Added: October 30, 2013

Vildhoj H. W., Starikovskaya T., , in : Lecture Notes in Computer Science. Vol. 7922: Proceedings of the 24th Symposium on Combinatorial Pattern Matching.: Berlin : Springer, 2013. P. 223-234.

Lexicographically minimal and lexicographically maximal suffixes of a string are fundamental notions of stringology. It is well known that the lexicographically minimal and maximal suffixes of a given string $S$ can be computed in linear time and space by constructing a suffix tree or a suffix array of $S$. Here we consider the case when ...

Added: October 30, 2013

Berlin : Springer, 2012

This book constitutes the refereed proceedings of the 23rd Annual Symposium on Combinatorial Pattern Matching, CPM 2012, held in Helsinki, Finalnd, in July 2012.
The 33 revised full papers presented together with 2 invited talks were carefully reviewed and selected from 60 submissions. The papers address issues of searching and matching strings and more complicated patterns ...

Added: October 30, 2013

Neznanov A., Kourie D. G., , in : RuZA 2015 Workshop. Proceedings of Russian and South African Workshop on Knowledge Discovery Techniques Based on Formal Concept Analysis (RuZA 2015). November 30 - December 5, 2015, Stellenbosch, South Africa. Vol. 1552.: Aachen : CEUR Workshop Proceedings, 2015.

Formal Concept Analysis Research Toolbox (FCART) is an integrated environment for knowledge and data engineers with a set of research tools based on Formal Concept Analysis (FCA). In the paper we consider main FCA workflow and some applications in the field of the text pattern matching. ...

Added: June 14, 2016

Kucherov G., Nekrich Y., Starikovskaya T., , in : Lecture Notes in Computer Science. Vol. 7608: Proceedings of the 19th International Symposium on String Processing and Information Retrieval.: Berlin : Springer, 2012. P. 307-317.

We study the following three problems of computing generic or discriminating words for a given collection of documents. Given a pattern $P$ and a threshold $d$, we want to report (i) all longest extensions of $P$ which occur in at least $d$ documents, (ii) all shortest extensions of $P$ which occur in less than $d$ ...

Added: October 30, 2013

Starikovskaya T., , in : Lecture Notes in Computer Science. Vol. 7464: Proceedings of the 37th International Symposium on Mathematical Foundations of Computer Science.: Berlin : Springer, 2012. P. 789-799.

We present an algorithm which computes the Lempel-Ziv factorization of a word $W$ of length $n$ on an alphabet $\Sigma$ of size $\sigma$ online in the following sense: it reads $W$ starting from the left, and, after reading each $r = O(\log_{\sigma}{n})$ characters of $W$, updates the Lempel-Ziv factorization. The algorithm requires $O(n\log\sigma)$ bits of ...

Added: October 30, 2013