Cross-Document Pattern Matching
P. 196-207.
Kucherov G., Nekrich Y., Starikovskaya T.
We study a new variant of the string matching problem called {\em cross-document string matching}, which is the problem of indexing a collection of documents to support an efficient search for a pattern in a selected document, where the pattern itself is a substring of another document. Several variants of this problem are considered, and efficient linear-space solutions are proposed with query time bounds that either do not depend at all on the pattern size or depend on it in a very limited way (doubly logarithmic). As a side result, we propose an improved solution to the {\em weighted level ancestor} problem.
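To make the query in the abstract concrete, here is a minimal brute-force sketch in Python. It is not the indexing solution proposed in the paper: it builds no index, and its running time depends on both the pattern and the document length. The function name, the argument layout, and the sample documents are assumptions chosen for illustration only.

```python
from typing import List


def cross_document_occurrences(
    docs: List[str], i: int, start: int, end: int, j: int
) -> List[int]:
    """Report the start positions in docs[j] where docs[i][start:end] occurs.

    Naive baseline: the pattern is given as a substring of document i
    (half-open range [start, end)) and is searched for in document j
    by repeated scanning.
    """
    pattern = docs[i][start:end]
    text = docs[j]
    positions = []
    pos = text.find(pattern)
    while pos != -1:
        positions.append(pos)
        # Restart one position later so overlapping occurrences are kept.
        pos = text.find(pattern, pos + 1)
    return positions


docs = ["abracadabra", "cadabra cab", "barbara"]
# The pattern is docs[0][7:11], i.e. "abra", searched for in docs[1].
print(cross_document_occurrences(docs, 0, 7, 11, 1))
```

The point of the sketch is only the shape of the query: the pattern is never passed as a separate string, only as a position range inside an already indexed document, which is what allows query bounds that barely depend on the pattern length.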
Language:
English
In book
Vol. 7354: Proceedings of the 23rd Symposium on Combinatorial Pattern Matching. Berlin : Springer, 2012
Babenko M., Kolesnichenko I., Starikovskaya T., in : Lecture Notes in Computer Science. Vol. 7922: Proceedings of the 24th Symposium on Combinatorial Pattern Matching.: Berlin : Springer, 2013. P. 28-37.
Lexicographically minimal and lexicographically maximal suffixes of a string are fundamental notions of stringology. It is well known that the lexicographically minimal and maximal suffixes of a given string S can be computed in linear time and space by constructing a suffix tree or a suffix array of S. Here we consider the case when ...
Added: October 30, 2013
Kucherov G., Nekrich Y., Starikovskaya T., in : Lecture Notes in Computer Science. Vol. 7608: Proceedings of the 19th International Symposium on String Processing and Information Retrieval.: Berlin : Springer, 2012. P. 307-317.
We study the following three problems of computing generic or discriminating words for a given collection of documents. Given a pattern $P$ and a threshold $d$, we want to report (i) all longest extensions of $P$ which occur in at least $d$ documents, (ii) all shortest extensions of $P$ which occur in less than $d$ ...
Added: October 30, 2013
Vildhoj H. W., Starikovskaya T., in : Lecture Notes in Computer Science. Vol. 7922: Proceedings of the 24th Symposium on Combinatorial Pattern Matching.: Berlin : Springer, 2013. P. 223-234.
Lexicographically minimal and lexicographically maximal suffixes of a string are fundamental notions of stringology. It is well known that the lexicographically minimal and maximal suffixes of a given string $S$ can be computed in linear time and space by constructing a suffix tree or a suffix array of $S$. Here we consider the case when ...
Added: October 30, 2013
Babenko M. A., Starikovskaya T., in : Lecture Notes in Computer Science. Vol. 5010: Proceedings of the Third International Computer Science Symposium in Russia.: Berlin : Springer, 2008. P. 64-75.
Given a set of $N$ strings $A = \set{\alpha_1, \ldots, \alpha_N}$ of total length $n$ over alphabet~$\Sigma$ one may ask to find, for a fixed integer $K$, $2 \le K \le N$, the longest substring $\beta$ that appears in at least $K$ strings in $A$. It is known that this problem can be solved in ...
Added: October 30, 2013
Babenko A., Lempitsky V., in : Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2012). : Providence : IEEE, 2012. P. 3069-3076.
A new data structure for efficient similarity search in very large datasets of high-dimensional vectors is introduced. This structure, called the inverted multi-index, generalizes the inverted index idea by replacing the standard quantization within inverted indices with product quantization. For very similar retrieval complexity and preprocessing time, inverted multi-indices achieve a much denser subdivision of ...
Added: October 1, 2014
Polyakova O. A., Perm : Perm National Research Polytechnic University Publishing House, 2019
The article deals with the application of the basic principles of structured programming to complex program systems written in the high-level language C++, demonstrated on meaningful examples. ...
Added: August 31, 2020
Babenko M. A., Starikovskaya T., Problemy Peredachi Informatsii (Problems of Information Transmission) 2011 Vol. 47 No. 1 P. 28-33
We describe an algorithm that solves the problem of finding an approximate longest common substring of two strings $\alpha_1$ and $\alpha_2$ in $O(\abs{\alpha_1} \abs{\alpha_2})$ time using $O(\abs{\alpha_1})$ additional memory. When accessing the string $\alpha_2$, the algorithm reads it only \emph{from left to right, starting with the first character}. The RAM model of computation is used. ...
Added: October 30, 2013
Berlin : Springer, 2012
This book constitutes the refereed proceedings of the 23rd Annual Symposium on Combinatorial Pattern Matching, CPM 2012, held in Helsinki, Finland, in July 2012.
The 33 revised full papers presented together with 2 invited talks were carefully reviewed and selected from 60 submissions. The papers address issues of searching and matching strings and more complicated patterns ...
Added: October 30, 2013
Galimullin M. F., Kalishenko E., Rapotkin N. A., Izvestiya of Saint Petersburg State Electrotechnical University LETI 2016 No. 7 P. 13-23
The paper deals with the development of thread synchronization strategies based on concurrent «flat-combining» data structures and with the study of their performance. It considers the «flat-combining» approach and its implementation in the libcds library, the development of a thread synchronization strategy, and its possible implementations. The efficiency of synchronization strategies usage is researched on ...
Added: November 1, 2018
Babkina T. S., Demidovskij A., Babkin E., International Journal of Big Data Intelligence 2018 Vol. 5 No. 3 P. 143-155
This paper presents two new approaches to solving the classical NP-hard maximum clique problem (MCP), which frequently arises in the domain of information management, including the design of database structures and big data processing. In our research, we focus on solving that problem using the paradigm of artificial neural networks. The first approach ...
Added: October 3, 2018
Kolpakov R. M., Kucherov G., Starikovskaya T., in : Proceedings of the First International Conference on Data Compression, Communications and Processing. : NY : IEEE Computer Society, 2013. P. 92-97.
We consider a compact text index based on evenly spaced sparse suffix trees of a text \cite{KU-96}. Such a tree is defined by partitioning the text into blocks of equal size and constructing the suffix tree only for those suffixes that start at block boundaries. We propose a new pattern matching algorithm on this structure. ...
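The definition above (equal-size blocks, with only the suffixes starting at block boundaries indexed) can be illustrated with a short Python sketch. This is a simplified stand-in under stated assumptions, not the structure or the pattern matching algorithm from the paper: it uses a sorted list of suffix start positions (a sparse suffix array) instead of a suffix tree, and the function names and the `block_size` parameter are hypothetical.

```python
from typing import List


def evenly_spaced_suffix_starts(text: str, block_size: int) -> List[int]:
    """Return the start positions of the suffixes that begin at block boundaries."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return list(range(0, len(text), block_size))


def sparse_suffix_array(text: str, block_size: int) -> List[int]:
    """Sort the block-boundary suffixes lexicographically.

    Quadratic-time illustration only: each comparison materializes a suffix.
    """
    starts = evenly_spaced_suffix_starts(text, block_size)
    return sorted(starts, key=lambda p: text[p:])


# With block size 2, "banana" contributes the suffixes at positions 0, 2, 4:
# "banana", "nana", "na".
print(sparse_suffix_array("banana", 2))
```

With block size k, only about n/k of the n suffixes are indexed, which is where the space saving of such a sparse index comes from.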
Added: October 30, 2013
Taganrog : Southern Federal University Publishing House, 2015
The collection is compiled from the materials of the VI International Scientific and Practical Conference "Information Systems Development Technologies", held on September 6-12, 2015, in Gelendzhik.
Responsibility for the authenticity and accuracy of quotations, names, titles, and other information rests with the authors of the published materials. The materials are published as submitted by the authors.
The event was held with the financial support of the Russian Foundation for Basic Research (grant No. 15-07-20559-г). ...
Added: September 13, 2015
Babenko M. A., Gawrychowski P., Kociumaka T. et al., in : Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. : San Diego : SIAM, 2015. P. 572-591.
We present an improved wavelet tree construction algorithm and discuss its applications to a number of rank/select problems for integer keys and strings. Given a string of length n over an alphabet of size ω ≤ n, our method builds the wavelet tree in O(n log ω / √log n) time, improving upon the state-of-the-art algorithm ...
Added: October 4, 2014
Springer, 2019
Added: August 4, 2019
Babenko M. A., Kolesnichenko I., Smirnov I., Theory of Computing Systems 2019 Vol. 63 No. 4 P. 637-646
Heaps are well-studied fundamental data structures, having myriads of applications, both theoretical and practical. We consider the problem of designing a heap with an “optimal” extract-min operation. Assuming an arbitrary linear ordering of keys, a heap with n elements typically takes O(log n) time to extract the minimum. Extracting all elements faster is impossible as ...
Added: December 6, 2019
Rubtsov A. A., Vyalyi M., in : Descriptional Complexity of Formal Systems: 23rd IFIP WG 1.02 International Conference, DCFS 2021, Virtual Event, September 5, 2021, Proceedings. : Springer, 2021. P. 150-162.
Added: February 2, 2022
Fomichev M., Ulyanov M., Informatsionnye Tekhnologii (Information Technologies) 2018 Vol. 24 No. 11 P. 698-704
The time efficiency of software implementations of the branch-and-bound method for the asymmetric traveling salesman problem can be improved both by choosing the most suitable data structure, one that provides time-efficient operations on the leaves of the search decision tree, and by using additional memory to store truncated matrices in the leaves of the search decision tree. In addition, one may also propose ...
Added: January 26, 2020
Berlin, Heidelberg : Springer, 2017
The 12th issue of LNCS Transactions on Petri Nets and Other Models of Concurrency (ToPNoC) contains revised and extended versions of a selection of the best papers from the workshops held at the 37th International Conference on Application and Theory of Petri Nets and Concurrency (Petri Nets 2016, Toruń, Poland, 19–24 June 2016), and the ...
Added: September 27, 2017
Kucherov G., Nekrich Y., Gawrychowski P. et al., in : Lecture Notes in Computer Science. Vol. 8214: Proceedings of the 20th Symposium on String Processing and Information Retrieval.: Berlin : Springer, 2013. P. 129-140.
We revisit two variants of the problem of computing minimal discriminating words studied in [5]. Given a pattern P and a threshold d, we want to report (i) all shortest extensions of P which occur in less than d documents, and (ii) all shortest extensions of P which occur only in d selected documents. For ...
Added: October 30, 2013
Maxim Babenko, Gawrychowski P., Kociumaka T. et al., Theoretical Computer Science 2016 Vol. 638 P. 112-121
We consider the problems of computing the maximal and the minimal non-empty suffixes of substrings of a longer text of length n. For the minimal suffix problem we show that for every τ, 1 ≤ τ ≤ log n, there exists a linear-space data structure with O(τ) query time and O(n log n / τ) preprocessing time. As a ...
Added: October 8, 2015
Ponomarenko A., in : Proceedings of the 38th Conference "Information Technologies and Systems - 2014". : Nizhny Novgorod : IITP RAS, 2014. P. 194-200.
The classical approach to organizing information for subsequent fast search is to build an index. However, this approach has several drawbacks. The index must be rebuilt and kept up to date, which is difficult for scattered information such as textual information on the Web. These drawbacks stem from the fact that an index is a reorganized copy of the indexed information. This paper proposes a method ...
Added: September 10, 2014
Kopelowitz T., Kucherov G., Nekrich Y. et al., Journal of Discrete Algorithms 2013
We study a new variant of the pattern matching problem called cross-document pattern matching, which is the problem of indexing a collection of documents to support an efficient search for a pattern in a selected document, where the pattern itself is a substring of another document. Several variants of this problem are considered, and efficient linear ...
Added: October 30, 2013
Springer, 2019
16th International Symposium, WADS 2019, Edmonton, AB, Canada, August 5–7, 2019, Proceedings ...
Added: October 26, 2021
Ulyanov M.V., Fomichev M.I., Business Informatics 2015 No. 4 (34) P. 38-46
The resource efficiency of different implementations of the branch-and-bound method for the classical traveling salesman problem depends, inter alia, on ways to organize a search decision tree generated by this method. The classic «time-memory» dilemma is realized herein either by an option of storing reduced matrices at the points of the decision tree, which leads ...
Added: November 5, 2016