Computing Discriminating and Generic Words
We study the following three problems of computing generic or discriminating words for a given collection of documents. Given a pattern $P$ and a threshold $d$, we want to report (i) all longest extensions of $P$ that occur in at least $d$ documents, (ii) all shortest extensions of $P$ that occur in fewer than $d$ documents, and (iii) all shortest extensions of $P$ that occur only in $d$ selected documents. For these problems, we propose efficient algorithms based on suffix trees that use advanced data-structure techniques. For problem (i), we propose an optimal solution whose running time is constant per output word.
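The abstract leaves some definitions implicit, so the brute-force sketch below rests on our own assumptions:

- An extension of $P$ is a word that has $P$ as a prefix.
- A longest extension for (i) occurs in at least $d$ documents, but none of its one-letter right extensions does.
- A shortest extension for (ii) occurs in at least one but fewer than $d$ documents, while its one-letter-shorter prefix occurs in at least $d$.

The sketch enumerates every substring, so it is quadratic in document length. It does not reflect the suffix-tree algorithms described above and only makes problems (i) and (ii) concrete. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def document_frequencies(docs: List[str]) -> Dict[str, int]:
    """Map each distinct non-empty substring to the number of documents containing it."""
    freq: Dict[str, int] = defaultdict(int)
    for doc in docs:
        # Collect a document's substrings once, so each word counts at most once per document.
        seen = {doc[i:j] for i in range(len(doc)) for j in range(i + 1, len(doc) + 1)}
        for s in seen:
            freq[s] += 1
    return dict(freq)


def maximal_generic(docs: List[str], p: str, d: int) -> List[str]:
    """Problem (i) under our reading: extensions w of p occurring in >= d documents
    such that no one-letter right extension of w occurs in >= d documents."""
    freq = document_frequencies(docs)
    alphabet = set("".join(docs))
    return sorted(
        w for w, f in freq.items()
        if w.startswith(p) and f >= d
        and all(freq.get(w + c, 0) < d for c in alphabet)
    )


def minimal_discriminating(docs: List[str], p: str, d: int) -> List[str]:
    """Problem (ii) under our reading: extensions w of p occurring in 1..d-1 documents
    whose one-letter-shorter prefix (when w != p) occurs in >= d documents."""
    freq = document_frequencies(docs)
    return sorted(
        w for w, f in freq.items()
        if w.startswith(p) and 1 <= f < d
        # The empty string occurs in every document, hence the default len(docs).
        and (w == p or freq.get(w[:-1], len(docs)) >= d)
    )
```

For example, with documents `"abcx"`, `"abcy"`, `"abd"`, $P$ = `"ab"` and $d = 2$, the only longest generic extension is `"abc"`. The shortest discriminating extensions are `"abcx"`, `"abcy"` and `"abd"`.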
This volume contains the papers presented at the 6th International Conference on Similarity Search and Applications (SISAP 2013), held in A Coruña, Spain, on October 2–4, 2013. The International Conference on Similarity Search and Applications (SISAP) is an annual forum for researchers and application developers in the area of similarity data management. It addresses technological problems shared by many application domains that need similarity searching as a supporting service, such as data mining, information retrieval, computer vision, pattern recognition, computational biology, geography, biometrics, and machine learning. Traditionally, SISAP conferences have emphasized distance-based searching, but in general the conference is concerned with both the effectiveness and the efficiency of any similarity search approach.
In this paper, we present a modification of dynamic programming algorithms (DPA), which we call graphical algorithms (GrA). For some single machine scheduling problems, we show that the time complexity of the GrA is lower than that of the standard DPA. Moreover, the average running time of the GrA is often substantially smaller. A GrA can also solve large-scale instances and instances whose parameters are not integers. For some problems, the GrA has polynomial time complexity, in contrast to the pseudo-polynomial complexity of the DPA.
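The abstract does not spell out how a GrA works. As a rough illustration only, we assume its key move is to store each DP function through its breakpoints instead of tabulating it at every integer point. Under that assumption, the sketch below contrasts two versions of one recursion:

- The classical table-based DPA (Lawler–Moore) for $1\|\sum w_jU_j$, which processes jobs in earliest-due-date order.
- A breakpoint-list variant of the same recursion.

The choice of problem and the function names are ours; this is not the authors' algorithm. For integer data, the breakpoint list never has more entries than the DPA table has cells. It also works unchanged for non-integer data, although in the worst case it can still grow large.

```python
from typing import List, Tuple

Job = Tuple[float, float, float]  # (processing time p, due date d, weight w)


def dpa_weighted_late(jobs: List[Tuple[int, int, int]]) -> int:
    """Table-based DP for 1||sum w_j U_j; requires integer data.
    best[t] = max weight of on-time jobs whose processing times sum to exactly t."""
    jobs = sorted(jobs, key=lambda job: job[1])  # earliest-due-date order
    horizon = sum(p for p, _, _ in jobs)
    neg = float("-inf")
    best = [neg] * (horizon + 1)
    best[0] = 0
    for p, d, w in jobs:
        # Iterate t downward so each job is scheduled on time at most once.
        for t in range(min(d, horizon), p - 1, -1):
            if best[t - p] != neg:
                best[t] = max(best[t], best[t - p] + w)
    return sum(w for _, _, w in jobs) - int(max(best))


def breakpoint_weighted_late(jobs: List[Job]) -> float:
    """Same recursion, but the step function f(t) = max on-time weight with total
    processing time <= t is stored only through its breakpoints (t, f(t))."""
    jobs = sorted(jobs, key=lambda job: job[1])
    points: List[Tuple[float, float]] = [(0.0, 0.0)]  # times and weights both increasing
    for p, d, w in jobs:
        extended = [(t + p, big_w + w) for t, big_w in points if t + p <= d]
        merged = sorted(points + extended, key=lambda pt: (pt[0], -pt[1]))
        points = []
        for t, big_w in merged:
            # Drop (t, W) if an earlier-or-equal time already reaches weight >= W.
            if not points or big_w > points[-1][1]:
                points.append((t, big_w))
    return sum(w for _, _, w in jobs) - points[-1][1]
```

For the jobs `(2, 3, 5), (3, 4, 4), (2, 5, 3)`, both functions report a minimum late weight of 4. The best on-time set is the first and third job.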
We study a new variant of the pattern matching problem called cross-document pattern matching: indexing a collection of documents to support efficient search for a pattern in a selected document, where the pattern itself is a substring of another document. We consider several variants of this problem and propose efficient linear-space solutions whose query time bounds either do not depend on the pattern length at all or depend on it only doubly logarithmically. As a side result, we propose an improved solution to the weighted ancestor problem.
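In suffix-tree-based indexes, a substring of length $m$ starting at position $i$ of a document is commonly located with a weighted ancestor query. The query starts from the leaf for suffix $i$ and finds the highest ancestor whose string depth is at least $m$.

The sketch below answers such queries with plain binary lifting, in $O(\log n)$ time per query and $O(n \log n)$ space. This is a textbook baseline, not the improved solution mentioned above. The class and method names are hypothetical, and node weights are assumed to increase strictly from the root towards the leaves.

```python
from typing import List, Optional


class WeightedAncestor:
    """Binary-lifting index over a rooted tree whose node weights strictly
    increase from the root towards the leaves (e.g. suffix-tree string depths)."""

    def __init__(self, parent: List[int], weight: List[int]) -> None:
        n = len(parent)  # parent[root] == -1
        self.weight = weight
        levels = max(1, n.bit_length())
        # up[j][v] is the 2**j-th ancestor of v, or -1 if it does not exist.
        self.up: List[List[int]] = [list(parent)]
        for _ in range(1, levels):
            prev = self.up[-1]
            self.up.append([-1 if prev[v] == -1 else prev[prev[v]] for v in range(n)])

    def query(self, v: int, k: int) -> Optional[int]:
        """Highest ancestor u of v (v itself included) with weight[u] >= k,
        or None if even weight[v] < k."""
        if self.weight[v] < k:
            return None
        # Ancestors with weight >= k form a contiguous run upward from v,
        # so greedily take the largest jumps that stay inside that run.
        for jumps in reversed(self.up):
            a = jumps[v]
            if a != -1 and self.weight[a] >= k:
                v = a
        return v
```

In a suffix tree, `weight` would hold string depths and `v` would be the leaf for suffix $i$. `query(v, m)` then returns the locus of the length-$m$ prefix of that suffix.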
Information systems have developed in parallel with computer science, although they have roots in several disciplines, including mathematics, engineering, and cybernetics. Research in information systems is inherently interdisciplinary. As the chapters in this book show, the dynamics of information systems has diverse applications. The book presents state-of-the-art work on the theory and practice of the dynamics of information systems. First, it covers algorithmic approaches to numerical computations with infinite and infinitesimal numbers. It then presents important problems arising in service-oriented systems, such as dynamic composition, the analysis of modern service-oriented information systems, and the estimation of customer service times on a rail network from GPS data. Next, the book addresses the complexity of problems arising in stochastic and distributed systems. It also discusses modulating communication to improve the convergence of multi-agent learning. Subsequent chapters describe network issues, in particular minimum-risk maximum clique problems, the vulnerability of sensor networks, influence diffusion, community detection, and link prediction in social network analysis, as well as a comparative analysis of algorithms for transmission network expansion planning. We thank all the authors and anonymous referees for their advice and expertise in providing valuable contributions, which improved the quality of this book. We also thank Springer for helping us produce this book.
In this paper, we consider algorithms involved in the computation of the Duquenne–Guigues basis of implications. The most widely used algorithm for constructing the basis is Ganter’s Next Closure, which was designed to generate the closed sets of an arbitrary closure system. We show that, for the purpose of generating the basis, the algorithm can be optimized. We compare the performance of the original algorithm and its optimized version in a series of experiments on artificially generated and real-life datasets. An important and computationally expensive subroutine of the algorithm computes the closure of an attribute set with respect to a set of implications. We compare the performance of three algorithms for this task, both on their own and in conjunction with each of the two algorithms for generating the basis. We also discuss other approaches to constructing the Duquenne–Guigues basis.
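This minimal Python sketch shows two ingredients: the generic Next Closure procedure, and the naive fixpoint algorithm for closing an attribute set under a set of implications. It is the textbook baseline, not the optimized variant or the faster closure algorithms evaluated in the paper. It also enumerates closed sets rather than building the Duquenne–Guigues basis itself. Attributes are assumed to be the integers 0..n-1 in their natural order, and the function names are our own.

```python
from typing import Callable, FrozenSet, Iterable, Iterator, List, Tuple

Implication = Tuple[FrozenSet[int], FrozenSet[int]]  # (premise, conclusion)


def implication_closure(attrs: Iterable[int], implications: List[Implication]) -> FrozenSet[int]:
    """Naive closure of attrs: keep firing every implication whose premise is
    contained in the current set until a fixpoint is reached."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in implications:
            if premise <= closed and not conclusion <= closed:
                closed |= conclusion
                changed = True
    return frozenset(closed)


def next_closure(n: int,
                 closure: Callable[[FrozenSet[int]], FrozenSet[int]]) -> Iterator[FrozenSet[int]]:
    """Ganter's Next Closure: yield every closed subset of {0, ..., n-1}
    exactly once, in lectic order."""
    current = closure(frozenset())
    yield current
    while True:
        for i in reversed(range(n)):
            if i in current:
                continue
            candidate = closure(frozenset(a for a in current if a < i) | {i})
            # Accept only if the closure adds no attribute smaller than i.
            if all(a >= i for a in candidate - current):
                current = candidate
                yield current
                break
        else:
            return  # no successor: current is the lectically largest closed set
```

Passing `lambda s: implication_closure(s, L)` as the closure operator enumerates the closed sets of an implication set `L`. Each step of Next Closure may call the closure subroutine up to n times, which is why the speed of that subroutine matters.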
This chapter continues the "Algorithmization and Programming" strand that began in the textbook "Informatics, Grade 7". It covers the fundamentals of mathematical logic and algorithms for searching and sorting arrays, and it examines problems of program structuring.
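The summary only names the topics. As an illustration of the kind of array sorting and searching algorithms such a chapter typically covers, here is a selection sort followed by a binary search. The choice of these two algorithms is our assumption, since the chapter's own examples are not given here, and the function names are hypothetical.

```python
from typing import List, Optional


def selection_sort(a: List[int]) -> List[int]:
    """Return a sorted copy of a using selection sort (O(n^2) comparisons)."""
    a = list(a)
    for i in range(len(a)):
        m = min(range(i, len(a)), key=a.__getitem__)  # index of the smallest remaining element
        a[i], a[m] = a[m], a[i]
    return a


def binary_search(a: List[int], x: int) -> Optional[int]:
    """Return an index of x in the sorted list a, or None if x is absent."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == x:
            return mid
        if a[mid] < x:
            lo = mid + 1  # x can only be in the right half
        else:
            hi = mid - 1  # x can only be in the left half
    return None
```

Binary search needs the array to be sorted first. In exchange, it halves the search range at each step instead of scanning every element.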
Treating symbolic machine learning algorithms as a single class will make it possible, in the future, to generate algorithms by setting a number of parameters according to the users' initial requirements and the required quality of solutions to target problems in application domains.
Diagnostic Test Approaches to Machine Learning and Commonsense Reasoning Systems surveys, analyzes, and compares the most effective algorithms for mining all kinds of logical rules. Academics and professionals in related fields worldwide have come together to create this unique knowledge-sharing resource, which will serve as a forum for future collaborations.