Learning Interpretable Prefix-Based Patterns from Demographic Sequences
Many methods exist for computing relevant patterns in sequential data and interpreting the results. In this paper, we compute emerging patterns (EPs) in demographic sequences using sequence-based pattern structures, along with different algorithmic solutions. The purpose of this method is to meet the following domain requirement: the obtained patterns must be (closed) frequent contiguous prefixes of the input sequences. This is required so that demographers can fully understand and interpret the results.
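To make the prefix requirement concrete, here is a minimal Python sketch (not the paper's implementation): it enumerates the contiguous prefixes of a toy set of life-course sequences and keeps those that are frequent and closed. The event labels, the sequences, and the `min_support` value are all hypothetical.

```python
from collections import Counter

def closed_frequent_prefixes(sequences, min_support):
    """Enumerate prefixes of the input sequences that are frequent
    (support >= min_support) and closed (no one-item extension has
    the same support)."""
    support = Counter()
    for seq in sequences:
        for i in range(1, len(seq) + 1):
            support[tuple(seq[:i])] += 1
    result = []
    for prefix, sup in support.items():
        if sup < min_support:
            continue
        # Closed: every one-item extension is strictly less frequent.
        extensions = [s for p, s in support.items()
                      if len(p) == len(prefix) + 1
                      and p[:len(prefix)] == prefix]
        if all(s < sup for s in extensions):
            result.append((prefix, sup))
    return result

# Toy demographic-like sequences of life-course events (hypothetical labels).
seqs = [["education", "work", "marriage"],
        ["education", "work", "child"],
        ["education", "marriage"]]
print(closed_frequent_prefixes(seqs, 2))
# → [(('education',), 3), (('education', 'work'), 2)]
```

The prefix ("education",) with support 3 and ("education", "work") with support 2 are both closed, since each of their extensions is strictly less frequent; a real miner would of course avoid the quadratic extension scan.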
A new general and efficient architecture for working with pattern structures, an extension of FCA for dealing with “complex” descriptions, is introduced and implemented in a subsystem of the Formal Concept Analysis Research Toolbox (FCART). The architecture is universal with respect to possible dataset structures and formats, and to the techniques used to manipulate pattern structures.
Proceedings of Machine Learning Research: Volume 97: International Conference on Machine Learning, 9-15 June 2019, Long Beach, California, USA
With the increased interest in machine-processable data and the progress of semantic technologies, many datasets are now published as RDF triples, constituting the so-called Web of Data. Data can be queried using SPARQL, but there is still a need to integrate, classify and explore the data for data analysis and knowledge discovery purposes. This research work proposes a new approach based on Formal Concept Analysis and pattern structures for building a pattern concept lattice from a set of RDF triples. This lattice can be used for data exploration and, in particular, visualized with an adapted tool. The specific pattern structure introduced for RDF data makes it possible to bridge other studies on the use of structured attribute sets when building concept lattices. Our approach is experimentally validated on the classification of RDF data, showing the efficiency of the underlying algorithms.
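One simple way to turn RDF triples into pattern-structure descriptions is to group them by subject into predicate-to-objects maps and take a set-based intersection as the similarity operation. The sketch below illustrates that idea only; it is not the specific pattern structure of the paper, and the `ex:` resources are hypothetical.

```python
from collections import defaultdict

def describe(triples):
    """Group RDF (subject, predicate, object) triples by subject
    into predicate -> set-of-objects descriptions."""
    desc = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        desc[s][p].add(o)
    return {s: dict(d) for s, d in desc.items()}

def meet(d1, d2):
    """Similarity of two subject descriptions: the predicates they share,
    each with the intersection of their object sets."""
    common = {}
    for p in d1.keys() & d2.keys():
        objs = d1[p] & d2[p]
        if objs:
            common[p] = objs
    return common

triples = [("ex:alice", "rdf:type", "ex:Person"),
           ("ex:alice", "ex:knows", "ex:carol"),
           ("ex:bob", "rdf:type", "ex:Person"),
           ("ex:bob", "ex:knows", "ex:carol")]
d = describe(triples)
print(meet(d["ex:alice"], d["ex:bob"]))
```

Iterating this meet over all subsets of subjects is what yields the pattern concept lattice used for exploration.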
A scalable method for mining graph patterns that are stable under subsampling is proposed. Existing subsample-stability and robustness measures are not antimonotonic under the definitions known so far. We study a broader notion of antimonotonicity for graph patterns, under which measures of subsample stability become antimonotonic. We then propose gSOFIA for mining the most subsample-stable graph patterns. Experiments on numerous graph datasets show that gSOFIA is very efficient at discovering subsample-stable graph patterns.
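The intuition behind subsample stability can be illustrated by a naive Monte Carlo estimate: how often does a pattern stay frequent when the dataset is randomly subsampled? This is only a sketch of the general idea on hypothetical itemset data; the paper's measure and the gSOFIA algorithm are far more refined (and work on graphs).

```python
import random

def subsample_stability(support_fn, dataset, min_support,
                        n_trials=200, frac=0.5, seed=0):
    """Estimate a pattern's stability: the fraction of random subsamples
    (of size frac * |dataset|) in which it is still frequent, with a
    proportionally rescaled support threshold."""
    rng = random.Random(seed)
    k = int(frac * len(dataset))
    hits = sum(
        support_fn(rng.sample(dataset, k)) >= min_support * frac
        for _ in range(n_trials))
    return hits / n_trials

# Hypothetical transaction dataset; the pattern {"a", "b"} occurs in half of it.
dataset = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"b", "c"}] * 5
support = lambda sample: sum({"a", "b"} <= t for t in sample)
stability = subsample_stability(support, dataset, min_support=8)
print(stability)
```

A pattern whose support is concentrated in a few records scores low under such a measure, which is why stability is a useful filter against spurious patterns.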
Pattern structures, an extension of FCA to data with complex descriptions, offer an alternative to conceptual scaling (binarization) by giving a direct way to discover knowledge in complex data such as logical formulas, graphs, strings, tuples of numerical intervals, etc. Whereas classification with pattern structures based on the prior generation of classifiers can have doubly exponential complexity, combining lazy evaluation with projection approximations of the initial data, randomization and parallelization reduces the algorithmic complexity to a low-degree polynomial, and is thus feasible for big data.
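The lazy-evaluation idea can be sketched as follows: instead of generating all classifiers up front, for each training object we intersect its description with the test object's description and vote for its class only if no object of another class also contains that common pattern. The sketch below uses set intersection as the similarity operation (a very coarse projection of richer descriptions) and hypothetical toy data; it is an illustration of the scheme, not the paper's implementation.

```python
def lazy_classify(test_desc, train, classes=("+", "-")):
    """Lazy pattern-structure classification with set intersection as the
    similarity (meet) operation.  `train` is a list of
    (description_set, label) pairs."""
    votes = {c: 0 for c in classes}
    for desc, label in train:
        common = test_desc & desc          # meet of test and train descriptions
        if not common:
            continue
        # Counterexample: an object of another class whose description
        # also subsumes the common pattern.
        opposed = any(common <= d for d, l in train if l != label)
        if not opposed:
            votes[label] += 1
    return max(votes, key=votes.get)

train = [({"a", "b", "c"}, "+"), ({"a", "b"}, "+"),
         ({"c", "d"}, "-"), ({"d", "e"}, "-")]
print(lazy_classify({"a", "b", "e"}, train))   # → "+"
```

Each test object costs one pass over the training set (times the counterexample check), which is what makes the approach embarrassingly parallel and amenable to randomized subsampling.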
The Fifth HCT Information Technology Trends (ITT 2018) is a major international research conference for the presentation of innovative ideas, approaches, technologies, research findings and outcomes, best practices and case studies, national and international projects, institutional standards and policies on Emerging Technologies for Artificial Intelligence. ITT 2018 will provide an outstanding forum for researchers, practitioners, students, policy makers, and users to exchange ideas, techniques and tools, raise awareness and share experiences related to all practical and theoretical aspects of Emerging Technologies for Artificial Intelligence, so as to develop solutions related to communications, computer science and engineering, control systems as well as interdisciplinary research and applications.
This paper addresses the important problem of efficiently mining numerical data with formal concept analysis (FCA). Classically, the only way to apply FCA is to binarize the data through a so-called scaling procedure. This may either involve loss of information or produce large, dense binary data known to be hard to process. In the context of gene expression data analysis, we propose and compare two FCA-based methods for mining numerical data, and we show that they are equivalent. The first relies on a particular scaling, encoding all possible intervals of attribute values, and uses standard FCA techniques. The second relies on pattern structures without a priori transformation, and is shown to be more computationally efficient and to provide more readable results. Experiments with real-world gene expression data are discussed and give a practical basis for the comparison and evaluation of the methods.
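For interval pattern structures, the standard similarity of two numerical descriptions is the componentwise convex hull: the smallest interval containing both values in each dimension. A minimal sketch, with hypothetical expression values:

```python
def interval_meet(d1, d2):
    """Meet of two interval-vector descriptions: the componentwise
    convex hull (smallest interval containing both intervals)."""
    return [(min(a1, a2), max(b1, b2))
            for (a1, b1), (a2, b2) in zip(d1, d2)]

# Two numerical objects (e.g. expression profiles) as degenerate intervals.
g1 = [(5.0, 5.0), (7.0, 7.0), (6.0, 6.0)]
g2 = [(6.0, 6.0), (8.0, 8.0), (4.0, 4.0)]
print(interval_meet(g1, g2))   # → [(5.0, 6.0), (7.0, 8.0), (4.0, 6.0)]
```

Working directly with such interval vectors avoids materializing the interval scaling, which encodes every possible interval of attribute values as a binary attribute.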
Nowadays, datasets are available in very complex and heterogeneous forms. Mining such data collections is essential to support many real-world applications, ranging from healthcare to marketing. In this work, we focus on the analysis of “complex” sequential data by means of interesting sequential patterns. We approach the problem using the elegant mathematical framework of formal concept analysis and its extension based on “pattern structures”. Pattern structures are used for mining complex data (such as sequences or graphs) and are based on a subsumption operation, which in our case is defined with respect to the partial order on sequences. We show how pattern structures, along with projections (i.e. data reductions of sequential structures), make it possible to enumerate more meaningful patterns and increase the computational efficiency of the approach. Finally, we show the applicability of the presented method for discovering and analysing interesting patient patterns from a French healthcare dataset on cancer. The quantitative and qualitative results (with annotations and analysis from a physician) are reported in this use case, which is the main motivation for this work.
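The interplay between a sequence similarity operation and a projection can be sketched as follows: the similarity of two sequences is taken to be their set of common contiguous subsequences, and a projection discards the short ones, shrinking the search space while keeping the more meaningful patterns. The event labels are hypothetical, and this naive enumeration stands in for the paper's pattern-structure machinery.

```python
def common_contiguous(s1, s2):
    """All common contiguous subsequences of two sequences."""
    subs1 = {tuple(s1[i:j]) for i in range(len(s1))
             for j in range(i + 1, len(s1) + 1)}
    subs2 = {tuple(s2[i:j]) for i in range(len(s2))
             for j in range(i + 1, len(s2) + 1)}
    return subs1 & subs2

def min_length_projection(patterns, k):
    """Projection: keep only patterns of length >= k."""
    return {p for p in patterns if len(p) >= k}

# Hypothetical patient trajectories of care events.
a = ["home", "hospital", "home", "hospital"]
b = ["hospital", "home", "hospital"]
common = common_contiguous(a, b)
print(sorted(min_length_projection(common, 2)))
```

Because any contiguous subsequence of a common pattern is itself common, such a projection preserves the order structure the subsumption operation relies on.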