Mining gene expression data with pattern structures in formal concept analysis
This paper addresses the important problem of efficiently mining numerical data with formal concept analysis (FCA). Classically, the only way to apply FCA is to binarize the data by means of a so-called scaling procedure. This may either involve loss of information, or produce large and dense binary data known to be hard to process. In the context of gene expression data analysis, we propose and compare two FCA-based methods for mining numerical data and we show that they are equivalent. The first one relies on a particular scaling, encoding all possible intervals of attribute values, and uses standard FCA techniques. The second one relies on pattern structures without a priori transformation, and is shown to be more computationally efficient and to provide more readable results. Experiments with real-world gene expression data are discussed and give a practical basis for the comparison and evaluation of the methods.
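The pattern-structure idea can be illustrated with a minimal sketch, assuming the usual interval formulation: each gene is described by a vector of intervals, and the similarity of two descriptions is their componentwise convex hull. The gene names, expression values, and function names below are hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[float, float]
Pattern = Tuple[Interval, ...]

def meet(p: Pattern, q: Pattern) -> Pattern:
    """Similarity of two interval patterns: componentwise convex hull."""
    return tuple((min(a1, a2), max(b1, b2)) for (a1, b1), (a2, b2) in zip(p, q))

def describe(genes: Iterable[str], data: Dict[str, Tuple[float, ...]]) -> Pattern:
    """Common description of a non-empty set of genes (smallest enclosing intervals)."""
    patterns = [tuple((v, v) for v in data[g]) for g in genes]
    result = patterns[0]
    for p in patterns[1:]:
        result = meet(result, p)
    return result

def extent(pattern: Pattern, data: Dict[str, Tuple[float, ...]]) -> List[str]:
    """All genes whose values fall inside every interval of the pattern."""
    return sorted(
        g for g, values in data.items()
        if all(lo <= v <= hi for v, (lo, hi) in zip(values, pattern))
    )

# Hypothetical toy data: expression values of five genes in three situations.
data = {
    "g1": (5.0, 7.0, 6.0),
    "g2": (6.0, 8.0, 4.0),
    "g3": (4.0, 8.0, 5.0),
    "g4": (4.0, 9.0, 8.0),
    "g5": (5.0, 8.0, 5.0),
}

description = describe(["g2", "g3"], data)
closed_extent = extent(description, data)
print(description)    # interval pattern shared by g2 and g3
print(closed_extent)  # genes matching that pattern
```

In this toy example the pair (closed_extent, description) plays the role of a pattern concept: no binarization is performed, and the description is read directly as a range of expression values per situation.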
Many environmental stimuli present a quasi-rhythmic structure at different timescales that the brain needs to decompose and integrate. Cortical oscillations have been proposed as instruments of sensory de-multiplexing, i.e., the parallel processing of different frequency streams in sensory signals. Yet their causal role in such a process has never been demonstrated. Here, we used a neural microcircuit model to address whether coupled theta–gamma oscillations, as observed in human auditory cortex, could underpin the multiscale sensory analysis of speech. We show that, in continuous speech, theta oscillations can flexibly track the syllabic rhythm and temporally organize the phoneme-level response of gamma neurons into a code that enables syllable identification. The tracking of slow speech fluctuations by theta oscillations, and its coupling to gamma-spiking activity both appeared as critical features for accurate speech encoding. These results demonstrate that cortical oscillations can be a key instrument of speech de-multiplexing, parsing, and encoding.
Various indices and ratings describing democratic processes in countries around the world have been developed by international organizations (such as Freedom House) and analytical centers (such as the one affiliated with The Economist). The main drawback of such ratings is that they only provide a linear ordering of countries by averaging a multitude of criteria. Such an approach does not make it obvious which particular problems exist in which countries and thus does not help in comparing democratic processes in different countries. In this paper, we propose a multidimensional model for ratings based on the mathematical discipline of formal concept analysis, which deals, in particular, with automated taxonomy construction from object–attribute data. In our case, every node of a taxonomy would group countries similar in certain aspects, while at the same time providing a description of these aspects. The aim is not to question the existing ratings, but rather to provide a neutral instrument for uncovering the structure of the data underlying these ratings. The proposed representation is much more informative than linear ratings, since it shows the commonalities and differences in the democratic development of various countries. In addition, it provides a solid ground for discussing, comparing, and criticizing ratings. It can also help formulate theoretical hypotheses on the evolution of democracy, thereby advancing scientific discovery. We illustrate the proposed representation with the case study of countries in Central and Eastern Europe and the former Soviet Union.
A novel approach to triclustering of three-way binary data is proposed. A tricluster is defined in terms of Triadic Formal Concept Analysis as a dense triset of a binary relation Y, describing the relationship between objects, attributes and conditions. This definition is a relaxation of the triconcept notion and makes it possible to find all triclusters and triconcepts contained in triclusters of large datasets. This approach generalizes a similar study of concept-based biclustering.
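The notion of a dense triset can be sketched as follows, assuming density is measured as the fraction of (object, attribute, condition) triples of the triset that belong to the relation Y. This density formula, the toy relation, and the names used are assumptions for illustration, not the paper's exact definition.

```python
from itertools import product
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]

def density(objects: Iterable[str], attributes: Iterable[str],
            conditions: Iterable[str], relation: Set[Triple]) -> float:
    """Fraction of cells of the triset X x Y x Z that are present in the relation."""
    objects, attributes, conditions = list(objects), list(attributes), list(conditions)
    volume = len(objects) * len(attributes) * len(conditions)
    if volume == 0:
        return 0.0
    filled = sum((g, m, b) in relation
                 for g, m, b in product(objects, attributes, conditions))
    return filled / volume

# Hypothetical relation: a 2 x 2 x 2 box with one triple missing.
relation = {
    ("u1", "a", "c1"), ("u1", "a", "c2"), ("u1", "b", "c1"), ("u1", "b", "c2"),
    ("u2", "a", "c1"), ("u2", "a", "c2"), ("u2", "b", "c1"),
}

box_density = density(["u1", "u2"], ["a", "b"], ["c1", "c2"], relation)
full_density = density(["u1"], ["a", "b"], ["c1", "c2"], relation)
print(box_density, full_density)
```

Under this reading, a triconcept corresponds to a triset of density 1, and a tricluster relaxes that requirement to a density above some chosen threshold.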
This book constitutes the proceedings of the 14th International Conference on Formal Concept Analysis, ICFCA 2017, held in Rennes, France, in June 2017. The 13 full papers presented in this volume were carefully reviewed and selected from 37 submissions. The book also contains an invited contribution and a historical paper translated from German and originally published in “Die Klassifikation und ihr Umfeld”, edited by P. O. Degens, H. J. Hermes, and O. Opitz, Indeks-Verlag, Frankfurt, 1986. The field of Formal Concept Analysis (FCA) originated in the 1980s in Darmstadt as a subfield of mathematical order theory, with prior developments in other research groups. Its original motivation was to consider complete lattices as lattices of concepts, drawing motivation from philosophy and mathematics alike. FCA has since then developed into a wide research area with applications much beyond its original motivation, for example in logic, data mining, learning, and psychology.
A scalable method for mining graph patterns stable under subsampling is proposed. The existing subsample stability and robustness measures are not antimonotonic according to definitions known so far. We study a broader notion of antimonotonicity for graph patterns, so that measures of subsample stability become antimonotonic. Then we propose gSOFIA for mining the most subsample-stable graph patterns. The experiments on numerous graph datasets show that gSOFIA is very efficient for discovering subsample-stable graph patterns.
The problem of detecting terms that may be of interest to an advertiser is considered. If a company has already bought some advertising terms which describe certain services, it is reasonable to find out the terms bought by competing companies. Some of them can be recommended as future advertising terms to the company. The goal of this work is to propose more interpretable recommendations based on FCA and association rules.
The paper is the preface to the special issue of the Fundamenta Informaticae journal on concept lattices and their applications. It is focused on recent developments in Formal Concept Analysis (FCA), as well as on applications in closely related areas such as data mining, information retrieval, knowledge management, data and knowledge engineering, and lattice theory.
Formal Concept Analysis (FCA) is a mathematical technique that has been extensively applied to Boolean data in knowledge discovery, information retrieval, web mining, and similar applications. During the past years, the research on extending FCA theory to cope with imprecise and incomplete information made significant progress. In this paper, we give a systematic overview of the more than 120 papers published between 2003 and 2011 on FCA with fuzzy attributes and rough FCA. We applied traditional FCA as a text-mining instrument to 1072 papers mentioning FCA in the abstract. These papers were available as PDF files and, using a thesaurus with terms referring to research topics, we transformed them into concept lattices. These lattices were used to analyze and explore the most prominent research topics within the FCA with fuzzy attributes and rough FCA research communities. FCA turned out to be an ideal metatechnique for representing large volumes of unstructured texts.
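The step from papers and thesaurus terms to a concept lattice can be sketched as below. This is a naive enumeration over a hypothetical four-paper context with three made-up topic terms; it is meant only to show what a formal concept is here, not how the survey's lattices were actually computed.

```python
from itertools import combinations
from typing import FrozenSet, Set, Tuple

# Hypothetical formal context: papers (objects) x thesaurus terms (attributes).
context = {
    "paper1": {"fuzzy", "lattice"},
    "paper2": {"fuzzy", "rough"},
    "paper3": {"rough", "lattice"},
    "paper4": {"fuzzy", "rough", "lattice"},
}
attributes = sorted(set().union(*context.values()))

def extent(attrs: Set[str]) -> FrozenSet[str]:
    """Papers that mention every term in attrs."""
    return frozenset(o for o, terms in context.items() if attrs <= terms)

def intent(objs: FrozenSet[str]) -> FrozenSet[str]:
    """Terms shared by every paper in objs (all terms if objs is empty)."""
    if not objs:
        return frozenset(attributes)
    return frozenset(set.intersection(*(set(context[o]) for o in objs)))

def concepts() -> Set[Tuple[FrozenSet[str], FrozenSet[str]]]:
    """Naive enumeration: close every attribute subset (exponential, toy sizes only)."""
    found = set()
    for r in range(len(attributes) + 1):
        for attrs in combinations(attributes, r):
            e = extent(set(attrs))
            found.add((e, intent(e)))
    return found

lattice = concepts()
for e, i in sorted(lattice, key=lambda c: (len(c[1]), sorted(c[1]))):
    print(sorted(e), sorted(i))
```

Each printed pair groups the papers that share a set of topics together with exactly those shared topics, which is what makes the resulting lattice usable as a map of research themes.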
One of the key advances in genome assembly that has led to a significant improvement in contig lengths has been improved algorithms for utilization of paired reads (mate-pairs). While in most assemblers, mate-pair information is used in a post-processing step, the recently proposed Paired de Bruijn Graph (PDBG) approach incorporates the mate-pair information directly in the assembly graph structure. However, the PDBG approach faces difficulties when the variation in the insert sizes is high. To address this problem, we first transform mate-pairs into edge-pair histograms that allow one to better estimate the distance between edges in the assembly graph that represent regions linked by multiple mate-pairs. Further, we combine the ideas of mate-pair transformation and PDBGs to construct new data structures for genome assembly: pathsets and pathset graphs.
This volume presents new results in the study and optimization of information transmission models in telecommunication networks using different approaches, mainly based on theories of queueing systems and queueing networks.
The paper provides a number of proposed draft operational guidelines for technology measurement and includes a number of tentative technology definitions to be used for statistical purposes, principles for identification and classification of potentially growing technology areas, suggestions on the survey strategies and indicators. These are the key components of an internationally harmonized framework for collecting and interpreting technology data that would need to be further developed through a broader consultation process. A summary of definitions of technology already available in OECD manuals and the stocktaking results are provided in the Annex section.