Biclustering numerical data became a popular data-mining task at the beginning of the 2000s, especially for gene expression data analysis and recommender systems. A bicluster reflects a strong association between a subset of objects and a subset of attributes in a numerical object/attribute data table. So-called biclusters of similar values can be thought of as maximal sub-tables with close values. Only a few methods address a complete, correct and non-redundant enumeration of such patterns, a well-known intractable problem, and no formal framework exists. We introduce important links between biclustering and Formal Concept Analysis (FCA). Indeed, FCA is known to be, among other things, a methodology for biclustering binary data. Handling numerical data is not direct, and we argue that Triadic Concept Analysis (TCA), the extension of FCA to ternary relations, provides a powerful mathematical and algorithmic framework for biclustering numerical data. We hence discuss both theoretical and computational aspects of biclustering numerical data with triadic concept analysis. These results also scale to n-dimensional numerical datasets.
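As a minimal illustration of the FCA side of this connection (with a toy context and hypothetical names, not the paper's TCA machinery): a formal concept of a binary object/attribute table is exactly a maximal bicluster of 1s, obtained by closing a set of objects with the two derivation operators.

```python
# A toy formal context: objects mapped to the attributes they possess.
context = {
    "g1": {"a", "b"},
    "g2": {"a", "b", "c"},
    "g3": {"b", "c"},
}

def intent(objects):
    """Attributes shared by all given objects (the prime operator on objects)."""
    attrs = None
    for g in objects:
        attrs = context[g] if attrs is None else attrs & context[g]
    return attrs or set()

def extent(attributes):
    """Objects possessing all given attributes (the prime operator on attributes)."""
    return {g for g, attrs in context.items() if attributes <= attrs}

def concept(objects):
    """Close a set of objects into a formal concept (extent, intent):
    a maximal rectangle of crosses, i.e. a maximal bicluster of 1s."""
    B = intent(objects)
    A = extent(B)
    return A, B
```

For instance, `concept({"g1"})` closes `{"g1"}` into the concept `({"g1", "g2"}, {"a", "b"})`: no object or attribute can be added without breaking the rectangle of 1s.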
Global sensitivity analysis aims at quantifying the respective effects of input random variables (or combinations thereof) on the variance of a physical or mathematical model response. Among the abundant literature on sensitivity measures, Sobol' indices have received much attention since they provide accurate information for most models. We consider the problem of selecting experimental design points for the estimation of Sobol' indices. Based on the concept of D-optimality, we propose a method for constructing an adaptive design of experiments that is effective for the calculation of Sobol' indices from polynomial chaos expansions. We provide a set of applications that demonstrate the efficiency of the proposed approach.
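Once a polynomial chaos expansion in an orthonormal basis is available, first-order Sobol' indices follow directly from its coefficients. A minimal sketch with toy coefficients (this illustrates the post-processing step only, not the adaptive design itself):

```python
def first_order_sobol(coeffs):
    """First-order Sobol' indices from PCE coefficients.

    coeffs maps a multi-index tuple (one polynomial degree per input
    variable) to the coefficient of that orthonormal basis term.
    S_i = (variance of terms involving variable i alone) / (total variance).
    """
    n = len(next(iter(coeffs)))
    # Total variance: squared coefficients of all non-constant terms.
    total = sum(c * c for idx, c in coeffs.items() if any(idx))
    indices = []
    for i in range(n):
        var_i = sum(c * c for idx, c in coeffs.items()
                    if idx[i] > 0
                    and all(d == 0 for j, d in enumerate(idx) if j != i))
        indices.append(var_i / total)
    return indices
```

For the toy expansion `{(0, 0): 1.0, (1, 0): 2.0, (0, 1): 1.0, (1, 1): 1.0}` the total variance is 6, giving S_1 = 2/3 and S_2 = 1/6; the remaining 1/6 is the interaction effect.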
Engineers widely use the Gaussian process regression framework to construct surrogate models intended to replace computationally expensive physical models while exploring a design space. Thanks to the properties of Gaussian processes, we can use both samples generated by a high-fidelity function (an expensive and accurate representation of a physical phenomenon) and a low-fidelity function (a cheap and coarse approximation of the same phenomenon) when constructing a surrogate model. However, if sample sizes exceed a few thousand points, the computational costs of Gaussian process regression become prohibitive both for learning and for prediction. We propose two approaches to circumvent this computational burden: one is based on the Nyström approximation of sample covariance matrices, and the other on intelligent use of a blackbox that can evaluate the low-fidelity function on the fly at any point of the design space. We examine the performance of the proposed approaches on a number of artificial and real problems, including the engineering optimization of a rotating disk shape.
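The Nyström idea can be sketched in a few lines: approximate the full covariance matrix from the kernel columns at m landmark points, K ≈ C W⁻¹ Cᵀ. The sketch below uses a toy rank-2 kernel for which the approximation happens to be exact; in practice the landmarks are a small subset of a large sample and the approximation is low-rank.

```python
def kernel(x, y):
    # A toy rank-2 covariance function; any positive semi-definite
    # kernel would do in its place.
    return 1.0 + x * y

def nystrom(points, landmarks):
    """Nystrom approximation K ~= C W^-1 C^T of the full kernel matrix,
    built from the kernel columns at the two chosen landmark points."""
    C = [[kernel(x, z) for z in landmarks] for x in points]
    W = [[kernel(z1, z2) for z2 in landmarks] for z1 in landmarks]
    # Invert the small m x m matrix W (here m = 2, so a closed form).
    (a, b), (c, d) = W
    det = a * d - b * c
    Winv = [[d / det, -b / det], [-c / det, a / det]]
    n, m = len(points), len(landmarks)
    return [[sum(C[i][k] * Winv[k][l] * C[j][l]
                 for k in range(m) for l in range(m))
             for j in range(n)] for i in range(n)]
```

Learning and prediction then only involve the small m × m matrix W and the thin n × m matrix C, which is what breaks the cubic cost in the sample size.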
Many real applications require the representation of complex entities and their relations. Frequently, networks are the chosen data structures, due to their ability to highlight topological and qualitative characteristics. In this work, we are interested in supervised classification models for data in the form of networks. Given two or more classes whose members are networks, we build mathematical models to classify them based on various graph distances. Because the models are complex, comprising tens of thousands of nodes and edges, we focus on model simplification solutions that reduce execution times while maintaining high accuracy. Experimental results on three datasets of biological interest show the achieved performance improvements.
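The distance-based classification scheme can be sketched with the simplest possible graph distance, the symmetric difference of edge sets (the paper's distances and simplification techniques are of course more elaborate; names and data here are hypothetical):

```python
def edge_distance(g1, g2):
    """A simple distance between networks sharing one node labelling:
    the number of edges present in exactly one of the two graphs."""
    return len(g1 ^ g2)

def classify(query, labelled):
    """1-nearest-neighbour classification of a network by graph distance.
    labelled is a list of (edge_set, class_label) pairs."""
    return min(labelled, key=lambda pair: edge_distance(query, pair[0]))[1]
```

For example, with training networks `{(1, 2), (2, 3)}` labelled "A" and `{(1, 3)}` labelled "B", the query network `{(1, 2), (2, 3), (3, 4)}` is assigned to "A" (distance 1 versus 4).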
The main goal of the present paper is the development of a general framework for multivariate network analysis of statistical data sets. A general method of multivariate network construction, on the basis of measures of association, is proposed. In this paper we consider the Pearson correlation network, the sign similarity network, the Fechner correlation network, the Kruskal correlation network, the Kendall correlation network, and the Spearman correlation network. The problem of identifying the threshold graph in these networks is discussed, and different multiple-decision statistical procedures are proposed. It is shown that a statistical procedure used for threshold graph identification in one network can be efficiently used for any other network. Our approach allows us to obtain statistical procedures with desired properties for any network. © 2015 Springer International Publishing Switzerland.
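The threshold graph at the heart of this construction is straightforward to state: given any symmetric matrix of pairwise association measures (Pearson, Kendall, sign similarity, and so on), connect two variables whenever their association exceeds a chosen threshold. A minimal sketch with a toy matrix:

```python
def threshold_graph(assoc, threshold):
    """Edge set of the threshold graph: connect variables i and j
    whenever their pairwise association exceeds the threshold.
    assoc is a symmetric matrix of association measures."""
    n = len(assoc)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if assoc[i][j] > threshold}
```

Because the construction depends only on the association matrix, the same identification procedure applies unchanged whichever measure of association generated it, which is the interchangeability the abstract refers to.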
The problem of stock selection is discussed from different points of view. Three different sequentially rejective statistical procedures for stock selection are described and compared: the Holm multiple test procedure, the maximin multiple test procedure, and the multiple decision procedure. The properties of these statistical procedures are studied for different loss functions. It is shown that, for the additive loss function, the conditional risk essentially depends on the correlation matrix for the maximin procedure but does not for the multiple decision procedure. The dependence on the correlation matrix is different for 0–1 (zero-one) loss functions. The dependence of the error probability and conditional risk on the selection threshold is studied as well.
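The first of the three procedures can be sketched directly: Holm's step-down procedure compares the k-th smallest p-value to α/(m − k) and stops at the first failure. The p-values below are toy numbers; the paper's comparison of loss functions is not captured by this sketch.

```python
def holm(pvalues, alpha=0.05):
    """Holm's sequentially rejective multiple test procedure.
    Sort p-values in ascending order, compare the k-th smallest
    (k = 0, 1, ...) to alpha / (m - k), and stop at the first failure.
    Returns the set of rejected hypothesis indices."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    rejected = set()
    for k, i in enumerate(order):
        if pvalues[i] <= alpha / (m - k):
            rejected.add(i)
        else:
            break  # sequentially rejective: all remaining are retained
    return rejected
```

With p-values `[0.001, 0.04, 0.03]` and α = 0.05, only the first hypothesis is rejected: 0.001 ≤ 0.05/3, but the next smallest p-value, 0.03, exceeds 0.05/2.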
In this paper, we consider algorithms involved in the computation of the Duquenne–Guigues basis of implications. The most widely used algorithm for constructing the basis is Ganter’s Next Closure, designed for generating closed sets of an arbitrary closure system. We show that, for the purpose of generating the basis, the algorithm can be optimized. We compare the performance of the original algorithm and its optimized version in a series of experiments using artificially generated and real-life datasets. An important computationally expensive subroutine of the algorithm generates the closure of an attribute set with respect to a set of implications. We compare the performance of three algorithms for this task on their own, as well as in conjunction with each of the two algorithms for generating the basis. We also discuss other approaches to constructing the Duquenne–Guigues basis.
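The expensive subroutine mentioned above, computing the closure of an attribute set with respect to a set of implications, can be sketched as a naive fixpoint loop (toy data; LinClosure and the other algorithms compared in the paper compute the same closure more efficiently):

```python
def close(attrs, implications):
    """Closure of an attribute set under implications (premise -> conclusion):
    repeatedly fire every implication whose premise is contained in the
    current set, until nothing changes."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in implications:
            if premise <= closed and not conclusion <= closed:
                closed |= conclusion
                changed = True
    return closed
```

For example, under the implications `{a} -> {b}` and `{b, c} -> {d}`, the set `{a, c}` closes to `{a, b, c, d}`: the first implication adds `b`, which then triggers the second.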
The present special issue contains extended versions of selected papers that were presented at CLA 2008, the Sixth International Conference on Concept Lattices and Their Applications. CLA 2008 was held in Olomouc, Czech Republic, from October 21 to October 23, 2008, and was organized jointly by Palacky University, Olomouc, and the State University of New York at Binghamton. CLA is an international conference dedicated to formal concept analysis (FCA) and areas closely related to FCA, such as data mining, information retrieval, knowledge management, data and knowledge engineering, logic, algebra and lattice theory. In particular, the areas of interest to CLA include foundations of FCA, concept lattices and related structures, attribute implications, association rules and other data dependencies, algorithms, visualization, data preprocessing, redundancy and dimensionality reduction, classification and clustering, information retrieval, ontologies, and applications of FCA. The program of CLA 2008 consisted of presentations of regular papers and posters, and three invited talks. It is a tradition of CLA that the program chairs
organize a special issue. It is our pleasure that Professor Golumbic, the Editor in Chief of the Annals of Mathematics and Artificial Intelligence, accepted our proposal to organize such a special issue. We therefore invited the authors of the best CLA 2008 papers to submit extended versions of their papers to this special issue. Each submitted paper was reviewed by two to three reviewers who are renowned experts in the field. According to the journal policy, the paper co-authored by Radim Belohlavek was handled by the Editor in Chief. Based on the reviewers' reports, eight papers were selected. We are pleased to present these papers in this special issue.