Manifold Learning in Data Mining Tasks
Many Data Mining tasks deal with data presented in high-dimensional spaces, and the ‘curse of dimensionality’ phenomenon is often an obstacle to the use of many methods for solving these tasks. To avoid this phenomenon, various Representation Learning algorithms are used as a first key step in solving these tasks: they transform the original high-dimensional data into lower-dimensional representations that preserve as much as possible of the information about the original data required for the Data Mining task at hand. These Representation Learning problems are formulated as various Dimensionality Reduction problems (Sample Embedding, Data Manifold Embedding, Manifold Learning, and the newly proposed Tangent Bundle Manifold Learning), which are motivated by various Data Mining tasks. A new geometrically motivated algorithm is presented that solves the Tangent Bundle Manifold Learning problem and gives new solutions for all the considered Dimensionality Reduction problems.
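As a rough illustration of the Sample Embedding idea described above, the following sketch maps high-dimensional points to a low-dimensional representation and then reconstructs them. It uses plain linear PCA as a stand-in; it is not the geometrically motivated algorithm of the paper, and the function names (`pca_embed`, `pca_reconstruct`) and the synthetic data are assumptions made purely for illustration.

```python
import numpy as np


def pca_embed(X, q):
    """Project the rows of X (n x p) onto the top-q principal directions.

    Linear PCA is used here only as a simple stand-in for a
    dimensionality reduction method; it is not the paper's algorithm.
    """
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    W = Vt[:q].T              # p x q orthonormal basis of the embedding subspace
    Y = (X - mean) @ W        # n x q low-dimensional representation
    return Y, W, mean


def pca_reconstruct(Y, W, mean):
    """Map low-dimensional points back to the original space."""
    return Y @ W.T + mean


# Hypothetical data: 200 points lying on a 2-D linear subspace of a 10-D space.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
A = rng.normal(size=(2, 10))
X = latent @ A

Y, W, mean = pca_embed(X, q=2)
X_hat = pca_reconstruct(Y, W, mean)
max_error = float(np.max(np.abs(X - X_hat)))
```

Because the synthetic data lie exactly on a two-dimensional linear subspace, the two-dimensional representation loses essentially nothing. For curved data manifolds a linear projection is generally not sufficient, which is the setting that Manifold Learning methods address.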
We propose a novel multi-texture synthesis model based on generative adversarial networks (GANs) with a user-controllable mechanism. This user control makes it possible to explicitly specify the texture that the model should generate. This property follows from using an encoder part which learns a latent representation for each texture in the dataset. To ensure dataset coverage, we use an adversarial loss function that penalizes incorrect reproductions of a given texture. In experiments, we show that our model can learn descriptive texture manifolds for large datasets and from raw data such as a collection of high-resolution photos. We show that our unsupervised learning pipeline may help segmentation models. Moreover, we apply our method to produce 3D textures and show that it outperforms existing baselines.
Objects have a variety of different features that can be represented as probability distributions. Recent findings show that in addition to mean and variance, the visual system can also encode the shape of feature distributions for features like color or orientation. In an odd-one-out search task we investigated observers' ability to encode two feature distributions simultaneously. Our stimuli were defined by two distinct features (color and orientation) while only one was relevant to the search task. We investigated whether the irrelevant feature distribution influences learning of the task-relevant distribution and whether observers also encode the irrelevant distribution. Although considerable learning of feature distributions occurred, especially for color, our results also suggest that adding a second irrelevant feature distribution negatively affected the encoding of the relevant one and that little learning of the irrelevant distribution occurred. There was also an asymmetry between the two different features: Searching for the oddly oriented target was more difficult than searching for the oddly colored target, which was reflected in worse learning of the color distribution. Overall, the results demonstrate that it is possible to encode information about two feature distributions simultaneously but also reveal considerable limits to this encoding.
This paper considers a data analysis system for collaborative platforms that was developed by the joint research team of the National Research University Higher School of Economics and the Witology company. Our focus is on describing the methodology and results of the first experiments. The developed system is based on several modern models and methods for analysing object-attribute and unstructured data (texts), such as Formal Concept Analysis, multimodal clustering, association rule mining, and keyword and collocation extraction from texts.
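To make the object-attribute setting concrete, here is a minimal sketch of Formal Concept Analysis on a toy context. The context (users and the actions they performed) is entirely hypothetical and is not taken from the described system; the helper names are likewise assumptions. The sketch enumerates formal concepts by brute force, which is only practical for very small contexts.

```python
from itertools import combinations

# Hypothetical object-attribute context: platform users and their actions.
context = {
    "user1": {"idea", "comment"},
    "user2": {"idea", "vote"},
    "user3": {"idea", "comment", "vote"},
}


def common_attributes(objects, context):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    result = set().union(*context.values())
    for obj in objects:
        result &= context[obj]
    return result


def common_objects(attributes, context):
    """Objects that have every attribute in `attributes`."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of objects."""
    concepts = set()
    objects = list(context)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset, context)
            extent = common_objects(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


concepts = formal_concepts(context)
```

Each resulting pair groups a maximal set of users with the maximal set of actions they all share, for example the users who both posted an idea and commented.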
This volume contains the extended versions of selected talks given at the international research workshop "Coping with Complexity: Model Reduction and Data Analysis", Ambleside, UK, August 31 – September 4, 2009. The book is deliberately broad in scope and aims at promoting new ideas and methodological perspectives. The topics of the chapters range from theoretical analysis of complex and multiscale mathematical models to applications in, e.g., fluid dynamics and chemical kinetics.