Методы детерминированных и рандомизированных энтропийных проекций для редукции размерности матрицы данных
The work is devoted to development of methods for deterministic and randomized projection aimed at dimensionality reduction problems. In the deterministic case, the authors develop the parallel reduction procedure minimizing Kullback-Leibler cross-entropy target to condition on information capacity based on the gradient projection method. In the randomized case, the authors solve the problem of reduction of feature space. The idea of application of projection procedures for reduction of data matrix is implemented in the proposed method of randomized entropy projection where the authors use the principle of keeping average distances between high- and low-dimensional points in the corresponding spaces. The problem leads to searching of a probability distribution maximizing Fermi entropy target to average distance between points.
Dimensionality reduction problem is stated as finding a mapping f:X ∈ R m → Z ∈ R n , where ≪ m while preserving some relevant properties of the data. We formulate topology-preserving dimensionality reduction as finding the optimal orthogonal projection to the lower-dimensional subspace which minimizes discrepancy between persistent diagrams of the original data and the projection. This generalizes the classic projection pursuit algorithm which was originally designed to preserve the number of clusters, i.e. the 0-order topological invariant of the data. Our approach further allows to preserve k-th order invariants within the principled framework. We further pose the resulting optimization problem as the Riemannian optimization problem which allows for a natural and efficient solution.
Many Data Mining tasks deal with data which are presented in high dimensional spaces, and the ‘curse of dimensionality’ phenomena is often an obstacle to the use of many methods for solving these tasks. To avoid these phenomena, various Representation learning algorithms are used as a first key step in solutions of these tasks to transform the original high-dimensional data into their lower-dimensional representations so that as much information about the original data required for the considered Data Mining task is preserved as possible. The above Representation learning problems are formulated as various Dimensionality Reduction problems (Sample Embedding, Data Manifold embedding, Manifold Learning and newly proposed Tangent Bundle Manifold Learning) which are motivated by various Data Mining tasks. A new geometrically motivated algorithm that solves the Tangent Bundle Manifold Learning and gives new solutions for all the considered Dimensionality Reduction problems is presented.
This volume contains the extended version of selected talks given at the international research workshop "Coping with Complexity: Model Reduction and Data Analysis", Ambleside, UK, August 31 – September 4, 2009. The book is deliberately broad in scope and aims at promoting new ideas and methodological perspectives. The topics of the chapters range from theoretical analysis of complex and multiscale mathematical models to applications in e.g., fluid dynamics and chemical kinetics.
The variance and semivariance are traditional measures of asset returns volatility since Markowitz proposed the market portfolio theory. Well known models for expected asset returns were developed under assumptions of mean-variance or mean-semivariance investor’s behavior. But numerous papers provided arguments against these models because of unrealistic assumptions and controversial empiric evidence. More complicated models with downside risk measures experienced difficulties with applications. The new model based on the special form of the investor’s utility function is proposed in this paper.
Neuronal oscillations have been shown to be associated with perceptual, motor and cognitive brain operations. While complex spatio-temporal dynamics are a hallmark of neuronal oscillations, they also represent a formidable challenge for the proper extraction and quantification of oscillatory activity with non-invasive recording techniques such as EEG and MEG. In order to facilitate the study of neuronal oscillations we present a general-purpose pre-processing approach, which can be applied for a wide range of analyses including but not restricted to inverse modeling and multivariate single-trial classification. The idea is to use dimensionality reduction with spatio-spectral decomposition (SSD) instead of the commonly and almost exclusively used principal component analysis (PCA). The key advantage of SSD lies in selecting components explaining oscillations-related variance instead of just any variance as in the case of PCA. For the validation of SSD pre-processing we performed extensive simulations with different inverse modeling algorithms and signal-to-noise ratios. In all these simulations SSD invariably outperformed PCA often by a large margin. Moreover, using a database of multichannel EEG recordings from 80 subjects we show that pre-processing with SSD significantly increases the performance of single-trial classification of imagined movements, compared to the classification with PCA pre-processing or without any dimensionality reduction. Our simulations and analysis of real EEG experiments show that, while not being supervised, the SSD algorithm is capable of extracting components primarily relating to the signal of interest often using as little as 20% of the data variance, instead of > 90% variance as in case of PCA. Given its ease of use, absence of supervision, and capability to efficiently reduce the dimensionality of multivariate EEG/MEG data, we advocate the application of SSD pre-processing for the analysis of spontaneous and induced neuronal oscillations in normal subjects and patients.
A model for organizing cargo transportation between two node stations connected by a railway line which contains a certain number of intermediate stations is considered. The movement of cargo is in one direction. Such a situation may occur, for example, if one of the node stations is located in a region which produce raw material for manufacturing industry located in another region, and there is another node station. The organization of freight traﬃc is performed by means of a number of technologies. These technologies determine the rules for taking on cargo at the initial node station, the rules of interaction between neighboring stations, as well as the rule of distribution of cargo to the ﬁnal node stations. The process of cargo transportation is followed by the set rule of control. For such a model, one must determine possible modes of cargo transportation and describe their properties. This model is described by a ﬁnite-dimensional system of diﬀerential equations with nonlocal linear restrictions. The class of the solution satisfying nonlocal linear restrictions is extremely narrow. It results in the need for the “correct” extension of solutions of a system of diﬀerential equations to a class of quasi-solutions having the distinctive feature of gaps in a countable number of points. It was possible numerically using the Runge–Kutta method of the fourth order to build these quasi-solutions and determine their rate of growth. Let us note that in the technical plan the main complexity consisted in obtaining quasi-solutions satisfying the nonlocal linear restrictions. Furthermore, we investigated the dependence of quasi-solutions and, in particular, sizes of gaps (jumps) of solutions on a number of parameters of the model characterizing a rule of control, technologies for transportation of cargo and intensity of giving of cargo on a node station.