### Article

## Manifold Learning in Regression Tasks

The paper presents a new geometrically motivated method for non-linear regression based on the Manifold Learning technique. The regression problem is to construct a predictive function which estimates an unknown smooth mapping f from q-dimensional inputs to m-dimensional outputs based on a training data set consisting of given ‘input-output’ pairs. The unknown mapping f determines a q-dimensional manifold M(f) consisting of all the ‘input-output’ vectors, which is embedded in (q+m)-dimensional space and covered by a single chart; the training data set determines a sample from this manifold. Modern Manifold Learning methods allow constructing an estimator M* from the manifold-valued sample which accurately approximates the manifold. The proposed method, called Manifold Learning Regression (MLR), finds the predictive function fMLR that ensures the equality M(fMLR) = M*. The MLR simultaneously estimates the m×q Jacobian matrix of the mapping f.
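The link between the manifold M(f) and the Jacobian can be illustrated in the simplest case q = m = 1, where the tangent line to the graph at (x, f(x)) has direction (1, f'(x)). The sketch below is not the MLR algorithm: it uses plain local PCA on sampled ‘input-output’ pairs, and the mapping `f`, the sample grid, and the neighbourhood size `k` are assumptions chosen only for illustration.

```python
import math

def f(x):
    # Hypothetical smooth mapping, used only for illustration.
    return math.sin(x)

def tangent_slope(points, center, k=7):
    """Estimate the slope of the graph near `center` by local PCA on its k nearest samples."""
    nearest = sorted(points, key=lambda p: math.dist(p, center))[:k]
    mx = sum(p[0] for p in nearest) / k
    my = sum(p[1] for p in nearest) / k
    sxx = sum((p[0] - mx) ** 2 for p in nearest)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in nearest)
    syy = sum((p[1] - my) ** 2 for p in nearest)
    # Angle of the leading principal axis of the 2x2 scatter matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return math.tan(theta)

# Sample from the graph manifold M(f) = {(x, f(x))}.
points = [(0.05 * i, f(0.05 * i)) for i in range(63)]
center = points[20]
slope = tangent_slope(points, center)
print(slope, math.cos(center[0]))  # estimated vs. true derivative
```

The estimated slope approximates the derivative because the tangent space of the graph encodes the Jacobian; for general q and m the same idea would need a q-dimensional local PCA and a linear solve.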

This volume contains the extended versions of selected talks given at the international research workshop "Coping with Complexity: Model Reduction and Data Analysis", Ambleside, UK, August 31 – September 4, 2009. The book is deliberately broad in scope and aims at promoting new ideas and methodological perspectives. The topics of the chapters range from theoretical analysis of complex and multiscale mathematical models to applications in, e.g., fluid dynamics and chemical kinetics.

Neuronal oscillations have been shown to be associated with perceptual, motor and cognitive brain operations. While complex spatio-temporal dynamics are a hallmark of neuronal oscillations, they also represent a formidable challenge for the proper extraction and quantification of oscillatory activity with non-invasive recording techniques such as EEG and MEG. In order to facilitate the study of neuronal oscillations we present a general-purpose pre-processing approach, which can be applied to a wide range of analyses including but not restricted to inverse modeling and multivariate single-trial classification. The idea is to use dimensionality reduction with spatio-spectral decomposition (SSD) instead of the commonly and almost exclusively used principal component analysis (PCA). The key advantage of SSD lies in selecting components explaining oscillations-related variance instead of just any variance as in the case of PCA. For the validation of SSD pre-processing we performed extensive simulations with different inverse modeling algorithms and signal-to-noise ratios. In all these simulations SSD invariably outperformed PCA, often by a large margin. Moreover, using a database of multichannel EEG recordings from 80 subjects we show that pre-processing with SSD significantly increases the performance of single-trial classification of imagined movements, compared to the classification with PCA pre-processing or without any dimensionality reduction. Our simulations and analysis of real EEG experiments show that, while not being supervised, the SSD algorithm is capable of extracting components primarily relating to the signal of interest, often using as little as 20% of the data variance, instead of > 90% variance as in the case of PCA. Given its ease of use, absence of supervision, and capability to efficiently reduce the dimensionality of multivariate EEG/MEG data, we advocate the application of SSD pre-processing for the analysis of spontaneous and induced neuronal oscillations in normal subjects and patients.
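The abstract does not state the SSD objective. As SSD is commonly formulated, a spatial filter is chosen to maximize power in the frequency band of interest relative to power in the flanking bands, which leads to a generalized eigenvalue problem. The symbols below are assumptions introduced here: C_s is the covariance of the data band-pass filtered around the oscillation of interest, C_n is the covariance in the neighbouring frequency bands, and w is a spatial filter.

```latex
\mathbf{w}^{*} = \arg\max_{\mathbf{w}}
  \frac{\mathbf{w}^{\top} \mathbf{C}_{s}\, \mathbf{w}}{\mathbf{w}^{\top} \mathbf{C}_{n}\, \mathbf{w}},
\qquad
\mathbf{C}_{s}\, \mathbf{w} = \lambda\, \mathbf{C}_{n}\, \mathbf{w}
```

Under this formulation, components are ranked by the ratio λ rather than by total variance, which is consistent with the contrast to PCA drawn in the abstract.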

We propose a novel multi-texture synthesis model based on generative adversarial networks (GANs) with a user-controllable mechanism. This control allows the user to explicitly specify which texture the model should generate. This property follows from using an encoder part which learns a latent representation for each texture from the dataset. To ensure dataset coverage, we use an adversarial loss function that penalizes incorrect reproductions of a given texture. In experiments, we show that our model can learn descriptive texture manifolds for large datasets and from raw data such as a collection of high-resolution photos. We show that our unsupervised learning pipeline may help segmentation models. Moreover, we apply our method to produce 3D textures and show that it outperforms existing baselines.

In many Data Analysis tasks, one deals with data that are presented in high-dimensional spaces. In practice, original high-dimensional data are transformed into lower-dimensional representations (features) preserving certain subject-driven data properties such as distances or geodesic distances, angles, etc. Preserving as much as possible of the available information contained in the original high-dimensional data is also an important and desirable property of the representation. Real-world high-dimensional data typically lie on or near a certain unknown low-dimensional manifold (Data manifold) embedded in an ambient high-dimensional 'observation' space, so in this article we assume that this Manifold assumption holds. An exact isometric manifold embedding in a low-dimensional space is possible in certain special cases only, so we consider the problem of constructing a 'locally isometric and conformal' embedding, which preserves distances and angles between close points. We propose a new geometrically motivated locally isometric and conformal representation method, which employs the Tangent Manifold Learning technique, consisting of sample-based estimation of tangent spaces to the unknown Data manifold. In numerical experiments, the proposed method compares favourably with popular Manifold Learning methods in terms of isometric and conformal embedding properties as well as accuracy of Data manifold reconstruction from the sample.
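What "preserves distances between close points" means can be checked numerically. The sketch below is not the proposed method. It measures, for a given embedding, the worst relative change in distance over all pairs of points closer than a chosen radius. The data manifold (a unit-circle arc), the embedding (arc length), and the function name `local_distortion` are assumptions made for illustration.

```python
import math

def local_distortion(points, embedded, radius):
    """Largest relative distance change over pairs closer than `radius` in the ambient space."""
    worst = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d_ambient = math.dist(points[i], points[j])
            if 0.0 < d_ambient <= radius:
                d_embedded = math.dist(embedded[i], embedded[j])
                worst = max(worst, abs(d_embedded / d_ambient - 1.0))
    return worst

# Hypothetical data manifold: an arc of the unit circle in R^2,
# embedded in R^1 by arc length (the angle).
angles = [0.02 * i for i in range(150)]
points = [(math.cos(t), math.sin(t)) for t in angles]
embedded = [(t,) for t in angles]

near = local_distortion(points, embedded, radius=0.1)  # close pairs only
far = local_distortion(points, embedded, radius=2.0)   # all pairs on this arc
print(near, far)
```

For this arc the distortion is small for close pairs and large when distant pairs are included, which is the behaviour a locally (but not globally) isometric embedding is expected to show.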

One of the ultimate goals of Manifold Learning (ML) is to reconstruct an unknown nonlinear low-dimensional Data Manifold (DM) embedded in a high-dimensional observation space from a given set of data points sampled from the manifold. We derive an asymptotic expansion and local lower and upper bounds for the maximum reconstruction error in a small neighborhood of an arbitrary point. The expansion and bounds are defined in terms of the distance between the tangent spaces to the original DM and the Reconstructed Manifold (RM) at the selected point and its reconstructed value, respectively. We propose an amplification of the ML, called Tangent Bundle ML, in which proximity is required not only between the DM and RM but also between their tangent spaces. We present a new geometrically motivated Grassmann&Stiefel Eigenmaps algorithm that solves this problem and also gives a new solution for the ML.
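The abstract does not say which distance between tangent spaces is used. One common choice, shown here purely as an assumption, is the projection distance, equal to the square root of the sum of squared sines of the principal angles between two subspaces of the same dimension. The helper names `orthonormalize` and `projection_distance` are hypothetical.

```python
import math

def orthonormalize(vectors):
    """Modified Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > 1e-12:  # skip linearly dependent vectors
            basis.append([wi / norm for wi in w])
    return basis

def projection_distance(span_a, span_b):
    """sqrt(sum of sin^2 of principal angles) between two equal-dimensional subspaces."""
    a, b = orthonormalize(span_a), orthonormalize(span_b)
    if len(a) != len(b):
        raise ValueError("subspaces must have equal dimension")
    # Squared Frobenius norm of A^T B equals the sum of cos^2 of the principal angles.
    overlap = sum(sum(x * y for x, y in zip(u, v)) ** 2 for u in a for v in b)
    return math.sqrt(max(0.0, len(a) - overlap))

# Two tangent planes in R^3: the xy-plane and the same plane tilted by phi about the x-axis.
phi = 0.3
plane_a = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
plane_b = [(1.0, 0.0, 0.0), (0.0, math.cos(phi), math.sin(phi))]
d_tilted = projection_distance(plane_a, plane_b)
d_same = projection_distance(plane_a, [(1.0, 1.0, 0.0), (1.0, -1.0, 0.0)])
print(d_tilted, d_same)
```

The distance depends only on the subspaces and not on the bases chosen to describe them, so two different bases of the same plane give a distance of zero up to rounding.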

A model for organizing cargo transportation between two node stations connected by a railway line which contains a certain number of intermediate stations is considered. The cargo moves in one direction. Such a situation may occur, for example, if one of the node stations is located in a region which produces raw material for a manufacturing industry located in another region, where the other node station is situated. The organization of freight traffic is performed by means of a number of technologies. These technologies determine the rules for taking on cargo at the initial node station, the rules of interaction between neighboring stations, as well as the rule for distributing cargo at the final node station. The process of cargo transportation follows a set control rule. For such a model, one must determine the possible modes of cargo transportation and describe their properties. This model is described by a finite-dimensional system of differential equations with nonlocal linear restrictions. The class of solutions satisfying the nonlocal linear restrictions is extremely narrow. This results in the need for a “correct” extension of solutions of the system of differential equations to a class of quasi-solutions whose distinctive feature is gaps (jumps) at a countable number of points. Using the fourth-order Runge–Kutta method, it was possible to construct these quasi-solutions numerically and determine their rate of growth. Note that the main technical difficulty consisted in obtaining quasi-solutions satisfying the nonlocal linear restrictions. Furthermore, we investigated the dependence of the quasi-solutions and, in particular, of the sizes of the gaps (jumps) of the solutions on a number of parameters of the model characterizing the control rule, the technologies for transportation of cargo, and the intensity of cargo supply at a node station.
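The classical fourth-order Runge–Kutta step mentioned above can be sketched as follows. The abstract does not give the system of differential equations or the nonlocal restrictions, so the sketch integrates a hypothetical test system y' = -y with a known exact solution. The names `rk4_step`, `integrate`, and `decay` are assumptions.

```python
import math

def rk4_step(f, t, y, h):
    """Advance the system y' = f(t, y) by one classical fourth-order Runge-Kutta step."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def integrate(f, t0, y0, h, steps):
    """Apply `steps` RK4 steps of size h starting from (t0, y0)."""
    t, y = t0, list(y0)
    for _ in range(steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y

def decay(t, y):
    # Hypothetical test system (not the paper's model): y' = -y, exact solution exp(-t).
    return [-yi for yi in y]

y_end = integrate(decay, 0.0, [1.0], 0.1, 10)
error = abs(y_end[0] - math.exp(-1.0))
print(y_end[0], error)
```

A quasi-solution with jumps would presumably be built piecewise, restarting the integration at each discontinuity point with an adjusted state. The jump rule depends on the model and is not specified in the abstract.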