### Book chapter

## Asymptotically Optimal Method for Manifold Estimation Problem

Let X be an unknown nonlinear smooth q-dimensional Data manifold (D-manifold) embedded in a p-dimensional space (p> q) covered by a single coordinate chart. It is assumed that the manifold's condition number is positive so X has no self-intersections. Let Xn={X1, X2,..., Xn}⊂ X⊂ Rp be a sample randomly selected from the D-manifold Xindependently of each other according to an unknown probability measure on X with strictly positive density.

We discuss the general opportunity to create (asymptotically) a comletely integrable system from the original perturbed system by inserting additional perturbing terms. After such an artificial insertion, there appears an opportunity to make the secondary averaging and secondary reduction of the original system. Thus, by this way, the $3D$-system becomes $1$-dimensional. We demonstrate this approach by the example of a resonance Penning trap.

Let a high-dimensional random vector $\vX$ be represented as a sum of two components - a signal $\vS$ that belongs to some low-dimensional linear subspace $\S$, and a noise component $\vN$. This paper presents a new approach for estimating the subspace $\S$ based on the ideas of the Non-Gaussian Component Analysis. Our approach avoids the technical difficulties that usually appear in similar methods - it requires neither the estimation of the inverse covariance matrix of $\vX$ nor the estimation of the covariance matrix of $\vN.

One of the ultimate goals of Manifold Learning (ML) is to reconstruct an unknown nonlinear low-dimensional Data Manifold (DM) embedded in a high-dimensional observation space from a given set of data points sampled from the manifold. We derive asymptotic expansion and local lower and upper bounds for the maximum reconstruction error in a small neighborhood of an arbitrary point. The expansion and bounds are defined in terms of the distance between tangent spaces to the original Data manifold and the Reconstructed Manifold (RM) at the selected point and its reconstructed value, respectively. We propose an amplification of the ML, called Tangent Bundle ML, in which proximity is required not only between the DM and RM but also between their tangent spaces. We present a new geometrically motivated Grassman&Stiefel Eigenmaps algorithm that solves this problem and gives a new solution for the ML also.

In many Data Analysis tasks, one deals with data that are presented in high-dimensional spaces. In practice original high-dimensional data are transformed into lower-dimensional representations (features) preserving certain subject-driven data properties such as distances or geodesic distances, angles, etc. Preserving as much as possible available information contained in the original high-dimensional data is also an important and desirable property of the representation. The real-world high-dimensional data typically lie on or near a certain unknown low-dimensional manifold (Data manifold) embedded in an ambient high-dimensional `observation' space, so in this article we assume this Manifold assumption to be fulfilled. An exact isometric manifold embedding in a low-dimensional space is possible in certain special cases only, so we consider the problem of constructing a `locally isometric and conformal' embedding, which preserves distances and angles between close points. We propose a new geometrically motivated locally isometric and conformal representation method, which employs Tangent Manifold Learning technique consisting in sample-based estimation of tangent spaces to the unknown Data manifold. In numerical experiments, the proposed method compares favourably with popular Manifold Learning methods in terms of isometric and conformal embedding properties as well as of accuracy of Data manifold reconstruction from the sample.

Many Data Mining tasks deal with data which are presented in high dimensional spaces, and the ‘curse of dimensionality’ phenomena is often an obstacle to the use of many methods for solving these tasks. To avoid these phenomena, various Representation learning algorithms are used as a first key step in solutions of these tasks to transform the original high-dimensional data into their lower-dimensional representations so that as much information about the original data required for the considered Data Mining task is preserved as possible. The above Representation learning problems are formulated as various Dimensionality Reduction problems (Sample Embedding, Data Manifold embedding, Manifold Learning and newly proposed Tangent Bundle Manifold Learning) which are motivated by various Data Mining tasks. A new geometrically motivated algorithm that solves the Tangent Bundle Manifold Learning and gives new solutions for all the considered Dimensionality Reduction problems is presented.