Manifold Learning in Data Mining Tasks

A. P. Kuleshov; A. Bernstein



P. 119-133.


Many Data Mining tasks deal with data presented in high-dimensional spaces, and the ‘curse of dimensionality’ phenomenon is often an obstacle to applying many methods for solving these tasks. To avoid this phenomenon, various Representation learning algorithms are used as a key first step in solving these tasks: they transform the original high-dimensional data into lower-dimensional representations that preserve as much as possible of the information about the original data that the Data Mining task at hand requires. These Representation learning problems are formulated as various Dimensionality Reduction problems (Sample Embedding, Data Manifold Embedding, Manifold Learning, and the newly proposed Tangent Bundle Manifold Learning), each motivated by particular Data Mining tasks. The paper presents a new geometrically motivated algorithm that solves the Tangent Bundle Manifold Learning problem and gives new solutions for all the considered Dimensionality Reduction problems.
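Tangent Bundle Manifold Learning asks for the tangent spaces of the data manifold as well as the manifold itself. As a rough illustration only, the sketch below estimates a tangent space at every sample with local PCA over its k nearest neighbours, a standard generic estimator. It is not the authors' algorithm, and the function name `local_tangent_spaces` and the parameters `k` (neighbourhood size) and `d` (assumed manifold dimension) are hypothetical.

```python
import numpy as np


def local_tangent_spaces(X, k, d):
    """Hypothetical helper: generic local-PCA tangent estimate (not the paper's method).

    X: (n, p) array of points assumed to lie on or near a d-dimensional manifold.
    k: neighbourhood size, counting the point itself (needs k > d).
    Returns an (n, p, d) array; slice [i] holds orthonormal columns spanning
    the estimated tangent space at X[i].
    """
    n, p = X.shape
    # Brute-force pairwise squared distances: simple, but O(n^2 * p) memory.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    bases = np.empty((n, p, d))
    for i in range(n):
        nbrs = np.argsort(sq[i])[:k]             # k nearest samples, X[i] included
        patch = X[nbrs] - X[nbrs].mean(axis=0)   # centre the neighbourhood
        # Rows of vt are the local principal axes, ordered by explained variance.
        _, _, vt = np.linalg.svd(patch, full_matrices=False)
        bases[i] = vt[:d].T                      # top-d axes approximate the tangent space
    return bases


# Usage: points on the unit circle, whose true tangent at angle t is (-sin t, cos t).
t = np.linspace(0, 2 * np.pi, 200, endpoint=False)
X = np.column_stack([np.cos(t), np.sin(t)])
B = local_tangent_spaces(X, k=7, d=1)
true_tangent = np.column_stack([-np.sin(t), np.cos(t)])
alignment = np.abs(np.sum(B[:, :, 0] * true_tangent, axis=1))  # |cos| of angle between them
print(alignment.min())
```

The brute-force distance matrix keeps the sketch short; for larger samples a spatial index such as a k-d tree would be the usual choice.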

Language: English


Keywords: data mining, dimensionality reduction, Statistical Learning, Representation learning, Manifold Learning, Tangent Learning, Tangent Bundle Manifold Learning

In book

Machine Learning and Data Mining in Pattern Recognition

Vol. 8556, Springer, 2014

Manifold Learning in Regression Tasks

Bernstein A., Kuleshov A. P., Yanovich Y., Lecture Notes in Computer Science 2015 Vol. 9047 P. 414-423

The paper presents a new geometrically motivated method for non-linear regression based on Manifold learning technique. The regression problem is to construct a predictive function which estimates an unknown smooth mapping f from q-dimensional inputs to m-dimensional outputs based on a training data set consisting of given ‘input-output’ pairs. The unknown mapping f determines q-dimensional ...

Added: August 30, 2015

Manifold Learning: Generalization Ability and Tangent Proximity

Bernstein A., Kuleshov A. P., International Journal of Software and Informatics 2013 No. 7(3) P. 359-390

One of the ultimate goals of Manifold Learning (ML) is to reconstruct an unknown nonlinear low-dimensional Data Manifold (DM) embedded in a high-dimensional observation space from a given set of data points sampled from the manifold. We derive asymptotic expansion and local lower and upper bounds for the maximum reconstruction error in a small neighborhood ...

Added: November 21, 2014

Undecidability of the Lambek calculus with a relevant modality

Kanovich M., Scedrov A., Kuznetsov S., in : The 21st Conference on Formal Grammar. Springer, 2016. P. 240-256.

Morrill and Valentín in the paper “Computational coverage of TLG: Nonlinearity” considered an extension of the Lambek calculus enriched by a so-called “exponential” modality. This modality behaves in the “relevant” style, that is, it allows contraction and permutation, but not weakening. Morrill and Valentín stated an open problem whether this system is decidable. Here we ...

Added: June 28, 2016

2019 International Conference on Data Mining Workshops (ICDMW)

IEEE, 2019

Added: October 18, 2021

Proceedings of the 7th Spring/Summer Young Researchers’ Colloquium on Software Engineering, SYRCoSE 2013

Kazan, 2013

The issue contains the papers presented at the 7th Spring/Summer Young Researchers' Colloquium on Software Engineering (SYRCoSE 2013), held in Kazan, Russia, on May 30-31, 2013. Paper selection was based on a competitive peer review process carried out by the program committee. Both regular and research-in-progress papers were considered acceptable for the ...

Added: June 8, 2013

Assessment of Dendritic Cell Therapy Effectiveness Based on the Feature Extraction from Scientific Publications

Luparov A., Panov A. I., Suvorov R. et al., in : Proceedings of ICPRAM 2015 - 4th International Conference on Pattern Recognition Applications and Methods. Vol. 2. SciTePress, 2015. P. 270-276.

Dendritic cell (DC) vaccination is a promising way to combat cancer metastases, especially in the case of immunogenic tumors. Unfortunately, it is rarely possible to achieve a satisfactory clinical outcome in the majority of patients treated with a particular DC vaccine. Apparently, DC vaccination can be successful with certain combinations of features of the ...

Added: November 20, 2015

A Method for Analyzing Multivariate Time Series Using Correction of a Precomputed Inverse Matrix: A Comparative Study with Other Data Mining Methods

Perminov G. I., Business Informatics 2008 No. 1 P. 36-44

In the analysis of multivariate time series, the applicability of traditional statistical methods depends on meeting fairly strict assumptions that permit the use of ordinary least squares (OLS), which underlies these methods. These assumptions include the absence of multicollinearity, heteroscedasticity, and autocorrelation. In problems of economic analysis and multivariate forecasting, in order to reduce the number of variables considered and to obtain approximate regularities quickly, it is advisable to resort to methods of intelligent analysis ...

Added: September 28, 2012

Formal Concept Analysis: 16th International Conference, ICFCA 2021, Strasbourg, France, June 29 – July 2, 2021, Proceedings

Springer, 2021

This book constitutes the proceedings of the 16th International Conference on Formal Concept Analysis, ICFCA 2021, held in Strasbourg, France, in June/July 2021. The 14 full papers and 5 short papers presented in this volume were carefully reviewed and selected from 32 submissions. The book also contains four invited contributions in full paper length. The research part ...

Added: July 10, 2021

Object-Attribute Biclustering for Elimination of Missing Genotypes in Ischemic Stroke Genome-Wide Data

Ignatov D. I., Khvorykh G., Khrunin A. et al., in : Recent Trends in Analysis of Images, Social Networks and Texts. 9th International Conference, AIST 2020, Skolkovo, Moscow, Russia, October 15–16, 2020, Revised Supplementary Proceedings. Vol. 12602. Springer, 2021. P. 185-204.

Missing genotypes can affect the efficacy of machine learning approaches to identify the risk genetic variants of common diseases and traits. The problem occurs when genotypic data are collected from different experiments with different DNA microarrays, each being characterised by its pattern of uncalled (missing) genotypes. This can prevent the ...

Added: November 1, 2022

Dimensionality Reduction of Data in Simulation Modeling Tasks

Agalakov Yu. G., Bernstein A., Information Technologies and Computing Systems 2012 No. 3 P. 3-17

The paper considers the data mining problems that must be solved within predictive modeling technology. To reduce the complexity of solving these problems, predictive modeling technology uses solutions of dimensionality reduction problems, which must satisfy a number of additional conditions. The paper discusses these additional requirements and formulates the corresponding new, non-traditional statements of dimensionality reduction problems. ...

Added: January 24, 2013

FCA-Based Recommender Models and Data Analysis for Crowdsourcing Platform Witology

Ignatov D. I., Kaminskaya A. Y., Malioukov A. et al., in : Proceedings of International Conference on Conceptual Structures 2014. Vol. 8577: Graph-Based Representation and Reasoning. Springer, 2014. P. 287-292.

This paper considers a recommender part of the data analysis system for the collaborative platform Witology. It was developed by the joint research team of the National Research University Higher School of Economics and the Witology company. This recommender system is able to recommend ideas, like-minded users and antagonists at the respective phases ...

Added: June 9, 2014

Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing

Heidelberg : Springer, 2013

This volume comprises papers accepted for presentation at the 14th Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing (RSFDGRC) International Conference, which was held as a major part of the Joint Rough Set Symposium (JRS 2013) in Halifax, Canada, on October 11-14, 2013. ...

Added: October 29, 2013

Analysis of Images, Social Networks and Texts Third International Conference, AIST 2014, Yekaterinburg, Russia, April 10-12, 2014, Revised Selected Papers

Berlin : Springer, 2014

This book constitutes the proceedings of the Third International Conference on Analysis of Images, Social Networks and Texts, AIST 2014, held in Yekaterinburg, Russia, in April 2014. The 11 full and 10 short papers were carefully reviewed and selected from 74 submissions. They are presented together with 3 short industrial papers, 4 invited papers and ...

Added: November 13, 2014

Reply to Yu. Yu. Petrunin's Review “Astrology, Neural Networks, and Personnel Management”

Yasnitsky L., Journal of Emerging Directions of Science 2015 Vol. 3 No. 7

The article presents selected excerpts from the debate with Yu. Yu. Petrunin, Doctor of Philosophical Sciences and Professor at Moscow State University. ...

Added: February 23, 2016

Computer-Support Capabilities for Qualitative Research in Sociology

Mikheyenkova M., Automatic Documentation and Mathematical Linguistics 2011 Vol. 45 No. 4 P. 180-201

This paper reviews how approaches to the qualitative analysis of sociological data have developed through the use of computer tools. This development amounts to a transition from simple computer processing of data to modern intelligent data analysis ...

Added: September 28, 2013

User-controllable Multi-texture Synthesis with Generative Adversarial Networks

Alanov A., Kochurov M., Volkhonskiy D. et al., in : Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISAPP 2020). Vol. 4. SciTePress, 2020. P. 214-221.

We propose a novel multi-texture synthesis model based on generative adversarial networks (GANs) with a user-controllable mechanism. The user control ability allows the user to explicitly specify the texture that the model should generate. This property follows from using an encoder part which learns a latent representation for each texture from the dataset. To ensure ...

Added: November 8, 2020

FCA-Based Models and a Prototype Data Analysis System for Crowdsourcing Platforms

Ignatov D. I., Kaminskaya A. Y., Konstantinov A. V. et al., in : Conceptual Structures for STEM Research and Education, 20th International Conference on Conceptual Structures. Vol. 7735. Berlin, Heidelberg : Springer, 2013. P. 173-192.

This paper considers a data analysis system for collaborative platforms which was developed by the joint research team of the National Research University Higher School of Economics and the Witology company. Our focus is on describing the methodology and results of the first experiments. The developed system is based on several modern models and methods ...

Added: October 10, 2013

Application of Modern Data Analysis Methods to Cluster the Clinical Pathways in Urban Medical Facilities

Prokofyeva E. S., Zaitsev R., Maltseva S. V., in : 2019 IEEE 21st Conference on Business Informatics (CBI). Vol. 1. M. : IEEE Computer Society, 2019. P. 75-83.

Patient flow modeling in healthcare plays a large role in understanding the operation of the system and its characteristics. Besides, modeling techniques can significantly improve the effectiveness of the medical facilities. The existing level of automation in these facilities enables the accumulation of large amounts of various data. Therefore, the collected data might be considered ...

Added: September 10, 2019

Communications in Computer and Information Science, Vol. 542, Springer, 2015

Semenov A., Natekin A., Nikolenko S. I. et al., Springer, 2015

In online social networks, high level features of user behavior such as character traits can be predicted with data from user profiles and their connections. Recent publications use data from online social networks to detect people with depression propensity and diagnosis. In this study, we investigate the capabilities of previously published methods and metrics applied to the Russian online social ...

Added: December 21, 2015

Creating a Course Recommendation System for Exchange Students

Suschevskiy V., Mohammad K., in : Companion Proceedings 11th International Conference on Learning Analytics & Knowledge (LAK21). [s.n.], 2021. P. 76-78.

While the exchange of cross-border students in Europe has increased significantly in recent years, a growing number of these students face obstacles in selecting courses for exchange. This poster describes the first iteration of creating a course recommendation system for exchange students to select courses that fit their preferences. We implemented a combination of embedding ...

Added: July 4, 2021

Discovering structural alerts for mutagenicity using stable emerging molecular patterns

Metivier J. -., Lepailleur A., Buzmakov A. V. et al., Journal of Chemical Information and Modeling 2015 Vol. 55 No. 5 P. 925-940

This study is dedicated to the introduction of a novel method that automatically extracts potential structural alerts from a data set of molecules. These triggering structures can be further used for knowledge discovery and classification purposes. Computation of the structural alerts results from an implementation of a sophisticated workflow that integrates a graph mining tool ...

Added: September 3, 2015

Consistency of the Domain Estimate Produced by the Grassmann-Stiefel Eigenmaps Algorithm

Yanovich Y., in : Collected Papers of the Conference “Information Technologies and Systems” (ITaS'16). M. : IITP RAS, 2016. P. 191-197.

In machine learning, when constructing regression dependencies or solving classification problems, high-dimensional descriptions of objects are often redundant and functionally dependent. Such descriptions frequently lie near manifolds of much lower dimension than that of their original representation. This assumption is called the Manifold Hypothesis. Using this information can help solve the original problem. This gives rise to the problem of manifold estimation. ...
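The Manifold Hypothesis can be probed empirically on a sample. A minimal sketch, assuming the data are sampled densely enough that each k-nearest-neighbour patch is close to flat: for every point, count how many local principal components are needed to explain a chosen share of the local variance, then take the median over all points. This is a common local-PCA heuristic, not the Grassmann-Stiefel Eigenmaps algorithm; `local_pca_dimension`, `k` and `var_threshold` are illustrative, hypothetical names and defaults.

```python
import numpy as np


def local_pca_dimension(X, k, var_threshold=0.95):
    """Hypothetical helper: median local-PCA estimate of intrinsic dimension.

    Assumes each k-nearest-neighbour patch of X (shape (n, p)) is nearly flat.
    """
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    dims = []
    for i in range(len(X)):
        nbrs = np.argsort(sq[i])[:k]              # k nearest samples, X[i] included
        patch = X[nbrs] - X[nbrs].mean(axis=0)    # centre the neighbourhood
        s = np.linalg.svd(patch, compute_uv=False)  # singular values, descending
        ratios = np.cumsum(s ** 2) / np.sum(s ** 2)  # cumulative explained variance
        # First index whose cumulative share reaches the threshold, plus one.
        dims.append(int(np.searchsorted(ratios, var_threshold) + 1))
    return int(np.median(dims))


rng = np.random.default_rng(0)
# A flat 2-D sheet isometrically embedded in R^10 (orthonormal rows from QR).
A = np.linalg.qr(rng.normal(size=(10, 2)))[0].T
plane = rng.uniform(size=(300, 2)) @ A
print(local_pca_dimension(plane, k=20))  # 2

# A circle (1-D manifold) placed in R^3.
t = np.linspace(0, 2 * np.pi, 200, endpoint=False)
circle = np.column_stack([np.cos(t), np.sin(t), np.zeros_like(t)])
print(local_pca_dimension(circle, k=11))  # 1
```

The threshold trades off noise against curvature: with noisy data a lower `var_threshold` avoids counting noise directions, while strongly curved manifolds need smaller neighbourhoods.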

Added: November 24, 2016

Representing color and orientation ensembles: Can observers learn multiple feature distributions?

Kristjansson A., Journal of Vision 2019 Vol. 19 No. 9 P. 1-17

Objects have a variety of different features that can be represented as probability distributions. Recent findings show that in addition to mean and variance, the visual system can also encode the shape of feature distributions for features like color or orientation. In an odd-one-out search task we investigated observers' ability to encode two feature distributions ...

Added: May 30, 2020

Fast Generation of Best Interval Patterns for Nonmonotonic Constraints

Buzmakov A. V., Kuznetsov S., Napoli A., in : Machine Learning and Knowledge Discovery in Databases. European Conference, ECML PKDD 2015, Porto, Portugal, September 7-11, 2015, Proceedings. Part 2. Vol. 9285. Dordrecht, L., Cham, Heidelberg, NY : Springer, 2015. P. 157-172.

In pattern mining, the main challenge is the exponential explosion of the set of patterns. Typically, to solve this problem, a constraint for pattern selection is introduced. One of the first constraints proposed in pattern mining is support (frequency) of a pattern in a dataset. Frequency is an anti-monotonic function, i.e., given an infrequent pattern, ...
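The anti-monotonicity of support is what makes level-wise pattern search tractable: if a pattern is infrequent, every pattern that extends it is infrequent too, so such extensions never need to be generated. The sketch below illustrates this for plain itemsets with an Apriori-style search. It is not the interval-pattern method of the paper, and `support`, `frequent_itemsets` and the toy transactions are hypothetical.

```python
from itertools import combinations


def support(pattern, transactions):
    """Fraction of (non-empty list of) transactions containing every item of `pattern`."""
    return sum(pattern <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Level-wise search exploiting anti-monotonicity of support (hypothetical helper)."""
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items
             if support(frozenset([i]), transactions) >= min_support]
    result = list(level)
    size = 1
    while level:
        prev = set(level)
        # Join frequent patterns of the current size into candidates one item larger.
        candidates = {a | b for a in level for b in level if len(a | b) == size + 1}
        # Prune: a candidate with any infrequent subset cannot be frequent.
        candidates = [c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, size))]
        level = [c for c in candidates if support(c, transactions) >= min_support]
        result.extend(level)
        size += 1
    return result


T = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
print(len(frequent_itemsets(T, 0.6)))  # 6: {a}, {b}, {c}, {a,b}, {a,c}, {b,c}
print(support(frozenset("abc"), T))    # 0.4, below every subset's support
```

Here {a, b, c} is generated as a candidate because all of its pairs are frequent, but its own support falls below the threshold, so the search stops at size 3.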

Added: October 22, 2015