Manifold Learning in Data Mining Tasks
Many Data Mining tasks deal with data presented in high-dimensional spaces, and the ‘curse of dimensionality’ phenomenon is often an obstacle to the use of many methods for solving these tasks. To avoid this phenomenon, various Representation learning algorithms are used as a first key step in solving these tasks: they transform the original high-dimensional data into lower-dimensional representations that preserve as much as possible of the information about the original data required for the considered Data Mining task. These Representation learning problems are formulated as various Dimensionality Reduction problems (Sample Embedding, Data Manifold Embedding, Manifold Learning and the newly proposed Tangent Bundle Manifold Learning), each motivated by different Data Mining tasks. A new geometrically motivated algorithm is presented that solves the Tangent Bundle Manifold Learning problem and gives new solutions for all the considered Dimensionality Reduction problems.
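The idea of pairing a low-dimensional embedding with a reconstruction back to the original space can be illustrated in the simplest linear setting. The sketch below is not the Tangent Bundle Manifold Learning algorithm of the abstract; it swaps in ordinary one-component PCA (computed by power iteration) on synthetic data, and all function names and the toy data set are assumptions made for illustration only.

```python
import random

def top_principal_direction(rows, iters=200):
    """Return (mean, unit vector) of the leading principal direction (power iteration)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # arbitrary starting vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return mean, v

def embed(row, mean, v):
    """Map a high-dimensional point to its 1-D representation."""
    return sum((row[j] - mean[j]) * v[j] for j in range(len(row)))

def reconstruct(y, mean, v):
    """Map a 1-D representation back to the original space."""
    return [mean[j] + y * v[j] for j in range(len(mean))]

# Hypothetical toy data: 3-D points lying near a line with direction (1, 2, 2) / 3.
rng = random.Random(0)
direction = (1 / 3, 2 / 3, 2 / 3)
data = []
for _ in range(200):
    t = rng.uniform(-5, 5)
    data.append([t * direction[j] + rng.gauss(0, 0.01) for j in range(3)])

mean, v = top_principal_direction(data)
codes = [embed(r, mean, v) for r in data]

# Largest Euclidean distance between a point and its reconstruction.
max_err = max(
    sum((a - b) ** 2 for a, b in zip(r, reconstruct(y, mean, v))) ** 0.5
    for r, y in zip(data, codes)
)
alignment = abs(sum(a * b for a, b in zip(v, direction)))
```

Here `codes` plays the role of the lower-dimensional representation, and `max_err` measures how much information about the original points is lost. A nonlinear manifold learning method would replace the linear `embed` and `reconstruct` maps while keeping the same overall structure.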
This study is dedicated to the introduction of a novel method that automatically extracts potential structural alerts from a data set of molecules. These triggering structures can be further used for knowledge discovery and classification purposes. Computation of the structural alerts results from an implementation of a sophisticated workflow that integrates a graph mining tool guided by growth rate and stability. The growth rate is a well-established measurement of contrast between classes. Moreover, the extracted patterns correspond to formal concepts; the most robust patterns, named the stable emerging patterns (SEPs), can then be identified thanks to their stability, a new notion originating from the domain of formal concept analysis. All of these elements are explained in the paper from the point of view of computation. The method was applied to a molecular data set on mutagenicity. The experimental results demonstrate its efficiency: it automatically outputs a manageable number of structural patterns that are strongly related to mutagenicity. Moreover, a part of the resulting structures corresponds to already known structural alerts. Finally, an in-depth chemical analysis relying on these structures demonstrates how the method can initiate promising processes of chemical knowledge discovery. © 2015 American Chemical Society.
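As a rough illustration of growth rate as a contrast measure, the sketch below uses the standard emerging-pattern definition: the ratio of a pattern's support in the target class to its support in the other class. It makes a strong simplifying assumption: molecules are represented as sets of fragment labels rather than graphs, so pattern matching is a subset test instead of subgraph matching. The fragment names and the two tiny data sets are invented for the example and are not taken from the study.

```python
import math

def support(pattern, dataset):
    """Fraction of items in the dataset that contain every element of the pattern."""
    if not dataset:
        return 0.0
    return sum(1 for item in dataset if pattern <= item) / len(dataset)

def growth_rate(pattern, target, background):
    """Support in the target class divided by support in the background class."""
    s_target = support(pattern, target)
    s_background = support(pattern, background)
    if s_background == 0.0:
        # Convention: infinite if the pattern occurs only in the target class,
        # zero if it occurs in neither class.
        return math.inf if s_target > 0.0 else 0.0
    return s_target / s_background

# Hypothetical molecules described by fragment labels (not real data).
mutagens = [
    frozenset({"nitro", "aromatic_ring"}),
    frozenset({"nitro", "aromatic_ring", "amine"}),
    frozenset({"aromatic_ring"}),
    frozenset({"nitro", "aromatic_ring"}),
]
non_mutagens = [
    frozenset({"aromatic_ring"}),
    frozenset({"amine"}),
    frozenset({"nitro", "aromatic_ring"}),
    frozenset({"aromatic_ring", "amine"}),
]

pattern = frozenset({"nitro", "aromatic_ring"})
gr = growth_rate(pattern, mutagens, non_mutagens)  # 0.75 / 0.25
```

A pattern with a high growth rate occurs far more often among the mutagens than among the non-mutagens. The stability criterion described in the abstract is a separate filter and is not modelled here.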
The 13th IEEE International Conference on Data Mining (IEEE ICDM 2013) solicited workshops on topics related to new research directions and novel applications of data mining. The goal of the ICDM workshops program (IEEE ICDMW) is to identify grand challenges in data mining, to explore possible paths to address these urgent problems, and to solicit broad participation from the data mining community and other relevant research communities. IEEE ICDMW 2013 was held on December 7 in Dallas, Texas, USA, and was immediately followed by IEEE ICDM 2013. This year, we received 41 workshop proposals, a 141% increase over the previous year. Of those, 26 proposals were accepted after a thorough review by the ICDMW workshop organization committee, and 18 workshops went on to prepare their programs after a rigorous paper review process. The final program consisted of 13 full-day workshops and 5 half-day workshops. Overall, the ICDMW Program received 364 paper submissions, a 19% increase over the previous year, of which 183 papers were accepted. The workshop proposal acceptance rate is about 44% (18 of 41), and the workshop paper acceptance rate is about 50% (183 of 364). These competitive acceptance rates have resulted in high-quality and exciting ICDMW proceedings. IEEE ICDMW 2013 covered many new research and application areas as well as fundamental data mining topics. The traditional and fundamental disciplines included spatial and spatiotemporal data mining, optimization, concept drift, domain-driven data mining, opinion mining, and sentiment analysis. Emerging disciplines included high-dimensional data mining, causal discovery, cloud and distributed computing, data mining in service applications, and, of course, big data.
IEEE ICDMW 2013 provided discussion forums for exciting applications including biological data mining in healthcare, data mining in networks, data privacy, and data mining case studies. The ICDMW Program also explored new areas such as data markets in sciences and businesses, data mining in experimental economics, and data mining in astronomical problems. Many people worked together in organizing IEEE ICDMW 2013. We would like to thank all workshop organizers for the high-quality workshop proposals received. The workshop organizers are the key to the success of the ICDMW program, and we thank them all for their tremendous effort in putting together the 18 exciting workshops in the final program.
This paper considers a data analysis system for collaborative platforms, developed by a joint research team of the National Research University Higher School of Economics and the Witology company. Our focus is on describing the methodology and the results of the first experiments. The system is based on several modern models and methods for analysing object-attribute and unstructured (text) data, such as Formal Concept Analysis, multimodal clustering, association rule mining, and keyword and collocation extraction from texts.
This volume contains extended versions of selected talks given at the international research workshop "Coping with Complexity: Model Reduction and Data Analysis", Ambleside, UK, August 31 – September 4, 2009. The book is deliberately broad in scope and aims at promoting new ideas and methodological perspectives. The topics of the chapters range from theoretical analysis of complex and multiscale mathematical models to applications in, e.g., fluid dynamics and chemical kinetics.