Manifold Learning in Data Mining Tasks
This study is dedicated to the introduction of a novel method that automatically extracts potential structural alerts from a data set of molecules. These triggering structures can be further used for knowledge discovery and classification purposes. Computation of the structural alerts results from an implementation of a sophisticated workflow that integrates a graph mining tool guided by growth rate and stability. The growth rate is a well-established measurement of contrast between classes. Moreover, the extracted patterns correspond to formal concepts; the most robust patterns, named the stable emerging patterns (SEPs), can then be identified thanks to their stability, a new notion originating from the domain of formal concept analysis. All of these elements are explained in the paper from the point of view of computation. The method was applied to a molecular data set on mutagenicity. The experimental results demonstrate its efficiency: it automatically outputs a manageable number of structural patterns that are strongly related to mutagenicity. Moreover, a part of the resulting structures corresponds to already known structural alerts. Finally, an in-depth chemical analysis relying on these structures demonstrates how the method can initiate promising processes of chemical knowledge discovery. © 2015 American Chemical Society.
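As a rough illustration of the growth-rate measure mentioned above, the sketch below computes the ratio of a pattern's relative support in a target class to its support in a background class. It is a minimal sketch, not the authors' workflow: molecules are represented as plain sets of fragment labels rather than graphs, and the names `support`, `growth_rate`, `mutagenic`, and `non_mutagenic` are hypothetical, as is the toy data.

```python
import math


def support(pattern, dataset):
    """Fraction of records in `dataset` that contain every item of `pattern`."""
    if not dataset:
        return 0.0
    return sum(1 for record in dataset if pattern <= record) / len(dataset)


def growth_rate(pattern, target, background):
    """Support in the target class divided by support in the background class.

    Returns infinity when the pattern occurs only in the target class,
    and 0.0 when it occurs in neither class.
    """
    s_target = support(pattern, target)
    s_background = support(pattern, background)
    if s_background == 0:
        return math.inf if s_target > 0 else 0.0
    return s_target / s_background


# Toy data (invented for illustration): each molecule is a set of fragment labels.
mutagenic = [
    frozenset({"nitro", "aromatic"}),
    frozenset({"nitro"}),
    frozenset({"aromatic"}),
    frozenset({"nitro", "amine"}),
]
non_mutagenic = [
    frozenset({"aromatic"}),
    frozenset({"amine"}),
    frozenset({"nitro", "aromatic"}),
    frozenset({"hydroxyl"}),
]

nitro_rate = growth_rate(frozenset({"nitro"}), mutagenic, non_mutagenic)
```

A pattern with a high growth rate is much more frequent in the target class than in the background class, which is what makes it a candidate structural alert.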
This paper develops a triclustering method based on spectral graph clustering. A series of experiments on real data examines the method's efficiency and its suitability for analysing data from resource-sharing systems, the so-called folksonomies.
The paper gives the basic definitions of Formal Concept Analysis (FCA), discusses its role in mathematics and computer science, and provides a brief overview of its main applications.
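For readers unfamiliar with the basic definitions of FCA, the following minimal sketch shows the two derivation operators and the check that a pair (extent, intent) is a formal concept, that is, each set is exactly the derivation of the other. The function names and the toy context are hypothetical and chosen only for illustration.

```python
def common_attributes(objects, context, all_attributes):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    shared = set(all_attributes)
    for obj in objects:
        shared &= context[obj]
    return shared


def common_objects(attributes, context):
    """Objects that possess every attribute in `attributes`."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def is_formal_concept(extent, intent, context, all_attributes):
    """True if the extent and the intent determine each other."""
    return (
        common_attributes(extent, context, all_attributes) == intent
        and common_objects(intent, context) == extent
    )


# Toy formal context (invented for illustration): object -> set of its attributes.
context = {
    "frog": {"aquatic", "terrestrial"},
    "fish": {"aquatic"},
    "dog": {"terrestrial"},
}
all_attributes = {"aquatic", "terrestrial"}

closed_pair = is_formal_concept({"frog", "fish"}, {"aquatic"}, context, all_attributes)
open_pair = is_formal_concept({"fish"}, {"aquatic"}, context, all_attributes)
```

Here `closed_pair` is true because the aquatic animals are exactly the frog and the fish, while `open_pair` is false because the extent omits the frog.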
Two novel approaches to triclustering of three-way binary data are proposed. A tricluster is defined as a dense subset of a ternary relation Y defined on sets of objects, attributes, and conditions, or, equivalently, as a dense submatrix of the adjacency matrix of the ternary relation Y. This definition is a scalable relaxation of the notion of triconcept in Triadic Concept Analysis, and each triconcept of the initial data set is contained in a certain tricluster. This approach generalizes the one previously introduced for concept-based biclustering. We also propose a hierarchical spectral triclustering algorithm for mining dense submatrices of the adjacency matrix of the initial ternary relation Y. Finally, we describe some applications of the proposed techniques, compare the proposed approaches, and study their performance in a series of experiments with real data sets.
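The notion of a dense subset of a ternary relation can be made concrete with a small sketch. It assumes that density means the fraction of cells in the box objects × attributes × conditions that belong to the relation; the abstract does not state the formula, so this reading is an assumption, and the function name and toy folksonomy-style data are hypothetical.

```python
from itertools import product


def density(relation, objects, attributes, conditions):
    """Fraction of (object, attribute, condition) cells of the box in `relation`."""
    volume = len(objects) * len(attributes) * len(conditions)
    if volume == 0:
        return 0.0
    hits = sum(
        1 for cell in product(objects, attributes, conditions) if cell in relation
    )
    return hits / volume


# Toy ternary relation (invented for illustration): (user, tag, resource) triples.
relation = {
    ("u1", "t1", "r1"),
    ("u1", "t1", "r2"),
    ("u2", "t1", "r1"),
}

box_density = density(relation, {"u1", "u2"}, {"t1"}, {"r1", "r2"})
```

Under this reading, a triconcept corresponds to a box of density 1, and a tricluster relaxes that requirement to a density above some chosen threshold.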
The 13th IEEE International Conference on Data Mining (IEEE ICDM 2013) solicited workshops on topics related to new research directions and novel applications of data mining. The goal of the ICDM workshops program (IEEE ICDMW) is to identify grand challenges in data mining, to explore possible paths to address these urgent problems, and to solicit broad participation from the data mining community and other relevant research communities. IEEE ICDMW 2013 was held on December 7 in Dallas, Texas, USA, and was immediately followed by IEEE ICDM 2013. This year, we received 41 workshop proposals, a 141% increase from the number of proposals in the previous year. Of those submissions, 26 workshop proposals were accepted after a thorough review by the ICDMW workshop organization committee. Eighteen workshops eventually went on to prepare their programs after a rigorous paper review process. The final program consisted of 13 full-day workshops and 5 half-day workshops. Overall, the ICDMW Program received 364 submissions, a 19% increase from the number of submissions in the previous year. Of those submissions, 183 papers were accepted. The workshop proposal acceptance rate is about 44%, and the workshop paper acceptance rate is about 50%. These highly competitive acceptance rates have resulted in high-quality and exciting ICDMW proceedings. IEEE ICDMW 2013 covered many new research and application areas as well as fundamental data mining topics. The traditional and fundamental disciplines included spatial and spatiotemporal data mining, optimization, concept drift, domain-driven data mining, opinion mining, and sentiment analysis. Emerging disciplines included high-dimensional data mining, causal discovery, cloud and distributed computing, data mining in service applications, and, of course, big data.
IEEE ICDMW 2013 provided discussion forums for exciting applications including biological data mining in healthcare, data mining in networks, data privacy, and data mining case studies. The ICDMW Program also explored new areas of data markets in sciences and businesses, data mining in experimental economics, and data mining in astronomical problems. Many people worked together in organizing IEEE ICDMW 2013. We would like to thank all workshop organizers for the high-quality workshop proposals received. The workshop organizers are the key to the success of the ICDMW program, and we thank them all for their tremendous effort in putting together the 18 exciting workshops in the final program.
This paper considers a data analysis system for collaborative platforms that was developed by a joint research team of the National Research University Higher School of Economics and the Witology company. Our focus is on describing the methodology and the results of the first experiments. The system is based on several modern models and methods for analysing object-attribute and unstructured (text) data, such as Formal Concept Analysis, multimodal clustering, association rule mining, and keyword and collocation extraction from texts.
This volume contains extended versions of selected talks given at the international research workshop "Coping with Complexity: Model Reduction and Data Analysis", Ambleside, UK, August 31 – September 4, 2009. The book is deliberately broad in scope and aims to promote new ideas and methodological perspectives. The topics of the chapters range from theoretical analysis of complex and multiscale mathematical models to applications in, e.g., fluid dynamics and chemical kinetics.
The paper considers the data mining problems that must be solved in predictive modeling technology. To reduce the complexity of solving these problems, predictive modeling relies on solutions to dimensionality reduction problems, which must satisfy a number of additional conditions. The paper discusses these additional requirements and formulates the corresponding new, non-traditional statements of dimensionality reduction problems.