Interpretability and Effectiveness of Machine Learning Methods for Sequence Mining in Various Domains
A wide variety of demographic data can be analyzed with modern data mining methods. Our main task is to compare methods for next-event prediction and gender prediction; at the same time, we pay special attention to interpretable patterns that describe demographic behavior in the studied problems. We considered interpretable methods, such as decision trees and their ensembles, alongside semi-interpretable and non-interpretable methods, namely SVMs with custom kernels tailored to demographers' needs and neural networks, respectively. The best accuracy was obtained with two-channel convolutional neural networks.
This study is dedicated to the introduction of a novel method that automatically extracts potential structural alerts from a data set of molecules. These triggering structures can be further used for knowledge discovery and classification purposes. Computation of the structural alerts results from an implementation of a sophisticated workflow that integrates a graph mining tool guided by growth rate and stability. The growth rate is a well-established measurement of contrast between classes. Moreover, the extracted patterns correspond to formal concepts; the most robust patterns, named the stable emerging patterns (SEPs), can then be identified thanks to their stability, a new notion originating from the domain of formal concept analysis. All of these elements are explained in the paper from the point of view of computation. The method was applied to a molecular data set on mutagenicity. The experimental results demonstrate its efficiency: it automatically outputs a manageable number of structural patterns that are strongly related to mutagenicity. Moreover, a part of the resulting structures corresponds to already known structural alerts. Finally, an in-depth chemical analysis relying on these structures demonstrates how the method can initiate promising processes of chemical knowledge discovery. © 2015 American Chemical Society.
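As an illustration of the growth rate mentioned above, the following minimal sketch computes it as the ratio of a pattern's support in one class to its support in the other. It is a simplification under stated assumptions: patterns are represented as sets of fragment labels rather than as subgraphs, and the fragment names and the two tiny data sets are invented toy values, not data from the study.

```python
from typing import FrozenSet, Sequence

Pattern = FrozenSet[str]


def support(pattern: Pattern, dataset: Sequence[Pattern]) -> float:
    """Fraction of records in `dataset` that contain every item of `pattern`."""
    if not dataset:
        return 0.0
    hits = sum(1 for record in dataset if pattern <= record)
    return hits / len(dataset)


def growth_rate(pattern: Pattern,
                positives: Sequence[Pattern],
                negatives: Sequence[Pattern]) -> float:
    """Support in the positive class divided by support in the negative class.

    A pattern that never occurs in the negative class has infinite growth
    rate (if it occurs in the positive class at all).
    """
    pos = support(pattern, positives)
    neg = support(pattern, negatives)
    if neg == 0.0:
        return float("inf") if pos > 0.0 else 0.0
    return pos / neg


# Toy data (hypothetical): each molecule is reduced to a set of fragment labels.
mutagenic = [
    frozenset({"nitro", "aromatic_ring"}),
    frozenset({"nitro", "amine"}),
    frozenset({"aromatic_ring"}),
]
non_mutagenic = [
    frozenset({"aromatic_ring"}),
    frozenset({"amine"}),
    frozenset({"nitro", "hydroxyl"}),
    frozenset({"hydroxyl"}),
]

pattern = frozenset({"nitro"})
# Support is 2/3 among the positives and 1/4 among the negatives.
rate = growth_rate(pattern, mutagenic, non_mutagenic)
print(f"growth rate of {set(pattern)}: {rate:.2f}")
```

A pattern with a growth rate well above 1 is a candidate contrast pattern; the stability criterion described in the abstract would then be applied on top of this filter and is not shown here.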
The 13th IEEE International Conference on Data Mining (IEEE ICDM 2013) solicited workshops on topics related to new research directions and novel applications of data mining. The goal of the ICDM workshops program (IEEE ICDMW) is to identify grand challenges in data mining, to explore the possible paths to address these urgent problems, and to solicit broad participation from the data mining community and other relevant research communities. IEEE ICDMW 2013 was held on December 7 in Dallas, Texas, USA, and was immediately followed by IEEE ICDM 2013. This year, we received 41 workshop proposals, a 141% increase from the number of proposals in the previous year. Of those submissions, 26 workshop proposals were accepted through a thorough review by the ICDMW workshop organization committee. Eighteen workshops eventually went on to prepare their workshop programs after a rigorous paper review process. The final program consisted of 13 full-day workshops and 5 half-day workshops. Overall, the ICDMW Program received 364 submissions, which is a 19% increase from the number of submissions in the previous year. Of those submissions, 183 papers were accepted. The workshop proposal acceptance rate is about 44%, and the workshop paper acceptance rate is about 50%. The highly competitive acceptance rates have resulted in high-quality and exciting ICDMW proceedings. IEEE ICDMW 2013 covered many new research and application areas as well as fundamental data mining topics. The traditional and fundamental disciplines included spatial and spatiotemporal data mining, optimization, concept drift, domain-driven data mining, opinion mining, and sentiment analysis. Emerging disciplines included high-dimensional data mining, causal discovery, cloud and distributed computing, data mining in service applications, and, of course, big data.
IEEE ICDMW 2013 provided discussion forums for exciting applications including biological data mining in healthcare, data mining in networks, data privacy, and data mining case studies. The ICDMW Program also explored new areas of data markets in sciences and businesses, data mining in experimental economics, and data mining in astronomical problems. Many people worked together in organizing IEEE ICDMW 2013. We would like to thank all workshop organizers for the high-quality workshop proposals received. The workshop organizers are the key to the success of the ICDMW program, and we thank them all for their tremendous effort in putting together the 18 exciting workshops in the final program.
This paper considers a data analysis system for collaborative platforms that was developed by the joint research team of the National Research University Higher School of Economics and the Witology company. Our focus is on describing the methodology and the results of the first experiments. The system is based on several modern models and methods for analysing object-attribute and unstructured (text) data, such as Formal Concept Analysis, multimodal clustering, association rule mining, and keyword and collocation extraction from texts.
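To make the object-attribute setting of Formal Concept Analysis concrete, here is a minimal sketch that enumerates all formal concepts of a small context by closing every attribute subset. It is a naive, exponential-time illustration rather than the system's actual algorithm, and the user and topic names are hypothetical toy data.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

Context = Dict[str, Set[str]]  # object -> set of its attributes
Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (extent, intent)


def objects_having(attributes: FrozenSet[str], context: Context) -> FrozenSet[str]:
    """Derivation operator: all objects that have every given attribute."""
    return frozenset(obj for obj, attrs in context.items() if attributes <= attrs)


def common_attributes(objects: FrozenSet[str], context: Context,
                      all_attributes: FrozenSet[str]) -> FrozenSet[str]:
    """Derivation operator: all attributes shared by every given object."""
    shared = set(all_attributes)
    for obj in objects:
        shared &= context[obj]
    return frozenset(shared)


def formal_concepts(context: Context) -> Set[Concept]:
    """Enumerate (extent, intent) pairs by closing each attribute subset."""
    all_attributes = frozenset().union(*context.values())
    concepts: Set[Concept] = set()
    for size in range(len(all_attributes) + 1):
        for subset in combinations(sorted(all_attributes), size):
            extent = objects_having(frozenset(subset), context)
            intent = common_attributes(extent, context, all_attributes)
            concepts.add((extent, intent))
    return concepts


# Toy context (hypothetical): which platform users commented on which topics.
context: Context = {
    "user1": {"energy", "transport"},
    "user2": {"transport", "housing"},
    "user3": {"transport"},
}

concepts = formal_concepts(context)
for extent, intent in sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1]))):
    print(sorted(extent), sorted(intent))
```

Each resulting pair is a maximal group of users together with the maximal set of topics they all share. Practical systems use dedicated algorithms instead of this brute-force closure, since the number of attribute subsets grows exponentially.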
An approach to the detection of hidden information (stegocontainers) in the audio data of MP3 files based on neural network modeling is considered. A multilayer perceptron is used as the neural network model. The structural components of an MP3 file are analyzed: fields containing related information (song title, album, author information, lyrics, etc.) and frames, which are fragmented sets of encoded audio data. The useful data are identified. A procedure is proposed for representing the audio data of any MP3 file as a uniform, relatively small set of features. The dimension of the feature set (data set) can be selected from the range [100-520], in accordance with the minimum and maximum frame sizes, which depend on the compression quality used when a single audio file is encoded in MP3 format. Modern software packages for embedding stegocontainers into MP3 files and extracting them are investigated. Based on selected software implementations, a database of examples (data sets) is formed from pre-processed MP3 files, both with and without a stegocontainer. The structure of the neural network for steganalysis of MP3 files is determined experimentally, and the network is then trained and tested. The test results allow us to state that the neural network system is highly effective.
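The abstract does not specify how frames are turned into a fixed-size feature set, so the following is only one possible sketch under explicit assumptions: frames are already extracted and passed in as byte strings, each frame is truncated to the chosen dimension, and the feature at each position is the mean normalized byte value over the frames that are long enough to reach that position. The function name `frames_to_features` and the averaging scheme are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

MIN_DIM = 100  # lower bound of the feature dimension range given in the text
MAX_DIM = 520  # upper bound of the feature dimension range given in the text


def frames_to_features(frames: Sequence[bytes], dim: int = MIN_DIM) -> List[float]:
    """Map a variable number of variable-length frames to `dim` features.

    Assumption (not from the source): position i of the result is the mean of
    byte i, scaled to [0, 1], over all frames that have at least i + 1 bytes.
    Positions that no frame reaches are set to 0.0.
    """
    if not MIN_DIM <= dim <= MAX_DIM:
        raise ValueError(f"dim must be in [{MIN_DIM}, {MAX_DIM}], got {dim}")

    sums = [0.0] * dim
    counts = [0] * dim
    for frame in frames:
        # Iterating over bytes yields integers in 0..255.
        for i, byte in enumerate(frame[:dim]):
            sums[i] += byte / 255.0
            counts[i] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


# Toy input (hypothetical): two synthetic "frames" of different lengths.
frames = [bytes([255] * 104), bytes([0] * 120)]
features = frames_to_features(frames, dim=110)
print(len(features), features[0], features[109])
```

A vector of this kind has the same length for every file, so it could serve as the input layer of a multilayer perceptron; the actual feature extraction and network structure used in the study may differ.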