Advances in Data Mining. Applications and Theoretical Aspects. 12th Industrial Conference, ICDM 2012, Berlin, Germany, July 13-20, 2012. Proceedings
This book constitutes the refereed proceedings of the 12th Industrial Conference on Data Mining, ICDM 2012, held in Berlin, Germany in July 2012. The 22 revised full papers presented were carefully reviewed and selected from 97 submissions. The papers are organized in topical sections on data mining in medicine and biology; data mining for energy industry; data mining in traffic and logistic; data mining in telecommunication; data mining in engineering; theory in data mining; theory in data mining: clustering; theory in data mining: association rule mining and decision rule mining.
In this paper we introduce a novel human-centered data mining software system which was designed to gain intelligence from unstructured textual data. The architecture takes its roots in several case studies which were a collaboration between the Amsterdam-Amstelland Police, GasthuisZusters Antwerpen (GZA) hospitals and KU Leuven. It is currently being implemented by bachelor and master students of Moscow Higher School of Economics. At the core of the system are concept lattices which can be used to interactively explore the data. They are combined with several other complementary statistical data analysis techniques such as Emergent Self Organizing Maps and Hidden Markov Models.
Formal Concept Analysis (FCA) is an unsupervised clustering technique and many scientific papers are devoted to applying FCA in Information Retrieval (IR) research. We collected 103 papers published between 2003-2009 which mention FCA and information retrieval in the abstract, title or keywords. Using a prototype of our FCA-based toolset CORDIET, we converted the pdf-files containing the papers to plain text, indexed them with Lucene using a thesaurus containing terms related to FCA research and then created the concept lattice shown in this paper. We visualized, analyzed and explored the literature with concept lattices and discovered multiple interesting research streams in IR of which we give an extensive overview. The core contributions of this paper are the innovative application of FCA to the text mining of scientific papers and the survey of the FCA-based IR research.