### Book

## CLA 2016: Proceedings of the Thirteenth International Conference on Concept Lattices and Their Applications. CEUR Workshop Proceedings

The 13th International Conference on “Concept Lattices and Applications (CLA 2016)” was held at the National Research University Higher School of Economics, Moscow, Russia, from July 18 to July 22, 2016. The CLA conference, organized since 2002, aims to provide everyone interested in Formal Concept Analysis, and more generally in concept lattices or Galois lattices, with an advanced view of some of the latest research trends and applications in this field. It also aims to bring together students, professors, researchers and engineers involved in all aspects of the study of concept lattices, from theory to implementations and practical applications. As the diversity of the selected papers shows, there is a wide range of research directions around data and knowledge processing, including data mining, knowledge discovery, knowledge representation, reasoning and pattern recognition, together with logic, algebra and lattice theory. The program of the conference includes four keynote talks given by the following distinguished researchers: Lev D. Beklemishev (Mathematical Institute of the Russian Academy of Sciences, Moscow), Jérôme Euzenat (INRIA Grenoble Rhône-Alpes), Bernhard Ganter (TU Dresden) and Boris G. Mirkin (National Research University Higher School of Economics, Moscow). This volume includes the selected papers and the abstracts of the invited talks. This year, 46 papers were submitted, of which 28 were accepted as regular papers. We would like to thank the contributing authors for their valuable work, and the members of the program committee and the external reviewers who analyzed the papers with care. All of them contributed to the continuing quality and importance of CLA, highlighting its key role in the field. We would also like to thank the steering committee of CLA for giving us the opportunity to lead this edition of CLA, the conference participants for their participation and support, and the people in charge of the organization, especially Larisa I. Antropova, Ekaterina L. Chernyak, Dmitry I. Ignatov and Olga V. Maksimenkova, whose help was invaluable on many occasions and who contributed to the success of the event. We would like to thank our sponsors, namely the National Research University Higher School of Economics, the ExactPro company and the Russian Foundation for Basic Research. Finally, we do not forget that the conference was managed (quite easily) with the EasyChair system for many tasks, including paper submission, selection and reviewing.

Nowadays, decision tree learning is one of the most popular classification and regression techniques. Though decision trees are not particularly accurate on their own, they make very good base learners for advanced tree-based methods such as random forests and gradient boosted trees. However, using ensembles of trees reduces the interpretability of the final model. Another problem is that decision tree learning can be seen as a greedy search for a good classification hypothesis in terms of some information-based criterion, such as Gini impurity or information gain. For small data sets, however, a global search may be feasible. In this paper, we propose an FCA-based lazy classification technique in which each test instance is classified with a set of the best rules (in terms of some information-based criterion). In a set of benchmarking experiments, the proposed strategy is compared with decision tree and nearest neighbor learning.
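The abstract does not spell out the rule language or the exact criterion, so the following is only a minimal sketch under stated assumptions: objects have binary attributes, candidate rule premises are the subsets of the test instance's attributes up to a hypothetical size `max_len`, each premise is scored by the information gain of splitting the training set into its extent and the rest, and the `k` best rules vote. All function names here are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def info_gain(train, premise):
    """Information gain of splitting `train` by whether an object contains `premise`."""
    labels = [y for _, y in train]
    inside = [y for x, y in train if premise <= x]
    outside = [y for x, y in train if not premise <= x]
    n = len(labels)
    conditional = len(inside) / n * entropy(inside) + len(outside) / n * entropy(outside)
    return entropy(labels) - conditional


def lazy_classify(train, test_x, k=3, max_len=2):
    """Classify one instance by a vote of the k best rules whose premise it satisfies.

    train: list of (set_of_attributes, label); test_x: set of attributes.
    """
    rules = []
    for size in range(1, max_len + 1):
        for combo in combinations(sorted(test_x), size):
            premise = frozenset(combo)
            covered = [y for x, y in train if premise <= x]  # labels in the premise's extent
            if not covered:
                continue  # premise never occurs in the training data
            conclusion = Counter(covered).most_common(1)[0][0]
            rules.append((info_gain(train, premise), conclusion))
    if not rules:
        # No usable premise: fall back to the majority class.
        return Counter(y for _, y in train).most_common(1)[0][0]
    rules.sort(key=lambda rule: rule[0], reverse=True)
    votes = Counter(conclusion for _, conclusion in rules[:k])
    return votes.most_common(1)[0][0]
```

Nothing is built ahead of time: rules are generated only for the instance being classified. This is what makes the approach lazy, and it is also why it stays tractable mainly for small data sets.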

FCA is a mathematical formalism with many applications in data mining and knowledge discovery. Originally, it deals with binary data tables. However, there are a number of extensions that enrich standard FCA. In this paper, we consider two important extensions, fuzzy FCA and pattern structures, and discuss the relation between them. In particular, we introduce a scaling procedure that makes it possible to represent a fuzzy context as a pattern structure.
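To show what representing a fuzzy context as a pattern structure can look like, here is one natural construction. It is offered as an assumption and is not necessarily the scaling procedure of the paper. Each object's description is its vector of membership degrees, similarity is the component-wise minimum (which makes descriptions a meet-semilattice), and the two derivation operators are defined as usual. The context and function names are hypothetical.

```python
from functools import reduce


# Hypothetical fuzzy context: object -> tuple of membership degrees in [0, 1].
def meet(d, e):
    """Similarity of two descriptions: component-wise minimum."""
    return tuple(min(a, b) for a, b in zip(d, e))


def subsumes(d, e):
    """d ⊑ e iff d ⊓ e == d, i.e. d is at most e in every component."""
    return meet(d, e) == d


def common_description(context, objects):
    """A^□: meet of the descriptions of all objects in A (A must be non-empty)."""
    return reduce(meet, (context[g] for g in objects))


def objects_described_by(context, d):
    """d^□: all objects whose description is at least as specific as d."""
    return {g for g, desc in context.items() if subsumes(d, desc)}
```

A pair (A, d) with A^□ = d and d^□ = A is a pattern concept. In the check below, {g1, g2} together with (0.8, 0.5) forms one.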

Pattern structures are known to provide a tool for predictive modeling and classification. However, in order to generate classification rules, a concept lattice has to be built, which may take considerable time and resources. Previous work showed that this problem can be avoided with the so-called lazy associative classification algorithm, which does not require lattice construction and is applicable to classification problems such as credit scoring. In this paper, we adapt this method to the case of a continuous target variable, i.e., the regression problem, and apply it to recovery rate forecasting. We tune the parameters, assess the accuracy of the algorithm on the bank's data, and compare it with the models adopted in the bank's system and other benchmarks.
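The sketch below shows what lazy regression with interval pattern structures could look like, without building any lattice. For a test object, it takes the meet (per-feature convex hull) with each training object, computes the extent of that interval pattern, and averages the target means of patterns whose support reaches a hypothetical `min_support`. The paper's actual hypothesis selection and tuned parameters are not reproduced, and all names are hypothetical.

```python
from statistics import mean


def interval_meet(x, y):
    """Meet of two numeric vectors in the interval pattern structure: per-feature hull."""
    return [(min(a, b), max(a, b)) for a, b in zip(x, y)]


def extent(pattern, train_X):
    """Indices of training objects whose feature values all lie inside `pattern`."""
    return [i for i, row in enumerate(train_X)
            if all(lo <= v <= hi for v, (lo, hi) in zip(row, pattern))]


def lazy_regress(train_X, train_y, x, min_support=2):
    """Predict y(x) by averaging target means over sufficiently supported patterns."""
    estimates = []
    for row in train_X:
        ext = extent(interval_meet(x, row), train_X)
        if len(ext) >= min_support:
            estimates.append(mean(train_y[i] for i in ext))
    if not estimates:
        return mean(train_y)  # fall back to the global mean
    return mean(estimates)
```

Patterns that are both tight and well supported dominate the average here. A real implementation would presumably also filter patterns by the spread of their targets.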

Approximate cluster structures are structures of formal concepts and n-concepts with added numerical intensity weights. The talk presents theoretical results and computational methods for approximate clustering and n-clustering as extensions of the algebraic-geometric properties of numerical matrices (SVD and the like) to situations where one or more elements of the solutions sought are expressed by binary vectors. The theory embraces such methods as k-means, consensus clustering, network clustering, biclusters and triclusters, and provides natural data analysis criteria, effective algorithms and interpretation tools.
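A single weighted bicluster is a small instance of this idea. Fit the data matrix as Y ≈ λ u vᵀ, where u and v are binary membership vectors for rows U and columns V. By least squares, the optimal λ* is the mean of the submatrix U × V. Expanding the squared error then gives ‖Y − λ* u vᵀ‖² = ‖Y‖² − λ*²|U||V|, so the bicluster "explains" λ*²|U||V| of the data scatter, in analogy with a singular value. The sketch below assumes this additive model and uses hypothetical function names.

```python
def bicluster_fit(Y, rows, cols):
    """Least-squares intensity and explained scatter of the bicluster rows × cols."""
    size = len(rows) * len(cols)
    lam = sum(Y[i][j] for i in rows for j in cols) / size  # λ* = mean of the submatrix
    return lam, lam * lam * size  # contribution λ*²|U||V| to the data scatter


def residual_scatter(Y, rows, cols, lam):
    """‖Y − λ u vᵀ‖² for binary memberships u (rows) and v (cols)."""
    return sum((Y[i][j] - (lam if i in rows and j in cols else 0.0)) ** 2
               for i in range(len(Y)) for j in range(len(Y[0])))
```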

A comparison of different treatment strategies does not always identify one that is best for all patients; one therefore needs to study subgroups of patients in which the efficiency of treatment strategies differs significantly. To solve this problem, an approach to subgroup generation is proposed in which the data are described in terms of a pattern structure, and pattern concepts stand for patient subgroups and their descriptions. To find the pattern concepts that are most promising in terms of the difference in efficiency between treatment strategies, a version of the Close-by-One (CbO) algorithm is proposed. An application to the analysis of data on childhood acute lymphoblastic leukemia is considered.
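As background, here is a sketch of plain Close-by-One on an ordinary binary context. The paper's version works on a pattern structure and prunes by treatment-efficiency difference, and neither is reproduced. CbO extends a concept by one attribute at a time, closes the result, and keeps it only if the closure adds no attribute smaller than the one just added (the canonicity test). This test ensures each concept is generated exactly once. The function name is hypothetical.

```python
def close_by_one(context, n_attrs):
    """Enumerate all formal concepts (extent, intent) of a binary context.

    context: list where context[g] is the set of attribute indices of object g.
    """
    all_attrs = frozenset(range(n_attrs))

    def intent(objs):
        # A': attributes shared by every object in A (all attributes if A is empty).
        result = all_attrs
        for g in objs:
            result = result & context[g]
        return result

    concepts = []

    def generate(extent, intent_, start):
        concepts.append((extent, intent_))
        for j in range(start, n_attrs):
            if j in intent_:
                continue
            new_extent = frozenset(g for g in extent if j in context[g])
            new_intent = intent(new_extent)
            # Canonicity test: the closure must not add any attribute smaller than j.
            if all((k in new_intent) == (k in intent_) for k in range(j)):
                generate(new_extent, new_intent, j + 1)

    top_extent = frozenset(range(len(context)))
    generate(top_extent, intent(top_extent), 0)
    return concepts
```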

Triadic concept analysis has become a popular research direction, since triadic relations provide natural models of many data collections. In this paper, we address the problem of selecting the most interesting concepts by proposing triadic stability indices.
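The triadic indices themselves are not reproduced here. For orientation, the classical (dyadic) intensional stability index is a natural reference point for such indices. It measures the share of subsets C of a concept's extent whose derivation C′ still equals the concept's intent. Below is a brute-force sketch for small binary contexts, with hypothetical function names. Its cost is exponential in the extent size.

```python
from itertools import combinations


def derive_intent(context, n_attrs, objs):
    """A': attributes common to all objects in A (all attributes for the empty set)."""
    result = set(range(n_attrs))
    for g in objs:
        result &= context[g]
    return result


def intensional_stability(context, n_attrs, extent, intent):
    """Share of subsets C of the extent with C' == intent (brute force)."""
    extent = list(extent)
    hits = sum(1
               for r in range(len(extent) + 1)
               for subset in combinations(extent, r)
               if derive_intent(context, n_attrs, subset) == set(intent))
    return hits / 2 ** len(extent)
```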

We propose a new algorithm for consensus clustering, FCA-Consensus, based on Formal Concept Analysis. As input, the algorithm takes T partitions of a set of objects obtained from T runs of the k-means algorithm with different initialisations. The resulting consensus partition is extracted from an antichain of the concept lattice built on the formal context objects × classes, where the classes are the set of all cluster labels from each initial k-means partition. We compare the results of the proposed algorithm with state-of-the-art algorithms on synthetic datasets in terms of the adjusted Rand index (ARI). Under certain conditions, FCA-Consensus achieves the best ARI values.
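The two concrete ingredients mentioned in the abstract can be sketched directly. The first is the objects × classes context, where each object receives one attribute (run, label) per k-means run. The second is the adjusted Rand index used for evaluation. The antichain extraction step is not reproduced, and the function names are hypothetical.

```python
from collections import Counter
from math import comb


def consensus_context(partitions):
    """Objects × classes context: object i gets attribute (t, label) for each run t."""
    n = len(partitions[0])
    return [{(t, labels[i]) for t, labels in enumerate(partitions)} for i in range(n)]


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings of the same n >= 2 objects."""
    n = len(a)
    index = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both labelings trivial (one cluster, or all singletons)
    return (index - expected) / (max_index - expected)
```

ARI ignores label names, so two runs that swap cluster labels still score 1.0.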

In this paper, we generalize the classical duplication of intervals in lattices: instead of duplicating complete convex subsets, we consider partial duplication. We characterize the subsets that guarantee that the result remains a lattice.

The paper gives a brief introduction to multiple classifier systems and describes a particular algorithm that improves classification accuracy by recommending an algorithm for each object. The recommendation rests on the hypothesis that a classifier is likely to predict the label of an object correctly if it has correctly classified the object's neighbors. The process of assigning a classifier to each object relies on the apparatus of Formal Concept Analysis. We explain the principle of the algorithm on a toy example and describe experiments with real-world datasets.
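The recommendation hypothesis can be illustrated without the FCA machinery. The following sketch swaps in plain k-nearest neighbours under Euclidean distance to define the neighbourhood. It picks the classifier that was most often correct on the new object's neighbours and returns that classifier's prediction. This is not the paper's FCA-based assignment, and all names and inputs are hypothetical.

```python
def recommend_and_predict(val_X, val_y, val_preds, x, x_preds, k=3):
    """Use the classifier that was most often correct on x's k nearest neighbours.

    val_preds[c][i]: prediction of classifier c for validation object i.
    x_preds[c]: prediction of classifier c for the new object x.
    """
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    neighbours = sorted(range(len(val_X)), key=lambda i: sq_dist(val_X[i], x))[:k]
    # Local accuracy of each classifier on the neighbourhood.
    scores = {c: sum(preds[i] == val_y[i] for i in neighbours)
              for c, preds in val_preds.items()}
    best = max(scores, key=scores.get)  # ties go to the first classifier listed
    return best, x_preds[best]
```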

This book constitutes the second part of the refereed proceedings of the 10th International Conference on Formal Concept Analysis, ICFCA 2012, held in Leuven, Belgium, in May 2012. The topics covered in this volume range from recent advances in machine learning and data mining; mining terrorist networks and revealing criminals; concept-based process mining; to scalability issues in FCA and rough sets.

Concept discovery is a Knowledge Discovery in Databases (KDD) research field that uses human-centered techniques such as Formal Concept Analysis (FCA), Biclustering, Triclustering and Conceptual Graphs to gain insight into the underlying conceptual structure of the data. Traditional machine learning techniques mainly focus on structured data, whereas most available data reside in unstructured, often textual, form. Compared to traditional data mining techniques, human-centered instruments actively engage the domain expert in the discovery process. This volume contains the contributions to CDUD 2011, the International Workshop on Concept Discovery in Unstructured Data (CDUD), held in Moscow. The main goal of this workshop was to provide a forum for researchers and developers of data mining instruments working on issues in analyzing unstructured data. We are proud that we could welcome 13 valuable contributions to this volume. The majority of the accepted papers described innovative research on data discovery in unstructured texts. Authors worked on issues such as transforming unstructured information into structured information by, among other things, extracting keywords and opinion words from texts with Natural Language Processing methods. Multiple authors who participated in the workshop used methods from the conceptual structures field, including Formal Concept Analysis and Conceptual Graphs. Applications include, but are not limited to, text mining of police reports, sociological definitions and movie reviews.

In this paper, we propose two novel methods for analyzing data collected from online social networks. In particular, we analyze data from Vkontakte (a Russian online social network). Using biclustering, we extract groups of users with similar interests and find communities of users who belong to similar groups. With triclustering, we reveal users' interests as tags and use them to describe Vkontakte groups. After this social tagging process, we can recommend to a particular user relevant groups to join, or new friends with similar tastes from groups of interest. We present some preliminary results and explain how we are going to apply these methods to massive data repositories.
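The abstract does not say which biclustering is used. One common FCA-style option is the object-attribute (OA) bicluster, which is assumed here. For each pair (user, group) in the membership relation, it forms (group′, user′), i.e. all members of the group × all groups of the user. It then keeps the pair if its density, the share of filled cells, reaches a hypothetical `min_density`. The function name is hypothetical.

```python
def oa_biclusters(relation, min_density=0.5):
    """Object-attribute biclusters (m', g') for every pair (g, m) of a binary relation.

    relation: set of (user, group) pairs. Returns {(users, groups): density}.
    """
    users_of, groups_of = {}, {}
    for u, grp in relation:
        users_of.setdefault(grp, set()).add(u)
        groups_of.setdefault(u, set()).add(grp)
    densities = {}
    for u, grp in relation:
        key = (frozenset(users_of[grp]), frozenset(groups_of[u]))
        if key not in densities:
            users, groups = key
            filled = sum((x, y) in relation for x in users for y in groups)
            densities[key] = filled / (len(users) * len(groups))
    return {key: d for key, d in densities.items() if d >= min_density}
```

Dense biclusters of this kind are candidate communities: users sharing many groups, and groups sharing many users.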

Formal Concept Analysis (FCA) is an unsupervised clustering technique, and many scientific papers are devoted to applying FCA in Information Retrieval (IR) research. We collected 103 papers published between 2003 and 2009 that mention FCA and information retrieval in the abstract, title or keywords. Using a prototype of our FCA-based toolset CORDIET, we converted the PDF files containing the papers to plain text, indexed them with Lucene using a thesaurus of terms related to FCA research, and then created the concept lattice shown in this paper. We visualized, analyzed and explored the literature with concept lattices and discovered multiple interesting research streams in IR, of which we give an extensive overview. The core contributions of this paper are the innovative application of FCA to the text mining of scientific papers and the survey of FCA-based IR research.

An incremental algorithm to construct a lattice from a collection of sets is derived, refined, analyzed, and related to a similar previously published algorithm for constructing concept lattices. The lattice constructed by the algorithm is the one obtained by closing the collection of sets with respect to set intersection. The analysis explains the empirical efficiency of the related concept lattice construction algorithm that had been observed in previous studies. The derivation highlights the effectiveness of a correctness-by-construction approach to algorithm development.
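The basic incremental step behind closing a collection under intersection is simple. If a family F is already closed under intersection, then adding a set S requires only F ∪ {S} ∪ {S ∩ X : X ∈ F}. This works because S ∩ X and S ∩ Y meet in S ∩ (X ∩ Y), which is again of that form. The following is a naive sketch of that step, not the refined algorithm derived in the paper. The function names are hypothetical.

```python
def add_set(family, new):
    """Insert `new` into a family closed under intersection, keeping it closed."""
    new = frozenset(new)
    return family | {new} | {new & s for s in family}


def intersection_closure(collection):
    """Close a collection of sets under pairwise intersection, one set at a time."""
    family = set()
    for s in collection:
        family = add_set(family, s)
    return family
```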

A model for organizing cargo transportation between two node stations connected by a railway line containing a certain number of intermediate stations is considered. Cargo moves in one direction. Such a situation may occur, for example, if one of the node stations is located in a region that produces raw material for a manufacturing industry located in another region, where the other node station is situated. Freight traffic is organized by means of a number of technologies. These technologies determine the rules for taking on cargo at the initial node station, the rules of interaction between neighboring stations, and the rule for distributing cargo at the final node station. The process of cargo transportation is governed by a given control rule. For such a model, one must determine the possible modes of cargo transportation and describe their properties. The model is described by a finite-dimensional system of differential equations with nonlocal linear restrictions. The class of solutions satisfying the nonlocal linear restrictions is extremely narrow. This makes it necessary to extend solutions of the system of differential equations “correctly” to a class of quasi-solutions whose distinctive feature is gaps (jumps) at a countable number of points. Using the fourth-order Runge–Kutta method, we were able to construct these quasi-solutions numerically and determine their rate of growth. Note that, technically, the main difficulty was obtaining quasi-solutions that satisfy the nonlocal linear restrictions. Furthermore, we investigated how the quasi-solutions, and in particular the sizes of their gaps (jumps), depend on a number of model parameters characterizing the control rule, the cargo transportation technologies and the intensity of cargo supply at a node station.
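For reference, here is the classical fourth-order Runge–Kutta step for a system y′ = f(t, y). The cargo model's equations, its nonlocal linear restrictions and the handling of gaps in quasi-solutions are not given in the abstract and are not reproduced. The test equation below is a hypothetical stand-in, and the function names are hypothetical.

```python
def rk4_step(f, t, y, h):
    """One classical fourth-order Runge–Kutta step for y' = f(t, y), y a list."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def integrate(f, t0, y0, h, steps):
    """Advance the solution `steps` times with a fixed step size h."""
    t, y = t0, list(y0)
    for _ in range(steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y
```

For example, `integrate(lambda t, y: [y[0]], 0.0, [1.0], 0.1, 10)` approximates e ≈ 2.71828 at t = 1, with an error of a few millionths. A model with jumps would additionally need the solution reset at each gap point, which this sketch does not attempt.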

Event logs collected by modern information and technical systems usually contain enough data for automated process model discovery. A variety of algorithms have been developed for process model discovery, conformance checking, log-to-model alignment, comparison of process models, etc.; nevertheless, quick analysis of ad hoc selected parts of an event log still lacks a full-fledged implementation. This paper describes a ROLAP-based method of multidimensional event log storage for process mining. The result of analyzing the log is visualized as a directed graph representing the union of all possible event sequences, ranked by their occurrence probability. Our implementation allows the analyst to discover process models for sublogs defined by ad hoc selection criteria and an occurrence probability value.
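As a rough illustration of the analysis (not the ROLAP storage itself), the sketch below stands an in-memory list in for the multidimensional store. It selects a sublog with an ad hoc predicate on case attributes and ranks trace variants by occurrence probability. It then keeps the variants at or above a probability threshold and joins them into one directed graph of directly-follows edges. The log layout, the predicate and all names are hypothetical.

```python
from collections import Counter


def sublog_graph(log, select, min_prob=0.0):
    """Rank the variants of an ad hoc sublog and join the kept ones into one graph.

    log: list of (case_attributes: dict, trace: tuple of activity names).
    select: predicate on case attributes that defines the sublog.
    """
    traces = [trace for attrs, trace in log if select(attrs)]
    if not traces:
        return [], set()
    counts = Counter(traces)
    total = len(traces)
    variants = sorted(((t, c / total) for t, c in counts.items()),
                      key=lambda v: v[1], reverse=True)
    kept = [(t, p) for t, p in variants if p >= min_prob]
    # Union of the kept sequences as a directed graph over activities.
    edges = {(a, b) for t, _ in kept for a, b in zip(t, t[1:])}
    return kept, edges
```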

The geographic information system (GIS) is based on the first and only Russian Imperial Census of 1897 and the First All-Union Census of the Soviet Union of 1926. The GIS features vector data (shapefiles) of all provinces of the two states. For the 1897 census, there is information about linguistic, religious and social estate groups. The part based on the 1926 census features nationality. Both shapefiles include information on gender and on rural and urban population. The GIS makes it possible to produce any maps needed for individual studies of the period that require administrative boundaries and demographic information.

Existing approaches suggest that IT strategy should be a reflection of business strategy. In practice, however, organisations often do not follow their business strategy even if it is formally declared. Under these conditions, IT strategy can be viewed not as a plan, but as an organisational shared view of the role of information systems. This approach generally reflects only a top-down perspective on IT strategy, so it can be supplemented by a strategic behaviour pattern (i.e., a more or less standard response to changes, formed as a result of previous experience) to implement a bottom-up approach. Two components that can help establish an effective reaction to new IT initiatives are proposed here: a model of IT-related decision making, and an efficiency metric for estimating the maturity of business processes and the corresponding IT. The use of the proposed tools is demonstrated on practical cases.