RuZA 2015 Workshop. Proceedings of Russian and South African Workshop on Knowledge Discovery Techniques Based on Formal Concept Analysis (RuZA 2015). November 30 - December 5, 2015, Stellenbosch, South Africa
The RuZA workshop (RuZA 2015) was an international Russian-South African workshop on applications of Formal Concept Analysis in Computer Science and Data Analysis. All of the contributed research papers reported on research where models based on Formal Concept Analysis were extensively used. Formal concept analysis (FCA) is a branch of lattice theory motivated by the need for a clear formalization of the notions of concept and conceptual hierarchy. It has been successfully used for conceptual clustering and association-rule mining. We believe that formal concept analysis and its extensions can contribute to the analysis and mining of social networks, text mining, process modelling, and political studies, among other fields. The objective of the RuZA workshop was to bring together researchers and practitioners to discuss the ways FCA can be used in various applications related to these domains. The workshop program included an introductory talk "Towards Efficient, Real-time Pocket Data Mining: Current Trends and Open Challenges" given by Herna L. Viktor from the School of Electrical Engineering and Computer Science, University of Ottawa, as well as regular talks followed by a panel discussion. The proceedings of the RuZA workshop include seven papers, each of which was reviewed by at least two reviewers. We would like to thank all the authors for their contributions and the organizers of RuZA 2015 for their kind support in hosting the workshop. Our warm thanks also go to the reviewers for their careful review of the submissions and their useful comments and suggestions. Finally, we would like to thank the Russian Foundation for Basic Research (grant no. 14-01-93960) and the National Research Foundation of South Africa (grant no. 92187) for financial support of research collaboration and organization of the workshop.
Sergei O. Kuznetsov Bruce W. Watson
As unsupervised machine learning and data mining techniques, biclustering and its multimodal extensions are becoming popular tools for analysing object-attribute data in different domains. Unlike conventional clustering techniques, biclustering searches for homogeneous groups of objects while keeping their common description, e.g., in the binary setting, their shared attributes. In bioinformatics, biclustering is used to find genes that are active in a subset of situations and are thus candidates for biomarkers. However, the authors of biclustering techniques that are popular in gene expression analysis may overlook existing methods. For instance, the BiMax algorithm aims to find biclusters that have been known for decades as formal concepts. Moreover, even if bioinformaticians classify biclustering methods according to reasonable domain-driven criteria, their taxonomies may differ from survey to survey and may also be incomplete. So, in this paper we propose to use concept lattices as a tool for taxonomy building (in the biclustering domain) and attribute exploration as a means for cross-domain taxonomy completion.
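To make the link between binary biclusters and formal concepts concrete, the following minimal sketch enumerates all formal concepts of a tiny binary context, i.e. all maximal groups of objects together with the attributes they all share. It is not the BiMax algorithm and not the method of the paper; it is a naive enumeration that is only practical for very small contexts. The function name formal_concepts and the gene/condition data are hypothetical and serve illustration only.

```python
from itertools import combinations


def formal_concepts(context):
    """Naively enumerate the formal concepts of a small binary context.

    context: dict mapping each object to the set of attributes it has.
    Returns a set of (extent, intent) pairs of frozensets.
    Exponential in the number of attributes; for illustration only.
    """
    objects = set(context)
    attributes = set().union(*context.values()) if context else set()

    def extent(attrs):
        # All objects that have every attribute in attrs.
        return frozenset(g for g in objects if attrs <= context[g])

    def intent(objs):
        # All attributes shared by every object in objs.
        if not objs:
            return frozenset(attributes)
        return frozenset(set.intersection(*(set(context[g]) for g in objs)))

    concepts = set()
    for size in range(len(attributes) + 1):
        for attrs in combinations(sorted(attributes), size):
            ext = extent(set(attrs))
            # Closing the attribute set yields a maximal rectangle.
            concepts.add((ext, intent(ext)))
    return concepts


# Hypothetical toy data: genes (objects) active under conditions (attributes).
toy_context = {
    "g1": {"c1", "c2"},
    "g2": {"c1", "c2", "c3"},
    "g3": {"c3"},
}
toy_concepts = formal_concepts(toy_context)
for ext, itt in sorted(toy_concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(ext), sorted(itt))
```

Each printed pair is a maximal all-ones rectangle of the object-attribute table: no object or attribute can be added without breaking the rectangle.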
In this paper we present some preliminary results for text corpus visualization by means of so-called reference graphs. The nodes of such a graph stand for key words or phrases extracted from the texts, and the edges represent the reference relation. The node A refers to the node B if the corresponding key word / phrase B is more likely to co-occur with key word / phrase A than to occur on its own. Since reference graphs are directed graphs, we are able to use graph-theoretic algorithms for further analysis of the text corpus. The visualization technique is tested on our own Web-based corpus of Russian-language newspapers.
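The reference relation can be sketched in a few lines under one possible reading of the definition above. The assumption here is that A refers to B when the number of documents containing both A and B exceeds the number of documents containing B without A. The abstract does not give the exact formula, so this criterion, the function name reference_graph, and the sample data are all hypothetical.

```python
from collections import Counter
from itertools import permutations


def reference_graph(docs):
    """Build directed reference edges from documents given as sets of key phrases.

    Assumed criterion (not taken from the paper): add edge (a, b) when
    b appears together with a in more documents than it appears without a.
    """
    occurs = Counter()    # number of documents containing each phrase
    together = Counter()  # number of documents containing each ordered pair
    for phrases in docs:
        phrases = set(phrases)
        occurs.update(phrases)
        together.update(permutations(sorted(phrases), 2))

    edges = set()
    for (a, b), n_ab in together.items():
        n_b_without_a = occurs[b] - n_ab
        if n_ab > n_b_without_a:
            edges.add((a, b))
    return edges


# Hypothetical toy corpus: each document is reduced to its key phrases.
toy_docs = [
    {"election", "vote"},
    {"election", "vote"},
    {"election"},
    {"election", "budget"},
]
toy_edges = reference_graph(toy_docs)
print(sorted(toy_edges))
```

Under this reading the relation is asymmetric: "vote" never appears without "election", so "election" refers to "vote", while "election" appears without "vote" as often as with it, so there is no edge in the opposite direction.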
Domain ontologies are essential in disciplines as diverse as software engineering, medicine, and political science, to name just a few. This paper describes an ongoing effort to develop a methodology for collaborative ontology construction by geographically distributed communities of experts and to implement a web-based prototype supporting this methodology. A distinctive feature of the proposed approach is the use of conceptual exploration techniques, which make it possible to organize the process of ontology construction by automatically identifying and explicitly highlighting issues that remain to be addressed. Given a set of objects (facts, situations, etc.) of a subject domain, which is known to have considerably more such objects, and their unified descriptions in terms of presence or absence of certain attributes, a conceptual exploration system maintains a compact representation of the implications behind the currently built ontology and offers them for experts to accept or falsify by entering new objects or extending the description language with new attributes. Upon termination, exploration results in the identification of a (relatively small) representative part of the domain from which a conceptual hierarchy of the entire domain can be automatically constructed. We consider theoretical, algorithmic, representational, and pragmatic issues of transforming the exploration methods into a toolset useful for domain experts.
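The accept-or-falsify step of conceptual exploration can be illustrated with a minimal sketch. It only checks whether a proposed implication holds in the current set of objects and shows how a new object entered by an expert falsifies it. It does not compute the compact implication representation that an exploration system maintains (typically the canonical basis), and the function name counterexamples and the sample objects are hypothetical.

```python
def counterexamples(context, premise, conclusion):
    """Return objects that have all premise attributes but lack some conclusion attribute.

    context: dict mapping each object to its set of attributes.
    An empty result means the implication premise -> conclusion holds
    in the current context and can be offered to the expert.
    """
    premise, conclusion = set(premise), set(conclusion)
    return sorted(
        obj
        for obj, attrs in context.items()
        if premise <= attrs and not conclusion <= attrs
    )


# Hypothetical toy domain with the objects known so far.
toy_domain = {
    "sparrow": {"bird", "flies"},
    "eagle": {"bird", "flies"},
}

# The implication {bird} -> {flies} holds for the known objects.
before = counterexamples(toy_domain, {"bird"}, {"flies"})

# The expert rejects the implication by entering a new object.
toy_domain["penguin"] = {"bird"}
after = counterexamples(toy_domain, {"bird"}, {"flies"})

print(before, after)
```

In a full exploration loop, the system would recompute the set of valid implications after each such counterexample and continue until the expert has accepted all of them.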
In order to find research on a specific topic or to get an overview of the topics that are published at different academic venues, academics need to browse data from existing academic publications. The titles and abstracts of publications contain useful key-phrases indicating the topic of the publication, but these need to be extracted and presented in a browsable format in order to allow the user to find relevant publications. We extract key-phrases and use these to construct a concept lattice for a dataset of publications. We then present the information in an intuitive interactive tag cloud browser where navigation is supported by the underlying concept lattice.
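One way such lattice-supported navigation could work is sketched below, purely as an assumption about the browsing step rather than a description of the authors' tool. Selecting key-phrases narrows the set of publications to those containing all of them (the extent of the selected attribute set), and the tag cloud is rebuilt from the remaining key-phrases of those publications. The function name refine and the sample data are hypothetical.

```python
from collections import Counter


def refine(publications, selected):
    """Narrow a publication set by selected key-phrases and rebuild the tag cloud.

    publications: dict mapping a publication title to its set of key-phrases.
    selected: key-phrases chosen so far by the user.
    Returns (matching titles, Counter of remaining key-phrases with their weights).
    """
    selected = set(selected)
    matching = {
        title for title, phrases in publications.items() if selected <= phrases
    }
    cloud = Counter(
        phrase
        for title in matching
        for phrase in publications[title]
        if phrase not in selected
    )
    return matching, cloud


# Hypothetical toy dataset of publications and extracted key-phrases.
toy_publications = {
    "P1": {"fca", "lattice"},
    "P2": {"fca", "text mining"},
    "P3": {"search"},
}
toy_matching, toy_cloud = refine(toy_publications, {"fca"})
print(sorted(toy_matching), dict(toy_cloud))
```

Each further click on a tag in the rebuilt cloud corresponds to moving to a more specific concept in the lattice, with fewer publications and more shared key-phrases.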
Formal Concept Analysis Research Toolbox (FCART) is an integrated environment for knowledge and data engineers with a set of research tools based on Formal Concept Analysis (FCA). In this paper we consider the main FCA workflow and some applications in the field of text pattern matching.
The speed of full-text search directly affects the process of text analysis. A search engine creates a text index, which is used for fast full-text search. Solr and ElasticSearch are two popular search engines. A text analysis system requires searching and indexing to be fast when performed at the same time. This paper describes the preprocessing workflow of the analysis system called Formal Concept Analysis Research Toolbox (FCART) and an experiment on simultaneous searching and indexing of social networking service data. The results of the experiment show which search engine is better suited as the core of the FCART search subsystem.