A One-Pass Triclustering Approach: Is There any Room for Big Data?

D. V. Gnatyshak; D. I. Ignatov; S. Kuznetsov; Nourine L.


P. 231-242.


An efficient one-pass online algorithm for triclustering of binary data (triadic formal contexts) is proposed. This algorithm is a modified version of the basic algorithm of the OAC-triclustering approach, but it has linear time and memory complexity with respect to the cardinality of the underlying ternary relation and can be easily parallelized for the analysis of big datasets. Computer experiments demonstrate the efficiency of the proposed algorithm.
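
As a rough illustration of the one-pass idea described in the abstract, the following Python sketch builds triclusters from the prime sets of each incoming triple, following the usual prime-operator definitions from the OAC-triclustering literature. It is not the authors' implementation: the function name `online_oac_triclustering`, the dictionary layout and the final deduplication step are assumptions made for illustration.

```python
from collections import defaultdict


def online_oac_triclustering(triples):
    """Illustrative sketch: read (object, attribute, condition) triples once
    and return the set of triclusters generated by them.

    Assumed definitions (prime operators on pairs):
      (m, b)' = objects g    such that (g, m, b) is in the relation
      (g, b)' = attributes m such that (g, m, b) is in the relation
      (g, m)' = conditions b such that (g, m, b) is in the relation
    The tricluster generated by (g, m, b) is ((m, b)', (g, b)', (g, m)').
    """
    objs_by_mb = defaultdict(set)
    attrs_by_gb = defaultdict(set)
    conds_by_gm = defaultdict(set)
    generators = []

    # Single pass over the input: each triple updates three prime sets.
    for g, m, b in triples:
        objs_by_mb[(m, b)].add(g)
        attrs_by_gb[(g, b)].add(m)
        conds_by_gm[(g, m)].add(b)
        generators.append((g, m, b))

    # Materialise the triclusters; identical ones collapse in the set.
    # Copying into frozensets is done for readability only.
    triclusters = set()
    for g, m, b in generators:
        triclusters.add((
            frozenset(objs_by_mb[(m, b)]),
            frozenset(attrs_by_gb[(g, b)]),
            frozenset(conds_by_gm[(g, m)]),
        ))
    return triclusters


if __name__ == "__main__":
    # Hypothetical toy data: (user, tag, resource) triples.
    data = [("u1", "t1", "r1"), ("u1", "t2", "r1"), ("u2", "t1", "r1")]
    for extent, intent, modus in online_oac_triclustering(data):
        print(sorted(extent), sorted(intent), sorted(modus))
```

Because the assembly step copies the prime sets, this sketch does not by itself reproduce the complexity bounds stated in the abstract; it only shows how each triple contributes to three prime sets independently of the others, which is what makes the approach amenable to online and parallel processing.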

Language: English


Keywords: data mining, Formal Concept Analysis, triclustering, big data, triadic data

Publication based on the results of:

Mathematical models, algorithms and software for data mining in the text and the structural form (2014)

In book

CLA 2014: Proceedings of the Eleventh International Conference on Concept Lattices and Their Applications

Kosice : Pavol Jozef Safarik University, 2014

Greedy Modifications of OAC-triclustering Algorithm

Gnatyshak D. V., in : Procedia Computer Science. 2nd International Conference on Information Technology and Quantitative Management, ITQM 2014. National Research University Higher School of Economics (HSE) in Moscow (Russia) on June 3-5, 2014. Vol. 31.: Amsterdam : Elsevier, 2014. P. 1116-1123.

In this paper we propose several possible modifications to the OAC-triclustering algorithms based on the prime operators. This method, based on the framework of Formal Concept Analysis, showed rather promising results in previous research. But while it is fast and efficient with respect to such measures as average density of the output, diversity, ...

Added: September 11, 2014

Full-text Search in Intermediate Data Storage of FCART

Neznanov A., Parinov A., in : RuZA 2015 Workshop. Proceedings of Russian and South African Workshop on Knowledge Discovery Techniques Based on Formal Concept Analysis (RuZA 2015). November 30 - December 5, 2015, Stellenbosch, South Africa. Vol. 1552.: Aachen : CEUR Workshop Proceedings, 2015.

The speed of full-text search directly affects the process of text analysis. A search engine creates a text index, which is used for fast full-text search. Solr and ElasticSearch are two popular search engines. A text analysis system requires fast searching and indexing at the same time. This paper describes the preprocessing workflow of the analysis ...

Added: June 14, 2016

From Triadic FCA to Triclustering: Experimental Comparison of Some Triclustering Algorithms

Dmitry V. Gnatyshak, Dmitry I. Ignatov, Sergei O. Kuznetsov, in : CLA 2013 Proceedings of the Tenth International Conference on Concept Lattices and Their Applications. : La Rochelle : Laboratory L3i, University of La Rochelle, 2013. P. 249-260.

In this paper we show the results of the experimental comparison of five triclustering algorithms on real-world and synthetic data with respect to resource efficiency and four quality measures. One of the algorithms, the OAC-triclustering based on prime operators, is presented for the first time in this paper. Interpretation of results for real-world datasets is provided. ...

Added: October 18, 2013

Triadic Formal Concept Analysis and triclustering: searching for optimal patterns

Ignatov D. I., Gnatyshak D. V., Sergei O. Kuznetsov et al., Machine Learning 2015 Vol. 101 No. 1 P. 271-302

This paper presents several definitions of “optimal patterns” in triadic data and results of experimental comparison of five triclustering algorithms on real-world and synthetic datasets. The evaluation is carried out over criteria such as resource efficiency, noise tolerance and quality scores involving cardinality, density, coverage, and diversity of the patterns. An ideal triadic pattern is a totally dense ...

Added: April 15, 2015

Triadic Formal Concept Analysis and Triclustering: Searching for Optimal Patterns

Ignatov D. I., Gnatyshak D. V., Kuznetsov S. et al., Machine Learning 2015

In this paper we search for optimal patterns in triadic data and show the results of the experimental comparison of five triclustering algorithms on real-world and synthetic data over resource efficiency, noise tolerance and four quality criteria (cardinality, density, coverage, and diversity). The starting point of the study is absolutely dense maximal cuboids (formal ...

Added: October 25, 2013

Proceedings of the International Workshop "What can FCA do for Artificial Intelligence?" (FCA4AI at ECAI 2016)

M. : [s.n.], 2016

The four preceding editions of the FCA4AI Workshop showed that many researchers working in Artificial Intelligence are deeply interested in a well-founded method for classification and mining such as Formal Concept Analysis (see http://www.fca4ai.hse.ru/). The first edition of FCA4AI was co-located with ECAI 2012 in Montpellier, the second one with IJCAI 2013 in Beijing, ...

Added: October 6, 2016

Proceedings of the Twelfth International Conference on Concept Lattices and Their Applications Clermont-Ferrand, France, October 13-16, 2015

Clermont-Ferrand : CEUR Workshop Proceedings, 2015

Formal Concept Analysis is a method of analysis of logical data based on formalization of conceptual knowledge by means of lattice theory. It has proved to be of interest to various applied fields such as data visualization, knowledge discovery and data mining, database theory, and many others. The International Conference “Concept Lattices and Their Applications ...

Added: October 22, 2015

Concept Learning from Triadic Data

Zhuk R., Ignatov D. I., Konstantinova N., Procedia Computer Science 2014 Vol. 31 P. 928-938

We propose extensions of the classical JSM-method and the Naïve Bayesian classifier for the case of triadic relational data. We performed a series of experiments on various types of data (both real and synthetic) to estimate quality of classification techniques and compare them with other classification algorithms that generate hypotheses, e.g. ID3 and Random ...

Added: June 9, 2014

Triclustering in Big Data Setting

Egurnov D., Точилкин Д. С., Ignatov D. I., in : Complex Data Analytics with Formal Concept Analysis. : Springer, 2022. P. 239-258.

In this paper, we describe versions of triclustering algorithms adapted for efficient calculations in distributed environments with the MapReduce model or the parallelisation mechanisms provided by modern programming languages. The OAC family of triclustering algorithms shows good parallelisation capabilities due to the independent processing of triples of a triadic formal context. We provide time and space complexity of the ...

Added: November 1, 2022

Learning hypotheses from triadic labeled data

Ignatov D. I., Zhuk R., Konstantinova N., in : Proceedings of The 2014 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology, WI-IAT 2014, 11-14 August 2014 Warsaw, Poland. : Los Alamitos, Washington, Tokyo : IEEE Computer Society, 2014. P. 474-480.

We propose extensions of the classical JSM-method and the Naïve Bayesian classifier for the case of triadic relational data. We performed a series of experiments on various types of data (both real and synthetic) to estimate quality of classification techniques and compare them with other classification algorithms that generate hypotheses, e.g. ID3 and Random Forest. In addition to classification precision and recall we also ...

Added: June 9, 2014

Object-Attribute Biclustering for Elimination of Missing Genotypes in Ischemic Stroke Genome-Wide Data

Ignatov D. I., Khvorykh G., Khrunin A. et al., in : Recent Trends in Analysis of Images, Social Networks and Texts. 9th International Conference, AIST 2020, Skolkovo, Moscow, Russia, October 15–16, 2020 Revised Supplementary Proceedings. Vol. 12602.: Springer, 2021. P. 185-204.

Missing genotypes can affect the efficacy of machine learning approaches to identify the risk genetic variants of common diseases and traits. The problem occurs when genotypic data are collected from different experiments with different DNA microarrays, each being characterised by its pattern of uncalled (missing) genotypes. This can prevent the ...

Added: November 1, 2022

Gaining Insight in Social Networks with Biclustering and Triclustering

Gnatyshak D. V., Ignatov D. I., Semenov A. et al., in : Perspectives in Business Informatics Research. 11th International Conference, BIR 2012, Nizhny Novgorod, Russia, September 2012 Proceedings. Issue 128.: Berlin, Heidelberg : Springer, 2012. P. 162-171.

We combine bi- and triclustering to analyse data collected from the Russian online social network Vkontakte. Using biclustering we extract groups of users with similar interests and find communities of users which belong to similar groups. With triclustering we reveal users' interests as tags and use them to describe Vkontakte groups. After this social tagging ...

Added: December 3, 2012

FCA-Based Recommender Models and Data Analysis for Crowdsourcing Platform Witology

Ignatov D. I., Kaminskaya A. Y., Malioukov A. et al., in : Proceedings of International Conference on Conceptual Structures 2014. Vol. 8577: Graph-Based Representation and Reasoning.: Springer, 2014. P. 287-292.

This paper considers a recommender part of the data analysis system for the collaborative platform Witology. It was developed by the joint research team of the National Research University Higher School of Economics and the Witology company. This recommender system is able to recommend ideas, like-minded users and antagonists at the respective phases ...

Added: June 9, 2014

Supplementary Proceedings ICFCA 2019 Conference and Workshops

CEUR Workshop Proceedings, 2019

Added: October 31, 2019

Intelligent Data Processing 11th International Conference, IDP 2016, Barcelona, Spain, October 10–14, 2016, Revised Selected Papers

Switzerland : Springer, 2019

This book constitutes the refereed proceedings of the 11th International Conference on Intelligent Data Processing, IDP 2016, held in Barcelona, Spain, in October 2016. The 11 revised full papers were carefully reviewed and selected from 52 submissions. The papers of this volume are organized in topical sections on machine learning theory with applications; intelligent data processing in life ...

Added: February 8, 2020

Putting OAC-triclustering on MapReduce

Зудин С., Gnatyshak D. V., Ignatov D. I., in : Proceedings of the Twelfth International Conference on Concept Lattices and Their Applications Clermont-Ferrand, France, October 13-16, 2015. Vol. 1466.: Clermont-Ferrand : CEUR Workshop Proceedings, 2015. P. 47-58.

In our previous work an efficient one-pass online algorithm for triclustering of binary data (triadic formal contexts) was proposed. This algorithm is a modified version of the basic algorithm of the OAC-triclustering approach; it has linear time and memory complexities. In this paper we parallelise it via the MapReduce framework in order to make it suitable for big datasets. The results of ...

Added: October 23, 2015

Proceedings of the 11th IEEE International Conference “Application of Information and Communication Technologies” (AICT-2017)

M. : -, 2017

Added: October 27, 2017

Triclusters of Close Values for the Analysis of 3D Data

Egurnov D., Ignatov D. I., Automation and Remote Control 2022 Vol. 83 No. 6 P. 894-902

The paper deals with the problem of triclustering in multivalued triadic contexts in terms of one multidimensional extension of formal concept analysis; triclustering can be viewed as a search for dense subtensors in three-dimensional tensors over the field of real numbers. Two methods are proposed for solving this problem, namely, NOAC, a version of the OAC-triclustering method for ...

Added: November 1, 2022

Proceedings of the Fifteenth International Conference on Concept Lattices and Their Applications

CEUR-WS.org, 2020

The CLA conference is an international forum for researchers, practitioners and students dedicated to the practice of Formal Concept Analysis (FCA) and areas closely related to it, including data analysis and mining, information retrieval, knowledge management, knowledge engineering, logic, algebra and lattice theory. The 15th edition of CLA, CLA 2020, was going to be held in Tallinn, Estonia ...

Added: October 30, 2020

Mining Complex Data Generated by Collaborative Platforms

Ignatov D. I., Kaminskaya A. Y., Bezzubtseva A. A. et al., in : Perspectives in Business Informatics Research: Proceedings of the XI International Conference. : Nizhny Novgorod : Higher School of Economics in Nizhny Novgorod, 2012. P. 7-17.

In a crowdsourcing project several participants discuss and solve one common problem, propose their ideas, evaluate ideas of each other, etc. We propose the novel instrument CrowDM for analyzing data generated by collaborative platforms. The initial version of the system combines several innovative techniques for structured and unstructured data analysis. Formal Concept Analysis, multimodal clustering ...

Added: December 3, 2012

CDUD'11 – Concept Discovery in Unstructured Data Workshop co-located with the 13th International Conference on Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing (RSFDGrC-2011), June 2011, Moscow, Russia

M. : Higher School of Economics Publishing House, 2011

Concept discovery is a Knowledge Discovery in Databases (KDD) research field that uses human-centered techniques such as Formal Concept Analysis (FCA), Biclustering, Triclustering, Conceptual Graphs etc. for gaining insight into the underlying conceptual structure of the data. Traditional machine learning techniques are mainly focusing on structured data whereas most data available resides in unstructured, often ...

Added: December 3, 2012

Formal Concept Analysis. 10th International Conference, ICFCA 2012, Leuven, Belgium, May 7-10, 2012 Proceedings

Domenach F., Ignatov D. I., Poelmans J., Berlin, Heidelberg : Springer, 2012

This book constitutes the refereed proceedings of the 10th International Conference on Formal Concept Analysis, ICFCA 2012, held in Leuven, Belgium in May 2012. The 20 revised full papers presented together with 6 invited talks were carefully reviewed and selected from 68 submissions. The topics covered in this volume range from recent advances in machine ...

Added: December 3, 2012

Multimodal Clustering of Boolean Tensors on MapReduce: Experiments Revisited

Ignatov D. I., Egurnov D., Точилкин Д. С., in : Supplementary Proceedings ICFCA 2019 Conference and Workshops. Vol. 2378.: CEUR Workshop Proceedings, 2019. P. 137-151.

This paper presents further development of distributed multimodal clustering. We introduce a new version of the multimodal clustering algorithm for distributed processing in Apache Hadoop on computer clusters. Its implementation allows a user to conduct clustering on data with modality greater than two. We provide time and space complexity of the algorithm and justify its relevance. ...

Added: October 31, 2019

From Triconcepts to Triclusters

Ignatov D. I., Kuznetsov S., Zhukov L. E., in : Rough Sets, Fuzzy Sets, Data Mining and Granular Computing: 13th International Conference, RSFDGrC 2011, Moscow, Russia, June 25-27, 2011. Proceedings. Vol. 6743.: Berlin, Heidelberg : Springer, 2011. P. 257-264.

A novel approach to triclustering of three-way binary data is proposed. A tricluster is defined in terms of Triadic Formal Concept Analysis as a dense triset of a binary relation Y, describing the relationship between objects, attributes and conditions. This definition is a relaxation of the triconcept notion and makes it possible to find all ...

Added: December 3, 2012
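
The notion of a dense triset in the last entry can be made concrete with a small sketch. Assuming the common definition of density as the share of triples in the cuboid X × Y × Z that belong to the ternary relation, a hypothetical helper `density` might look as follows; the function name and the sample data are illustrative only and are not taken from the paper.

```python
from itertools import product


def density(tricluster, relation):
    """Share of cells of the cuboid X x Y x Z that are present in the relation.

    tricluster: a triple (X, Y, Z) of collections of objects, attributes
                and conditions.
    relation:   a set of (object, attribute, condition) triples.
    Assumed definition: |relation ∩ (X x Y x Z)| / (|X| * |Y| * |Z|).
    """
    objects, attributes, conditions = tricluster
    volume = len(objects) * len(attributes) * len(conditions)
    if volume == 0:
        return 0.0  # convention chosen here for an empty cuboid
    hits = sum(
        (g, m, b) in relation
        for g, m, b in product(objects, attributes, conditions)
    )
    return hits / volume


if __name__ == "__main__":
    # Hypothetical toy relation with three triples.
    relation = {("a", "x", "p"), ("b", "x", "p"), ("a", "y", "p")}
    triset = ({"a", "b"}, {"x", "y"}, {"p"})
    print(density(triset, relation))  # 3 of the 4 cells are filled
```

A triset with density 1 under this definition is a completely filled cuboid; accepting a threshold below 1 is one way to read the relaxation of the triconcept notion that the abstract mentions.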