Triclustering in Big Data Setting
Egurnov D., Tochilkin D. S., Ignatov D. I.
In this paper, we describe versions of triclustering algorithms adapted for efficient computation in distributed environments using the MapReduce model or the parallelisation mechanisms provided by modern programming languages. The OAC family of triclustering algorithms parallelises well because the triples of a triadic formal context are processed independently. We provide the time and space complexity of the algorithms and justify their relevance. We also compare the performance gain from using a distributed system and assess scalability.
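To illustrate why independent processing of triples lends itself to parallel execution, here is a minimal single-machine sketch of prime-based OAC triclustering, arranged as two MapReduce-like stages. It is an illustration under stated assumptions, not the authors' implementation: the function names `oac_prime_triclusters` and `density` and the toy users/interests/events context are hypothetical, and a real deployment would run the two stages on a distributed framework rather than over in-memory dictionaries.

```python
from collections import defaultdict


def oac_prime_triclusters(triples):
    """Build prime-based OAC triclusters for a triadic context.

    `triples` is an iterable of (object, attribute, condition) tuples.
    Hypothetical helper for illustration; not the paper's code.
    """
    triples = set(triples)

    # Stage 1 (map + reduce): every triple emits three key/value pairs,
    # one per pair of coordinates; values with the same key are merged
    # into a set. Each emission depends only on its own triple.
    objs_by_mb = defaultdict(set)   # (m, b) -> objects g with (g, m, b) in I
    attrs_by_gb = defaultdict(set)  # (g, b) -> attributes m with (g, m, b) in I
    conds_by_gm = defaultdict(set)  # (g, m) -> conditions b with (g, m, b) in I
    for g, m, b in triples:
        objs_by_mb[(m, b)].add(g)
        attrs_by_gb[(g, b)].add(m)
        conds_by_gm[(g, m)].add(b)

    # Stage 2: each triple independently assembles its tricluster from
    # the three sets; putting results in a set removes duplicates.
    triclusters = set()
    for g, m, b in triples:
        triclusters.add((
            frozenset(objs_by_mb[(m, b)]),
            frozenset(attrs_by_gb[(g, b)]),
            frozenset(conds_by_gm[(g, m)]),
        ))
    return triclusters


def density(tricluster, triples):
    """Share of cells of the tricluster's cuboid that are triples of the context."""
    triples = set(triples)
    objs, attrs, conds = tricluster
    covered = sum(
        (g, m, b) in triples for g in objs for m in attrs for b in conds
    )
    return covered / (len(objs) * len(attrs) * len(conds))


if __name__ == "__main__":
    # Toy context: users x interests x events (made-up data).
    context = {
        ("u1", "i1", "e1"),
        ("u2", "i1", "e1"),
        ("u1", "i2", "e1"),
    }
    for tc in oac_prime_triclusters(context):
        print([sorted(part) for part in tc], density(tc, context))
```

Because neither stage needs any state beyond the current triple and the grouped sets, the loop bodies can be distributed across workers; only the grouping by key requires a shuffle.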
In: Machine Learning and Knowledge Discovery in Databases. European Conference, ECML PKDD 2015, Porto, Portugal, September 7-11, 2015, Proceedings. Part 2. Vol. 9285. Dordrecht, L., Cham, Heidelberg, NY: Springer, 2015. P. 157-172.
In pattern mining, the main challenge is the exponential explosion of the set of patterns. Typically, to solve this problem, a constraint for pattern selection is introduced. One of the first constraints proposed in pattern mining is support (frequency) of a pattern in a dataset. Frequency is an anti-monotonic function, i.e., given an infrequent pattern, ...
Added: October 22, 2015
In: Conceptual Structures for STEM Research and Education, 20th International Conference on Conceptual Structures. Vol. 7735. Berlin, Heidelberg: Springer, 2013. P. 173-192.
This paper considers a data analysis system for collaborative platforms which was developed by the joint research team of the National Research University Higher School of Economics and the Witology company. Our focus is on describing the methodology and results of the first experiments. The developed system is based on several modern models and methods ...
Added: October 10, 2013
International Journal of General Systems 2013 Vol. 42 No. 6 P. 572-593
formal concept analysis, data mining, triclustering, three-way data, folksonomy, spectral triclustering ...
Added: October 16, 2013
In: CEUR Workshop Proceedings. Proceedings of the International Workshop on Social Network Analysis using Formal Concept Analysis (SNAFCA 2015). Issue 1534. Malaga: CEUR Workshop Proceedings, 2015. Ch. 5. P. 43-54.
Nowadays, social data analysts use a complicated mix of languages, methods and technologies for analyzing social network service (SNS) data. In this article, we describe approaches and technologies for extracting, analyzing and visualizing social data using the Formal Concept Analysis Research Toolbox (FCART). Integrated process of analyzing SNS data with a set of research tools based ...
Added: October 19, 2015
International Journal of General Systems 2016 Vol. 45 No. 2 P. 135-159
Nowadays data-sets are available in very complex and heterogeneous ways. Mining of such data collections is essential to support many real-world applications ranging from healthcare to marketing. In this work, we focus on the analysis of “complex” sequential data by means of interesting sequential patterns. We approach the problem using the elegant mathematical framework of ...
Added: February 25, 2016
In: Supplementary Proceedings ICFCA 2019 Conference and Workshops. Vol. 2378. CEUR Workshop Proceedings, 2019. P. 137-151.
This paper presents a further development of distributed multimodal clustering. We introduce a new version of the multimodal clustering algorithm for distributed processing in Apache Hadoop on computer clusters. Its implementation allows a user to conduct clustering on data with modality greater than two. We provide the time and space complexity of the algorithm and justify its relevance. ...
Added: October 31, 2019
In: Proceedings, Workshop “What can FCA do for Artificial Intelligence?” of the ECAI 2012 conference. M.: CEUR Workshop Proceedings, 2012. Ch. 12. P. 81-87.
Software system Cordiet-FCA is presented, which is designed for knowledge discovery in big dynamic data collections, including texts in natural language. Cordiet-FCA allows one to compose ontology-controlled queries and outputs concept lattice, implication bases, association rules, and other useful concept-based artifacts. Efficient algorithms for data preprocessing, text processing, and visualization of results are discussed. Examples ...
Added: January 30, 2013
In: CLA 2016: Proceedings of the Thirteenth International Conference on Concept Lattices and Their Applications. CEUR Workshop Proceedings. Vol. 1624. M.: Higher School of Economics, National Research University, 2016. P. 285-296.
Formal Concept Analysis (FCA) provides mathematical models, methods and algorithms for data analysis. However, to date there is no readily available software system that would provide data analysts with unified, intelligible and transparent access to various external data sources with large amounts of heterogeneous data for subsequent FCA-based knowledge discovery. The lack of such tools ...
Added: October 19, 2016
Annals of Mathematics and Artificial Intelligence 2014 Vol. 70 No. 1 P. 55-79
Biclustering numerical data became a popular data-mining task at the beginning of the 2000s, especially for gene expression data analysis and recommender systems. A bicluster reflects a strong association between a subset of objects and a subset of attributes in a numerical object/attribute data-table. So-called biclusters of similar values can be thought of as maximal sub-tables with ...
Added: October 27, 2015
Triclustering in Big Data Setting. 2020.
In this paper, we describe versions of triclustering algorithms adapted for efficient computation in distributed environments using the MapReduce model or the parallelisation mechanisms provided by modern programming languages. The OAC family of triclustering algorithms parallelises well because the triples of a triadic formal context are processed independently. We provide the time and space complexity of ...
Added: November 10, 2020
In: 2017 IEEE 17th International Conference on Data Mining (ICDM). New Orleans: IEEE, 2017. Ch. 89. P. 757-762.
A scalable method for mining graph patterns stable under subsampling is proposed. The existing subsample stability and robustness measures are not antimonotonic according to definitions known so far. We study a broader notion of antimonotonicity for graph patterns, so that measures of subsample stability become antimonotonic. Then we propose gSOFIA for mining the most subsample-stable graph patterns. The ...
Added: September 26, 2017
Proceedings of the 7th Spring/Summer Young Researchers’ Colloquium on Software Engineering, SYRCoSE 2013
Kazan : -, 2013
The issue contains the papers presented at the 7th Spring/Summer Young Researchers' Colloquium on Software Engineering (SYRCoSE 2013) held in Kazan, Russia on 30th and 31st of May, 2013. Paper selection was based on a competitive peer review process being done by the program committee. Both regular and research-in-progress papers were considered acceptable for the ...
Added: June 8, 2013
Труды Московского физико-технического института (Proceedings of the Moscow Institute of Physics and Technology) 2014 Vol. 6 No. 3 P. 43-56
Triclustering is an outgrowth of Formal Concept Analysis intended to detect groups of objects with similar properties (clusters) in a context of three sets of entities. In the case of social network analysis, for instance, these sets might be users, their interests and events they take part in. Triclustering here can help to detect users with ...
Added: November 8, 2013
Analysis of Images, Social Networks and Texts Third International Conference, AIST 2014, Yekaterinburg, Russia, April 10-12, 2014, Revised Selected Papers
Berlin : Springer, 2014
This book constitutes the proceedings of the Third International Conference on Analysis of Images, Social Networks and Texts, AIST 2014, held in Yekaterinburg, Russia, in April 2014. The 11 full and 10 short papers were carefully reviewed and selected from 74 submissions. They are presented together with 3 short industrial papers, 4 invited papers and ...
Added: November 13, 2014
In: Proceedings of International Conference on Conceptual Structures 2014. Vol. 8577: Graph-Based Representation and Reasoning. Springer, 2014. P. 287-292.
This paper considers a recommender part of the data analysis system for the collaborative platform Witology. It was developed by the joint research team of the National Research University Higher School of Economics and the Witology company. This recommender system is able to recommend ideas, like-minded users and antagonists at the respective phases ...
Added: June 9, 2014
In: Proceedings of the 9th International Workshop "What can FCA do for Artificial Intelligence?" (FCA4AI 2021). Vol. 2972. CEUR-WS, 2021. P. 51-58.
Added: October 28, 2021
Heidelberg : Springer, 2013
This volume comprises papers accepted for presentation at the 14th Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing (RSFDGRC) International Conference, which was held as a major part of the Joint Rough Set Symposium (JRS 2013) in Halifax, Canada, during October 11-14, 2013. ...
Added: October 29, 2013
Журнал формирующихся направлений науки 2015 Vol. 3 No. 7
The article presents selected excerpts of a debate featuring Yu. Yu. Petrunin, Doctor of Philosophical Sciences and Professor at Moscow State University. ...
Added: February 23, 2016
Springer, 2015
In online social networks, high level features of user behavior such as character traits can be predicted with data from user profiles and their connections. Recent publications use data from online social networks to detect people with depression propensity and diagnosis. In this study, we investigate the capabilities of previously published methods and metrics applied to the Russian online social ...
Added: December 21, 2015
Object-Attribute Biclustering for Elimination of Missing Genotypes in Ischemic Stroke Genome-Wide Data
In: Recent Trends in Analysis of Images, Social Networks and Texts. 9th International Conference, AIST 2020, Skolkovo, Moscow, Russia, October 15–16, 2020, Revised Supplementary Proceedings. Vol. 12602. Springer, 2021. P. 185-204.
Missing genotypes can affect the efficacy of machine learning approaches to identify the risk genetic variants of common diseases and traits. The problem occurs when genotypic data are collected from different experiments with different DNA microarrays, each being characterised by its pattern of uncalled (missing) genotypes. This can prevent the ...
Added: November 1, 2022
A Method for Analysing Multivariate Time Series Using Correction of a Precomputed Inverse Matrix: A Comparative Study against Other Data Mining Methods
Бизнес-информатика (Business Informatics) 2008 No. 1 P. 36-44
When analysing multivariate time series, traditional statistical methods can be applied only if fairly strict assumptions hold, namely those that permit the use of ordinary least squares (OLS), on which these methods are based. These assumptions include the absence of multicollinearity, heteroscedasticity and autocorrelation. In problems of economic analysis and multivariate forecasting, in order to reduce the number of variables under consideration and to obtain approximate patterns quickly, it is advisable to resort to methods of data mining ...
Added: September 28, 2012
In: Companion Proceedings 11th International Conference on Learning Analytics & Knowledge (LAK21). [s.n.], 2021. P. 76-78.
While the exchange of cross-border students in Europe has increased significantly in recent years, a growing number of these students face obstacles in selecting courses for exchange. This poster describes the first iteration of creating a course recommendation system for exchange students to select courses that fit their preferences. We implemented a combination of embedding ...
Added: July 4, 2021
This book constitutes the refereed proceedings of the 10th International Conference on Machine Learning and Data Mining in Pattern Recognition, MLDM 2014, held in St. Petersburg, Russia in July 2014. The 40 full papers presented were carefully reviewed and selected from 128 submissions. The topics range from theoretical topics for classification, clustering, association rule and ...
Added: September 30, 2014
Application of Modern Data Analysis Methods to Cluster the Clinical Pathways in Urban Medical Facilities
In: 2019 IEEE 21st Conference on Business Informatics (CBI). Vol. 1. M.: IEEE Computer Society, 2019. P. 75-83.
Patient flow modeling in healthcare plays a large role in understanding the operation of the system and its characteristics. Besides, modeling techniques can significantly improve the effectiveness of the medical facilities. The existing level of automation in these facilities enables the accumulation of large amounts of various data. Therefore, the collected data might be considered ...
Added: September 10, 2019