Triclustering in Big Data Setting

2020.

Egurnov D., Ignatov D. I., Tochilkin D. S.

In press

In this paper, we describe versions of triclustering algorithms adapted for efficient computation in distributed environments, using either the MapReduce model or the parallelisation mechanisms provided by modern programming languages. The OAC family of triclustering algorithms parallelises well because the triples of a triadic formal context are processed independently. We provide the time and space complexity of the algorithms and justify their relevance. We also compare the performance gain and the scalability obtained from using a distributed system.
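The independence of triples can be illustrated with a minimal map/reduce-style sketch. It assumes the prime-operator variant of OAC triclustering, in which a triple (g, m, b) of the context generates the tricluster ((m, b)', (g, b)', (g, m)'). The function names (`map_chunk`, `reduce_primes`, `oac_prime_triclusters`) and the chunking scheme are hypothetical and are not the authors' implementation. A thread pool stands in for a cluster only to show how the work splits. Because of Python's GIL it gives no real speed-up.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_chunk(chunk):
    """Map phase (hypothetical): each triple (g, m, b) updates three prime sets
    without looking at any other triple, so chunks can be processed independently."""
    objs = defaultdict(set)   # (m, b) -> {g : (g, m, b) in Y}
    attrs = defaultdict(set)  # (g, b) -> {m : (g, m, b) in Y}
    conds = defaultdict(set)  # (g, m) -> {b : (g, m, b) in Y}
    for g, m, b in chunk:
        objs[(m, b)].add(g)
        attrs[(g, b)].add(m)
        conds[(g, m)].add(b)
    return objs, attrs, conds


def reduce_primes(partials):
    """Reduce phase: union the partial prime sets that share the same key."""
    merged = (defaultdict(set), defaultdict(set), defaultdict(set))
    for partial in partials:
        for target, source in zip(merged, partial):
            for key, values in source.items():
                target[key] |= values
    return merged


def oac_prime_triclusters(triples, n_chunks=4):
    """Return the distinct OAC-prime triclusters ((m,b)', (g,b)', (g,m)')."""
    triples = list(triples)
    chunks = [triples[i::n_chunks] for i in range(n_chunks)]
    # Threads only illustrate the split; real gains need processes or a cluster.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(map_chunk, chunks))
    objs, attrs, conds = reduce_primes(partials)
    # Second pass: each triple again generates its tricluster independently;
    # frozensets let duplicate triclusters collapse inside a Python set.
    return {
        (frozenset(objs[(m, b)]), frozenset(attrs[(g, b)]), frozenset(conds[(g, m)]))
        for g, m, b in triples
    }


if __name__ == "__main__":
    Y = [("u1", "tag1", "res1"), ("u1", "tag2", "res1"), ("u2", "tag1", "res1")]
    for X, Z, W in oac_prime_triclusters(Y):
        print(sorted(X), sorted(Z), sorted(W))
```

Both phases touch each triple a constant number of times, so the amount of work grows linearly with the number of triples. The result does not depend on how the triples are split into chunks.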

Research target: Computer Science

Priority areas: IT and mathematics

Language: English

Multimodal Clustering of Boolean Tensors on MapReduce: Experiments Revisited

Ignatov D. I., Egurnov D., Tochilkin D. S., in: Supplementary Proceedings of ICFCA 2019 Conference and Workshops. Vol. 2378. CEUR Workshop Proceedings, 2019. P. 137–151.

This paper presents further development of distributed multimodal clustering. We introduce a new version of multimodal clustering algorithm for distributed processing in Apache Hadoop on computer clusters. Its implementation allows a user to conduct clustering on data with modality greater than two. We provide time and space complexity of the algorithm and justify its relevance. ...

Added: October 31, 2019

Big Data: Modern Approaches to Storage and Processing

Klemenkov P. A., Kuznetsov S. D., Proceedings of the Institute for System Programming of the RAS 2012 Vol. 23 P. 143–158

Big data has challenged traditional storage and analysis systems in several new ways. In this paper we try to work out how to overcome these challenges and why it is not possible to do so efficiently, and we describe three modern approaches to big data handling: NoSQL, MapReduce and real-time stream processing. The first section of the paper is ...

Added: October 31, 2017

Applying MapReduce to Conformance Checking

Shugurov I., Mitsyuk A. A., Proceedings of the Institute for System Programming of the RAS 2016 Vol. 28 No. 3 P. 103–122

Process mining is a relatively new research field, offering methods of business process analysis and improvement based on studying their execution history (event logs). Conformance checking is one of the main sub-fields of process mining. Conformance checking algorithms aim to assess how well a given process model, typically represented by a Petri ...

Added: September 12, 2016

Supplementary Proceedings ICFCA 2019 Conference and Workshops

CEUR Workshop Proceedings, 2019.

Added: October 31, 2019

Can triconcepts become triclusters?

Ignatov D. I., Kuznetsov S., Zhukov L. E. et al., International Journal of General Systems 2013 Vol. 42 No. 6 P. 572–593

formal concept analysis, data mining, triclustering, three-way data, folksonomy, spectral triclustering ...

Added: October 16, 2013

Object-Attribute Biclustering for Elimination of Missing Genotypes in Ischemic Stroke Genome-Wide Data

Ignatov D. I., Khvorykh G. V., Khrunin A. V. et al., Lecture Notes in Computer Science (LNCS), 2020.

Missing genotypes can affect the efficacy of machine learning approaches to identify the risk genetic variants of common diseases and traits. The problem occurs when genotypic data are collected from different experiments with different DNA microarrays, each being characterised by its pattern of uncalled (missing) genotypes. This can prevent the machine learning classifier from assigning ...

Added: November 10, 2020

On mining complex sequential data by means of FCA and pattern structures

Buzmakov A. V., Egho E., Jay N. et al., International Journal of General Systems 2016 Vol. 45 No. 2 P. 135–159

Nowadays data-sets are available in very complex and heterogeneous ways. Mining of such data collections is essential to support many real-world applications ranging from healthcare to marketing. In this work, we focus on the analysis of “complex” sequential data by means of interesting sequential patterns. We approach the problem using the elegant mathematical framework of ...

Added: February 25, 2016

Storage and Processing of the Social Network Graph

Polyakov I. V., Chepovskiy A., Chepovskiy A., Vestnik of Novosibirsk State University. Series: Information Technologies 2013 Vol. 11 No. 4 P. 77–83

This paper presents a special data structure for storing and operating on a large social graph. We mainly discuss searching for paths in the graph, extracting subgraphs, and adding new edges and vertices. ...

Added: October 17, 2013

Proceedings 2018 Global Smart Industry Conference (GloSIC)

Chelyabinsk: IEEE, 2018.

The 2018 Global Smart Industry Conference is organized in order to exchange experience, promote discussion and presentation of research papers, and summarize results in development of innovative models, methods and technologies for the digital industry in universities, scientific and industrial associations of the Russian Federation as well as in foreign companies, and the experience of ...

Added: November 25, 2019

14th International Conference on Formal Concept Analysis - Supplementary Proceedings

University Rennes 1, 2017.

This volume is the supplementary volume of the 14th International Conference on Formal Concept Analysis (ICFCA 2017), held from June 13th to 16th 2017, at IRISA, Rennes. The ICFCA conference series is one of the major venues for researchers from the field of Formal Concept Analysis and related areas to present and discuss their recent ...

Added: June 19, 2017

Proceedings of the XVIII International Conference DAMDID/RCDL’2016, October 11–14, 2016, Ershovo, Moscow Region, Russia

NRNU MEPhI, 2016.

In 2016 the International Conference “Data Analytics and Management in Data Intensive Domains” (DAMDID/RCDL’2016) was held on October 11 – 14 in the Holiday Center, Ershovo (Moscow region). By tradition the “Data Analytics and Management in Data Intensive Domains” conference (DAMDID) is planned as a multidisciplinary forum of researchers and practitioners from various domains of science and research, promoting ...

Added: January 26, 2017

Synthesis of an Information System for Managing the Technical Support Subsystems of Smart Buildings

Vikentyeva O., Deryabin A. I., Shestakova L. V. et al., Vestnik of Moscow State University of Civil Engineering 2017 Vol. 12 No. 10 P. 1191–1201

Subject: smart house maintenance requires taking into account a number of factors: resource conservation, lower operating expenditures, enhanced safety, and comfort of leisure and operation. Automation of engineering system networks such as lighting, climate control, security and communication may be achieved through contemporary technologies (e.g. IoT, the Internet of Things). However, storing ...

Added: November 21, 2017

Architecture of an IoT-Based Network Control System for a Building

Vikentyeva O., Kychkin A., Deryabin A. I. et al., Sensors & Systems 2018 No. 5 P. 32–38

This work considers the problem of designing the architecture of a network management system for a generic module of a modern automated building. To improve the efficiency of building operation given the large influx of data, the architecture of the network management system implements multicontour management of generic modules using cloud scenarios. Building operation ...

Added: July 19, 2018

Cluster Analysis of Cardiological Data

Zimina E. Yu., Statistics and Economics 2018 Vol. 15 No. 2 P. 30–37

The article reviews cluster analysis of medical data, using cardiac data as an example. One of the main, most effective and most commonly used Data Mining methods applied to large amounts of information (for example, in mathematical economics) is clustering: the search for signs of similarity between objects in the study of the subject area ...

Added: May 29, 2018

A One-Pass Triclustering Algorithm

Gnatyshak D. V., Scientific and Technical Information. Series 2: Information Processes and Systems 2015 No. 2 P. 16–30

Owing to the continuing growth in popularity of big data, the need for efficient algorithms with low time complexity and the possibility of parallelisation is raised ever more actively. The aim of this work was to create an efficient one-pass algorithm for triclustering binary data that is suitable for use with big data. The result is a one-pass, linear, online OAC-triclustering algorithm (object-attribute-condition triclustering). In addition, ...
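The one-pass, online idea can be sketched as follows. This is a minimal illustration that assumes the prime-operator variant of OAC triclustering, where a triple (g, m, b) yields ((m, b)', (g, b)', (g, m)'). The class `OnlineOACPrime`, its methods, and the optional density filter are hypothetical and are not the published algorithm's interface. Each incoming triple is read once and updates three prime sets with a constant number of operations, which is where the linear cost comes from.

```python
from collections import defaultdict


class OnlineOACPrime:
    """Hypothetical one-pass accumulator for OAC-prime triclustering.

    Every arriving triple is read once and updates three prime sets with a
    constant number of dictionary/set operations, so building the index is
    linear in the number of triples."""

    def __init__(self):
        self.objs = defaultdict(set)   # (m, b) -> objects g
        self.attrs = defaultdict(set)  # (g, b) -> attributes m
        self.conds = defaultdict(set)  # (g, m) -> conditions b
        self.triples = set()           # triples seen so far (also used for density)

    def add(self, g, m, b):
        """Consume one triple from the stream; duplicates have no effect."""
        self.objs[(m, b)].add(g)
        self.attrs[(g, b)].add(m)
        self.conds[(g, m)].add(b)
        self.triples.add((g, m, b))

    def density(self, X, Z, W):
        """Share of the cells of X x Z x W that are present in the data."""
        hits = sum((g, m, b) in self.triples for g in X for m in Z for b in W)
        return hits / (len(X) * len(Z) * len(W))

    def triclusters(self, min_density=0.0):
        """Generate distinct triclusters, optionally filtered by a density threshold."""
        result = set()
        for g, m, b in self.triples:
            T = (frozenset(self.objs[(m, b)]),
                 frozenset(self.attrs[(g, b)]),
                 frozenset(self.conds[(g, m)]))
            if T not in result and self.density(*T) >= min_density:
                result.add(T)
        return result
```

Usage: call `add` for each triple as it arrives, then call `triclusters(min_density=...)` when results are needed. The density check is the expensive step because it scans every cell of X × Z × W. It is included here only to show where a threshold could apply.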

Added: April 15, 2015

Investigating and Identifying Indicators of Hidden Attacks in an Enterprise for Machine Learning Algorithms

Zolotukhina M. A., Zykov S. V., Vestnik of the Russian New University 2023 No. 1 P. 20–28

It is often the human factor that leads to the spread of threats in enterprises. Whereas a technical device is a precisely working, well-coordinated mechanism whose faults can be measured with diagnostic equipment and then eliminated, studying hidden attacks requires a new system component. Enterprises and industry as a whole need an intelligent system for protecting against and detecting hidden ...

Added: April 11, 2023

Application of Deep Neural Networks to the Classification of Large Volumes of Astronomical Data

Gorbunov A. A., Isaev E., Samodurov V., Radio Physics and Radio Astronomy 2017 Vol. 22 No. 4 P. 270–275

Astronomical observations produce vast amounts of data. The BSA (Big Scanning Antenna) of LPI, used to study impulse phenomena, logs 87.5 GB of data daily (32 TB per year). Experts classified 83,096 individual observations (for the study period July 2012 to October 2013). Over 75% of the sample ...

Added: October 15, 2017

Diagnostic Test Approaches to Machine Learning and Commonsense Reasoning Systems

Naidenova X., Ignatov D. I., Hershey: IGI Global, 2012.

The consideration of symbolic machine learning algorithms as an entire class will make it possible, in the future, to generate algorithms, with the aid of some parameters, depending on the initial users’ requirements and the quality of solving targeted problems in domain applications. Diagnostic Test Approaches to Machine Learning and Commonsense Reasoning Systems surveys, analyzes, and ...

Added: December 3, 2012

Let a Hundred Flowers Bloom

Kuznetsov S. D., Open Systems. DBMS 2013 No. 2 P. 48–51

Big Data issues are beginning to affect transactional systems, even though these contain orders of magnitude less data than some other systems. Yet today they process vast amounts of information and transactions, which requires approaches that ensure robust scalability. Let’s consider the types of scalability suitable for the transactional domain, the issues specific ...

Added: January 30, 2018

Big Data and Its Applications in the Electric Power Industry: From Business Analytics to Virtual Power Plants

Krylov V., Krylov S. V., Moscow: Nobel Press, 2014.

Intended for students and specialists in information systems development, including for the electric power industry, for heads of enterprise IT departments, and for everyone who works on planning the development of the electric power industry or is simply interested in progress in this field. The book examines the area of data processing known as Big Data and describes its techniques and technologies. The main focus ...

Added: October 10, 2015

Modelling and Optimisation of Educational Processes: The Case of a Model for Working with Electronic Educational Resources

Prokofyev D. O., Starykh V., Information Technologies 2015

This study investigates the main problems of automating and optimising educational processes with the help of BPMS and Big Data. Questions concerning process modelling are raised, particularly those related to the integration of process-oriented and business analysis systems. The main goal of the study is to find a possible new way to implement the ideas of metadata ...

Added: October 9, 2015

Distributed Horizontally Scalable Solutions for Data Management

Kuznetsov S. D., Poskonin A. V., Proceedings of the Institute for System Programming of the RAS 2013 Vol. 24 P. 327–258

Many modern applications (such as large-scale Web-sites, social networks, research projects, business analytics, etc.) have to deal with very large data volumes (also referred to as “big data”) and high read/write loads. These applications require underlying data management systems to scale well in order to accommodate data growth and increasing workloads. High throughput, low latencies ...

Added: January 30, 2018

Formal Concept Analysis: 16th International Conference, ICFCA 2021, Strasbourg, France, June 29 – July 2, 2021, Proceedings

Springer, 2021.

This book constitutes the proceedings of the 16th International Conference on Formal Concept Analysis, ICFCA 2021, held in Strasbourg, France, in June/July 2021. The 14 full papers and 5 short papers presented in this volume were carefully reviewed and selected from 32 submissions. The book also contains four invited contributions in full paper length. The research part ...

Added: July 10, 2021

Proceedings of the Twelfth International Conference on Concept Lattices and Their Applications Clermont-Ferrand, France, October 13-16, 2015

Clermont-Ferrand: CEUR Workshop Proceedings, 2015.

Formal Concept Analysis is a method of analysis of logical data based on formalization of conceptual knowledge by means of lattice theory. It has proved to be of interest to various applied fields such as data visualization, knowledge discovery and data mining, database theory, and many others. The International Conference “Concept Lattices and Their Applications ...

Added: October 22, 2015