Triclustering in Big Data Setting
In this paper, we describe versions of triclustering algorithms adapted for efficient computation in distributed environments, using either the MapReduce model or the parallelisation mechanisms provided by modern programming languages. The OAC family of triclustering algorithms shows good parallelisation capabilities because the triples of a triadic formal context are processed independently. We provide the time and space complexity of the algorithms and justify their relevance. We also measure the performance gain from using a distributed system and evaluate scalability.
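A minimal single-machine sketch of why OAC triclustering parallelises well, assuming the standard prime-based OAC definition: a triple (g, m, b) of the relation generates the tricluster ((m,b)', (g,b)', (g,m)'), where (m,b)' is the set of all g such that (g, m, b) is in the relation, and likewise for the other two prime sets. The map step touches one triple at a time, so triples can be split across workers freely. The function names are hypothetical, the shuffle is simulated with a dictionary rather than any Hadoop API, and this is not the authors' implementation.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Hashable, Iterable, Iterator, List, Set, Tuple

Triple = Tuple[Hashable, Hashable, Hashable]
# A key is tagged with the prime set it belongs to, so (m, b), (g, b) and (g, m)
# pairs never collide even if objects, attributes and conditions share labels.
Key = Tuple[str, Hashable, Hashable]
Tricluster = Tuple[FrozenSet[Hashable], FrozenSet[Hashable], FrozenSet[Hashable]]


def map_triple(triple: Triple) -> Iterator[Tuple[Key, Hashable]]:
    """Map: each triple (g, m, b) is handled on its own, with no shared state."""
    g, m, b = triple
    yield ("mb", m, b), g  # g belongs to the prime set (m, b)'
    yield ("gb", g, b), m  # m belongs to (g, b)'
    yield ("gm", g, m), b  # b belongs to (g, m)'


def shuffle(pairs: Iterable[Tuple[Key, Hashable]]) -> Dict[Key, List[Hashable]]:
    """Shuffle: group values by key (a MapReduce framework performs this step)."""
    groups: Dict[Key, List[Hashable]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_prime(values: List[Hashable]) -> FrozenSet[Hashable]:
    """Reduce: a prime set is the set of all values collected under one key."""
    return frozenset(values)


def oac_triclusters(triples: Iterable[Triple]) -> Set[Tricluster]:
    """Prime-based OAC triclusters ((m,b)', (g,b)', (g,m)') for every triple."""
    triples = list(triples)
    pairs = (pair for t in triples for pair in map_triple(t))
    primes = {key: reduce_prime(vals) for key, vals in shuffle(pairs).items()}
    # Second pass, also independent per triple; the set removes duplicates.
    return {
        (primes[("mb", m, b)], primes[("gb", g, b)], primes[("gm", g, m)])
        for g, m, b in triples
    }


if __name__ == "__main__":
    context = [("u1", "tag1", "r1"), ("u2", "tag1", "r1"), ("u1", "tag2", "r1")]
    rows = sorted((sorted(x), sorted(y), sorted(z)) for x, y, z in oac_triclusters(context))
    for row in rows:
        print(*row)
```

Both the map step and the final assembly step read a single triple at a time, which is the property the abstract credits for the good parallelisation of the OAC family; only the grouping by key needs communication between workers.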
Proceedings of the Institute for System Programming of the RAS 2012 Vol. 23 P. 143-158
Big data has challenged traditional storage and analysis systems in several new ways. In this paper we try to figure out how to overcome these challenges and why it is not possible to do so efficiently, and we describe three modern approaches to big data handling: NoSQL, MapReduce and real-time stream processing. The first section of the paper is ...
Added: October 31, 2017
Proceedings of the Institute for System Programming of the RAS 2016 Vol. 28 No. 3 P. 103-122
Process mining is a relatively new research field offering methods for analyzing and improving business processes based on studying their execution history (event logs). Conformance checking is one of the main subfields of process mining. Conformance checking algorithms aim to assess how well a given process model, typically represented by a Petri ...
Added: September 12, 2016
In: Supplementary Proceedings ICFCA 2019 Conference and Workshops. Vol. 2378. CEUR Workshop Proceedings, 2019. P. 137-151.
This paper presents a further development of distributed multimodal clustering. We introduce a new version of a multimodal clustering algorithm for distributed processing in Apache Hadoop on computer clusters. Its implementation allows a user to cluster data with modality greater than two. We provide the time and space complexity of the algorithm and justify its relevance. ...
Added: October 31, 2019
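The distributed multimodal clustering abstract above does not spell out the algorithm; the following is a sketch assuming the natural n-ary generalisation of the prime-based OAC idea: for a tuple of the relation, its i-th cluster component is the set of all values v such that replacing position i of the tuple by v still gives a tuple of the relation. Density is taken as the fraction of cells of X1 × ... × Xn that belong to the relation. The function names are hypothetical and no Hadoop API is used; grouping by the key (i, rest) is the step a MapReduce shuffle would perform.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, FrozenSet, Hashable, Iterable, Set, Tuple

Row = Tuple[Hashable, ...]
Cluster = Tuple[FrozenSet[Hashable], ...]


def multimodal_clusters(relation: Iterable[Row]) -> Set[Cluster]:
    """Prime-based clusters for an n-ary relation (n >= 2), one per tuple."""
    rows = list(relation)
    if not rows:
        return set()
    n = len(rows[0])
    # primes[(i, rest)] collects every i-th component seen together with `rest`,
    # where `rest` is the tuple with position i removed.
    primes: Dict[Tuple[int, Row], Set[Hashable]] = defaultdict(set)
    for row in rows:
        if len(row) != n:
            raise ValueError("all tuples must have the same arity")
        for i in range(n):
            primes[(i, row[:i] + row[i + 1:])].add(row[i])
    # Each tuple generates one cluster; the set removes duplicates.
    return {
        tuple(frozenset(primes[(i, row[:i] + row[i + 1:])]) for i in range(n))
        for row in rows
    }


def density(cluster: Cluster, relation: Set[Row]) -> float:
    """Share of the cells of X1 x ... x Xn that are present in the relation."""
    cells = list(product(*cluster))
    return sum(cell in relation for cell in cells) / len(cells)


if __name__ == "__main__":
    # Hypothetical 4-ary toy relation.
    rel = {("a", "x", "p", "t1"), ("b", "x", "p", "t1"), ("a", "y", "p", "t1")}
    for c in multimodal_clusters(rel):
        print([sorted(part) for part in c], density(c, rel))
```

The `density` helper enumerates the full Cartesian product, which is only practical for small clusters; it makes the definition concrete rather than being efficient.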
CEUR Workshop Proceedings, 2019
Added: October 31, 2019
Object-Attribute Biclustering for Elimination of Missing Genotypes in Ischemic Stroke Genome-Wide Data
2020.
Missing genotypes can affect the efficacy of machine learning approaches to identifying the risk genetic variants of common diseases and traits. The problem occurs when genotypic data are collected from different experiments with different DNA microarrays, each characterised by its own pattern of uncalled (missing) genotypes. This can prevent the machine learning classifier from assigning ...
Added: November 10, 2020
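For reference, a sketch of the object-attribute (OA) bicluster definition commonly used in the FCA literature, under the assumption that the entry above uses it in this form: each pair (g, m) of a binary relation generates the bicluster (m', g'), where m' is the set of objects having attribute m and g' is the set of attributes of object g, optionally filtered by a minimal density. How objects and attributes correspond to samples and genotypes is not stated in the abstract, so the function name and toy labels below are hypothetical.

```python
from collections import defaultdict
from itertools import product
from typing import DefaultDict, FrozenSet, Hashable, Iterable, Set, Tuple

Pair = Tuple[Hashable, Hashable]
Bicluster = Tuple[FrozenSet[Hashable], FrozenSet[Hashable]]


def oa_biclusters(incidence: Iterable[Pair], min_density: float = 0.0) -> Set[Bicluster]:
    """OA-biclusters (m', g') generated by each pair (g, m) of a binary relation."""
    pairs = set(incidence)
    objects_of: DefaultDict[Hashable, Set[Hashable]] = defaultdict(set)     # m -> m'
    attributes_of: DefaultDict[Hashable, Set[Hashable]] = defaultdict(set)  # g -> g'
    for g, m in pairs:
        objects_of[m].add(g)
        attributes_of[g].add(m)
    result: Set[Bicluster] = set()
    for g, m in pairs:
        extent, intent = frozenset(objects_of[m]), frozenset(attributes_of[g])
        # Density: share of the cells of extent x intent that belong to the relation.
        filled = sum((x, y) in pairs for x, y in product(extent, intent))
        if filled / (len(extent) * len(intent)) >= min_density:
            result.add((extent, intent))
    return result


if __name__ == "__main__":
    relation = [("g1", "m1"), ("g1", "m2"), ("g2", "m1")]  # hypothetical toy data
    print(len(oa_biclusters(relation)), len(oa_biclusters(relation, min_density=1.0)))  # prints 3 2
```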
International Journal of General Systems 2013 Vol. 42 No. 6 P. 572-593
formal concept analysis, data mining, triclustering, three-way data, folksonomy, spectral triclustering ...
Added: October 16, 2013
Proceedings of the Twelfth International Conference on Concept Lattices and Their Applications Clermont-Ferrand, France, October 13-16, 2015
Clermont-Ferrand: CEUR Workshop Proceedings, 2015
Formal Concept Analysis is a method for analyzing logical data based on the formalization of conceptual knowledge by means of lattice theory. It has proved to be of interest to various applied fields such as data visualization, knowledge discovery and data mining, database theory, and many others. The International Conference “Concept Lattices and Their Applications ...
Added: October 22, 2015
Vestnik of Novosibirsk State University. Series: Information Technologies 2013 Vol. 11 No. 4 P. 77-83
In this paper, a special data structure for storing and operating on a big social graph is presented. We mainly discuss searching for paths in the graph, obtaining subgraphs, and adding new edges and vertices. ...
Added: October 17, 2013
Open Systems. DBMS 2013 No. 2 P. 48-51
Big Data issues are beginning to affect transactional systems, even though these contain orders of magnitude less data than some other systems. Yet today they process vast amounts of information and transactions, requiring approaches that ensure robust scalability. Let us consider the types of scalability suitable for the transactional domain, the issues specific ...
Added: January 30, 2018
Proceedings of the Institute for System Programming of the RAS 2013 Vol. 24 P. 327-258
Many modern applications (such as large-scale Web-sites, social networks, research projects, business analytics, etc.) have to deal with very large data volumes (also referred to as “big data”) and high read/write loads. These applications require underlying data management systems to scale well in order to accommodate data growth and increasing workloads. High throughput, low latencies ...
Added: January 30, 2018
Mathematical Biology and Bioinformatics 2017 Vol. 12 No. 1 P. 102-119
Sequencing of the human genome began in 1994. It took 10 years of work by many research teams to obtain a draft sequence of human DNA. Modern sequencing technologies make it possible to obtain the genome of a specific individual within a few days. The paper discusses the achievements of modern bioinformatics associated with the emergence of high-throughput sequencing platforms, which have not only helped expand the capabilities of various fields of biology and other related ...
Added: March 3, 2017
Radio Physics and Radio Astronomy 2017 Vol. 22 No. 4 P. 270-275
Vast amounts of data are collected in the course of astronomical observations. The BSA (Big Scanning Antenna) of the LPI, used in the study of impulse phenomena, logs 87.5 GB of data daily (32 TB per year). Experts classified 83,096 individual observations (for the study period July 2012 - October 2013). Over 75% of the sample ...
Added: October 15, 2017
Intelligent Data Processing 11th International Conference, IDP 2016, Barcelona, Spain, October 10–14, 2016, Revised Selected Papers
Switzerland: Springer, 2019
This book constitutes the refereed proceedings of the 11th International Conference on Intelligent Data Processing, IDP 2016, held in Barcelona, Spain, in October 2016. The 11 revised full papers were carefully reviewed and selected from 52 submissions. The papers of this volume are organized in topical sections on machine learning theory with applications; intelligent data processing in life ...
Added: February 8, 2020
Big Data and Its Applications in the Electric Power Industry: From Business Analytics to Virtual Power Plants
Moscow: Nobel Press, 2014
The book is intended for students and specialists in information systems development, including systems for the electric power industry, for heads of corporate IT departments, and for everyone involved in planning the development of the electric power industry or simply interested in progress in this field. The book covers the area of data processing known as Big Data and describes its techniques and technologies. The main focus ...
Added: October 10, 2015
Hershey: IGI Global, 2012
Considering symbolic machine learning algorithms as an entire class will make it possible, in the future, to generate algorithms with the aid of some parameters, depending on the users’ initial requirements and the quality of solving targeted problems in domain applications. Diagnostic Test Approaches to Machine Learning and Commonsense Reasoning Systems surveys, analyzes, and ...
Added: December 3, 2012
Statistics and Economics 2018 Vol. 15 No. 2 P. 30-37
The article presents an overview of the cluster analysis of medical data, using cardiac data as an example. Among the most effective and commonly used Data Mining methods applied to large amounts of information (for example, in mathematical economics) are clustering methods: the search for signs of similarity between objects in the study of the subject area ...
Added: May 29, 2018
Proceedings of the XVIII International Conference DAMDID/RCDL’2016, October 11-14, 2016, Ershovo, Moscow Region, Russia
NRNU MEPhI, 2016
In 2016 the International Conference “Data Analytics and Management in Data Intensive Domains” (DAMDID/RCDL’2016) was held on October 11-14 at the Holiday Center, Ershovo (Moscow region). By tradition, the “Data Analytics and Management in Data Intensive Domains” conference (DAMDID) is planned as a multidisciplinary forum of researchers and practitioners from various domains of science and research, promoting ...
Added: January 26, 2017
Sensors and Systems 2018 No. 5 P. 32-38
This work considers the problem of designing the architecture of a network management system for a generic module of a modern automated building. To improve the efficiency of building operation given the large influx of data, the architecture of the network management system implements multicontour management of generic modules using cloud scenarios. Building operation ...
Added: July 19, 2018
Synthesis of an Information System for Managing the Technical Support Subsystems of Intelligent Buildings
Vestnik of Moscow State University of Civil Engineering 2017 Vol. 12 No. 10 P. 1191-1201
Subject: smart house maintenance requires taking into account a number of factors: resource conservation, reduction of operating expenses, safety enhancement, and ensuring comfort during leisure and operation. Automation of engineering system networks such as lighting, climate control, security and communication may be achieved by using contemporary technologies (e.g. IoT, the Internet of Things). However, storing ...
Added: November 21, 2017
Chelyabinsk: IEEE, 2018
The 2018 Global Smart Industry Conference is organized in order to exchange experience, promote discussion and presentation of research papers, and summarize results in the development of innovative models, methods and technologies for the digital industry in universities, scientific and industrial associations of the Russian Federation as well as in foreign companies, and the experience of ...
Added: November 25, 2019
In: Proceedings of the Twelfth International Conference on Concept Lattices and Their Applications, Clermont-Ferrand, France, October 13-16, 2015. Vol. 1466. Clermont-Ferrand: CEUR Workshop Proceedings, 2015. P. 47-58.
In our previous work, an efficient one-pass online algorithm for triclustering binary data (triadic formal contexts) was proposed. This algorithm is a modified version of the basic algorithm for the OAC-triclustering approach; it has linear time and memory complexity. In this paper we parallelise it via the MapReduce framework in order to make it suitable for big datasets. The results of ...
Added: October 23, 2015
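A sketch of a one-pass online variant in the spirit of the entry above, assuming the prime-based OAC definition: each incoming triple (g, m, b) updates the prime sets (m,b)', (g,b)' and (g,m)' with a constant number of expected-time hash-set insertions, so the pass is linear in the number of triples. Materialising triclusters, removing duplicates and applying a minimal density threshold happen in a separate post-processing step, which in this sketch costs more than linear time because density is computed by enumerating each tricluster's cells. The class and method names are hypothetical and this is not the authors' code.

```python
from collections import defaultdict
from itertools import product
from typing import DefaultDict, FrozenSet, Hashable, Set, Tuple

Triple = Tuple[Hashable, Hashable, Hashable]
Tricluster = Tuple[FrozenSet[Hashable], FrozenSet[Hashable], FrozenSet[Hashable]]
PrimeIndex = DefaultDict[Tuple[Hashable, Hashable], Set[Hashable]]


class OnlineOAC:
    """Hypothetical one-pass OAC triclusterer: reads each triple exactly once."""

    def __init__(self) -> None:
        self.mb: PrimeIndex = defaultdict(set)  # (m, b) -> (m, b)', a set of objects
        self.gb: PrimeIndex = defaultdict(set)  # (g, b) -> (g, b)', a set of attributes
        self.gm: PrimeIndex = defaultdict(set)  # (g, m) -> (g, m)', a set of conditions
        self.seen: Set[Triple] = set()          # triples read so far, used for density

    def add(self, triple: Triple) -> None:
        """Constant expected work per triple: four hash-set insertions."""
        g, m, b = triple
        self.mb[(m, b)].add(g)
        self.gb[(g, b)].add(m)
        self.gm[(g, m)].add(b)
        self.seen.add(triple)

    def density(self, tc: Tricluster) -> float:
        """Fraction of cells of X x Y x Z that occur among the triples read."""
        cells = list(product(*tc))
        return sum(cell in self.seen for cell in cells) / len(cells)

    def triclusters(self, min_density: float = 0.0) -> Set[Tricluster]:
        """Post-processing: build, deduplicate and density-filter triclusters."""
        checked: Set[Tricluster] = set()
        result: Set[Tricluster] = set()
        for g, m, b in self.seen:
            tc = (frozenset(self.mb[(m, b)]),
                  frozenset(self.gb[(g, b)]),
                  frozenset(self.gm[(g, m)]))
            if tc in checked:
                continue  # duplicate generated by another triple
            checked.add(tc)
            if self.density(tc) >= min_density:
                result.add(tc)
        return result


if __name__ == "__main__":
    oac = OnlineOAC()
    for t in [("u1", "tag1", "r1"), ("u2", "tag1", "r1"), ("u1", "tag2", "r1")]:
        oac.add(t)  # a single pass over the stream
    print(len(oac.triclusters()), len(oac.triclusters(min_density=1.0)))  # prints 3 2
```

Because the prime sets only grow, a triple read early ends up with the same tricluster as if the whole context had been available at once; this is what makes a single pass sufficient.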
Formal Concept Analysis: 16th International Conference, ICFCA 2021, Strasbourg, France, June 29 – July 2, 2021, Proceedings
This book constitutes the proceedings of the 16th International Conference on Formal Concept Analysis, ICFCA 2021, held in Strasbourg, France, in June/July 2021. The 14 full papers and 5 short papers presented in this volume were carefully reviewed and selected from 32 submissions. The book also contains four invited contributions in full paper length. The research part ...
Added: July 10, 2021
Modeling Educational Processes and Their Optimization: The Case of a Model for Working with Electronic Educational Resources
Information Technologies 2015
This study investigates the main problems of automating and optimizing educational processes with the help of BPMS and Big Data. Questions concerning process modeling are raised, particularly those related to the integration of process-oriented and business analysis systems. The main goal of the study is to find a possible new way to implement the ideas of metadata ...
Added: October 9, 2015
International Journal of General Systems 2016 Vol. 45 No. 2 P. 135-159
Nowadays datasets are available in very complex and heterogeneous forms. Mining such data collections is essential to support many real-world applications ranging from healthcare to marketing. In this work, we focus on the analysis of “complex” sequential data by means of interesting sequential patterns. We approach the problem using the elegant mathematical framework of ...
Added: February 25, 2016