Full-text Search in Intermediate Data Storage of FCART
The speed of full-text search directly affects the process of text analysis. A search engine creates a text index, which is used for fast full-text search. Solr and ElasticSearch are two popular search engines. A text analysis system requires fast searching and indexing at the same time. This paper describes the preprocessing workflow of the analysis system Formal Concept Analysis Research Toolbox (FCART) and an experiment on searching and indexing social networking service data at the same time. The results of the experiment show which search engine is better suited as the core of the FCART search subsystem.
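As a rough illustration of what a text index is, the sketch below builds a toy inverted index in Python and interleaves indexing with searching. It is a minimal sketch under stated assumptions: the names `tokenize` and `InvertedIndex` and the sample documents are hypothetical, and it does not reflect how Solr, ElasticSearch, or FCART are implemented.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class InvertedIndex:
    """Toy inverted index: maps each term to the set of document ids containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        """Index one document; it becomes searchable immediately."""
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents that contain every term of the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        # Copy the first posting set so intersections do not mutate the index.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


# Hypothetical sample data: indexing and searching are interleaved.
index = InvertedIndex()
index.add(1, "Formal Concept Analysis of social network data")
first_hits = index.search("social network")
index.add(2, "Full-text search in social network posts")
second_hits = index.search("social network")
```

Production search engines add much more on top of this idea (relevance ranking, on-disk segments, concurrent readers and writers), which is why their behaviour under simultaneous indexing and searching has to be measured experimentally.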
Keywords: software, data mining, Formal Concept Analysis, social network analysis, Knowledge Extraction, big data
In book
Vol. 1552. Aachen : CEUR Workshop Proceedings, 2015
Parinov A., Neznanov A., in : CLA 2016: Proceedings of the Thirteenth International Conference on Concept Lattices and Their Applications. CEUR Workshop Proceedings. Vol. 1624.: M. : Higher School of Economics, National Research University, 2016. P. 285-296.
Formal Concept Analysis (FCA) provides mathematical models, methods and algorithms for data analysis. However, there is currently no readily available software system that would give a data analyst unified, intelligible and transparent access to various external data sources with large amounts of heterogeneous data for subsequent FCA-based knowledge discovery. The lack of such tools ...
Added: October 19, 2016
Gnatyshak D. V., Ignatov D. I., Kuznetsov S. et al., in : CLA 2014: Proceedings of the Eleventh International Conference on Concept Lattices and Their Applications. : Kosice : Pavol Jozef Safarik University, 2014. P. 231-242.
An efficient one-pass online algorithm for triclustering of binary data (triadic formal contexts) is proposed. This algorithm is a modified version of the basic algorithm for OAC-triclustering approach, but it has linear time and memory complexities with respect to the cardinality of the underlying ternary relation and can be easily parallelized in order to be ...
Added: October 8, 2014
Chepovskiy A., Орлов А. О., Вестник Новосибирского государственного университета. Серия: Информационные технологии. 2017. Vol. 15. No. 3. P. 64-73
One of the tasks in the study of complex networks is revealing community structure: splitting all vertices into groups (communities) so that the vertices of each group are more closely related to each other than to the rest of the graph. A popular algorithm for detecting communities is ...
Added: October 8, 2017
Ignatov D. I., Kaminskaya A. Y., Bezzubtseva A. A. et al., in : Перспективные направления исследований в области бизнес-информатики: Материалы XI международной конференции. : Nizhny Novgorod : Higher School of Economics in Nizhny Novgorod, 2012. P. 7-17.
In a crowdsourcing project several participants discuss and solve one common problem, propose their ideas, evaluate ideas of each other, etc. We propose the novel instrument CrowDM for analyzing data generated by collaborative platforms. The initial version of the system combines several innovative techniques for structured and unstructured data analysis. Formal Concept Analysis, multimodal clustering ...
Added: December 3, 2012
Egurnov D., Точилкин Д. С., Ignatov D. I., in : Complex Data Analytics with Formal Concept Analysis. : Springer, 2022. P. 239-258.
In this paper, we describe versions of triclustering algorithms adapted for efficient calculations in distributed environments with MapReduce model or parallelisation mechanism provided by modern programming languages. OAC-family of triclustering algorithms shows good parallelisation capabilities due to the independent processing of triples of a triadic formal context. We provide time and space complexity of the ...
Added: November 1, 2022
Clermont-Ferrand : CEUR Workshop Proceedings, 2015
Formal Concept Analysis is a method of analysis of logical data based on formalization of conceptual knowledge by means of lattice theory. It has proved to be of interest to various applied fields such as data visualization, knowledge discovery and data mining, database theory, and many others. The International Conference “Concept Lattices and Their Applications ...
Added: October 22, 2015
Neznanov A., Parinov A., in : Artificial Intelligence: Methodology, Systems, and Applications 16th International Conference, AIMSA 2014, Varna, Bulgaria, September 11-13, 2014. Proceedings. Vol. 8722.: Dordrecht, L., Cham, Heidelberg, NY : Springer, 2014. P. 214-221.
Formal Concept Analysis Research Toolbox (FCART) is an integrated environment for knowledge and data engineers with a set of research tools based on Formal Concept Analysis. FCART allows a user to load structured and unstructured data (including texts with various metadata) from heterogeneous data sources into local data storage, compose scaling queries for data snapshots, and then ...
Added: October 14, 2014
Ignatov D. I., Kaminskaya A. Y., Malioukov A. et al., in : Proceedings of International Conference on Conceptual Structures 2014. Vol. 8577: Graph-Based Representation and Reasoning.: Springer, 2014. P. 287-292.
This paper considers a recommender part of the data analysis system for the collaborative platform Witology. It was developed by the joint research team of the National Research University Higher School of Economics and the Witology company. This recommender system is able to recommend ideas, like-minded users and antagonists at the respective phases ...
Added: June 9, 2014
Semenov A., Natekin A., Nikolenko S. I. et al., in : Analysis of Images, Social Networks and Texts. 4th International Conference, AIST 2015, Yekaterinburg, Russia, April 9–11, 2015, Revised Selected Papers. Vol. 542: Series: Communications in Computer and Information Science.: Switzerland : Springer, 2015. Ch. 3. P. 24-35.
In online social networks, high level features of user behavior such as character traits can be predicted with data from user profiles and their connections. Recent publications use data from online social networks to detect people with depression propensity and diagnosis. In this study, we investigate the capabilities of previously published methods and metrics applied ...
Added: October 28, 2015
CEUR-WS.org, 2020
The CLA conference is an international forum for researchers, practitioners and students dedicated to the practice of Formal Concept Analysis (FCA) and areas closely related to it, including data analysis and mining, information retrieval, knowledge management, knowledge engineering, logic, algebra and lattice theory.
The 15th of CLA, CLA 2020, was going to be held in Tallinn, Estonia ...
Added: October 30, 2020
Domenach F., Ignatov D. I., Poelmans J., Berlin, Heidelberg : Springer, 2012
This book constitutes the refereed proceedings of the 10th International Conference on Formal Concept Analysis, ICFCA 2012, held in Leuven, Belgium in May 2012. The 20 revised full papers presented together with 6 invited talks were carefully reviewed and selected from 68 submissions. The topics covered in this volume range from recent advances in machine ...
Added: December 3, 2012
Semenov A., Natekin A., Nikolenko S. I. et al., Springer, 2015
In online social networks, high level features of user behavior such as character traits can be predicted with data from user profiles and their connections. Recent publications use data from online social networks to detect people with depression propensity and diagnosis. In this study, we investigate the capabilities of previously published methods and metrics applied to the Russian online social ...
Added: December 21, 2015
Switzerland : Springer, 2019
This book constitutes the refereed proceedings of the 11th International Conference on Intelligent Data Processing, IDP 2016, held in Barcelona, Spain, in October 2016.
The 11 revised full papers were carefully reviewed and selected from 52 submissions. The papers of this volume are organized in topical sections on machine learning theory with applications; intelligent data processing in life ...
Added: February 8, 2020
Ignatov D. I., Khvorykh G., Khrunin A. et al., in : Recent Trends in Analysis of Images, Social Networks and Texts. 9th International Conference, AIST 2020, Skolkovo, Moscow, Russia, October 15–16, 2020 Revised Supplementary Proceedings. Vol. 12602.: Springer, 2021. P. 185-204.
Missing genotypes can affect the efficacy of machine learning approaches to identify the risk genetic variants of common diseases and traits. The problem occurs when genotypic data are collected from different experiments with different DNA microarrays, each being characterised by its pattern of uncalled (missing) genotypes. This can prevent the ...
Added: November 1, 2022
Springer, 2014
This book constitutes the refereed proceedings of the 10th International Conference on Machine Learning and Data Mining in Pattern Recognition, MLDM 2014, held in St. Petersburg, Russia in July 2014. The 40 full papers presented were carefully reviewed and selected from 128 submissions. The topics range from theoretical topics for classification, clustering, association rule and ...
Added: September 30, 2014
Berlin : Springer, 2014
This book constitutes the proceedings of the Third International Conference on Analysis of Images, Social Networks and Texts, AIST 2014, held in Yekaterinburg, Russia, in April 2014. The 11 full and 10 short papers were carefully reviewed and selected from 74 submissions. They are presented together with 3 short industrial papers, 4 invited papers and ...
Added: November 13, 2014
Chepovskiy A., Лобанова С. Ю., Бизнес-информатика. 2017. Vol. 42. No. 4. P. 64-73
In this paper, we propose and implement a method for detecting intersecting and nested communities in graphs of interacting objects of different natures. For this, two classical algorithms are taken: a hierarchical agglomerative one and one based on the search for k-cliques. The combined algorithm presented is based on their sequential application. In addition, parametric options ...
Added: December 10, 2017
Ignatov D. I., Egurnov D., Точилкин Д. С., in : Supplementary Proceedings ICFCA 2019 Conference and Workshops. Vol. 2378.: CEUR Workshop Proceedings, 2019. P. 137-151.
This paper presents further development of distributed multimodal clustering. We introduce a new version of multimodal clustering algorithm for distributed processing in Apache Hadoop on computer clusters. Its implementation allows a user to conduct clustering on data with modality greater than two. We provide time and space complexity of the algorithm and justify its relevance. ...
Added: October 31, 2019
Коломейченко М. И., Polyakov I. V., Chepovskiy A. et al., in : Труды Международной научной конференции по физико-технической информатике (CPT2015). : M., Protvino : Институт физико-технической информатики, 2016. P. 175-178.
The problem of social network graph analysis is considered. A specialized data structure designed for storing and processing large social network graphs is presented. An architecture for a large-scale social network graph storage is proposed. ...
Added: May 11, 2016
Орлов А. О., Chepovskiy A., in : Труды Международной научной конференции Московского физико-технического института (государственного университета) и Института физико-технической информатики (SCVRT1516). : M., Protvino : Институт физико-технической информатики, 2016. P. 124-129.
This paper describes the problem of social network graph analysis. Features of communities revealed by the Blondel algorithm are studied. Examples of real data from one of the social networks are considered. On the basis of the discovered properties of the algorithm, a modification is proposed. ...
Added: November 20, 2016
Ignatov D. I., Kaminskaya A. Y., Bezzubtseva A. A. et al., in : Анализ изображений, сетей и текстов. Доклады всероссийской научной конференции АИСТ'12. Модели, алгоритмы и инструменты анализа данных; результаты и возможности для анализа изображений, сетей и текстов. Екатеринбург, 16 – 18 марта 2012 года. Issue 1.: M. : Национальный открытый университет «ИНТУИТ», 2012. P. 16-26.
The paper describes a data analysis system for the collaborative platform of the Witology company. The project is under development, so the article mainly reflects methodological aspects and the results of the first experiments. The system is based on a number of models and methods of modern analysis of object-attribute and unstructured data (texts), such as Formal Concept Analysis, multimodal clustering, association rule mining, and extraction of key phrases and words from texts. ...
Added: January 30, 2013
Buzmakov A., Neznanov A., in : Proceedings of the International Workshop "What can FCA do for Artificial Intelligence?" (FCA4AI at IJCAI 2013). Issue 1058.: Beijing : CEUR Workshop Proceedings, 2013. Ch. 7. P. 49-56.
A new general and efficient architecture for working with pattern structures, an extension of FCA for dealing with “complex” descriptions, is introduced and implemented in a subsystem of Formal Concept Analysis Research Toolbox (FCART). The architecture is universal in terms of possible dataset structures and formats, techniques of pattern structure manipulation. ...
Added: October 26, 2014
M. : Higher School of Economics Publishing House, 2011
Concept discovery is a Knowledge Discovery in Databases (KDD) research field that uses human-centered techniques such as Formal Concept Analysis (FCA), Biclustering, Triclustering, Conceptual Graphs etc. for gaining insight into the underlying conceptual structure of the data. Traditional machine learning techniques are mainly focusing on structured data whereas most data available resides in unstructured, often ...
Added: December 3, 2012