Full-text Search in Intermediate Data Storage of FCART
The speed of full-text search directly affects the process of text analysis. Search engine creates a text index, which is used for fast full-text search. Solr and ElasticSearch are two popular search engines. A text analysis system requires fast implementing searching and indexing at the same time. This paper describes preprocessing workflow of the analysis system called Formal Concept Analysis Research Toolbox (FCART) and experiment of searching and indexing social networking service data at the same time. Results of the experiment show which search engine is better as the core of FCART search subsystem.
In this paper we propose two novel methods for analyzing data collected from online social networks. In particular we will do analyses on Vkontake data (Russian online social network). Using biclustering we extract groups of users with similar interests and find communities of users which belong to similar groups. With triclustering we reveal users’ interests as tags and use them to describe Vkontakte groups. After this social tagging process we can recommend to a particular user relevant groups to join or new friends from interesting groups which have a similar taste. We present some preliminary results and explain how we are going to apply these methods on massive data repositories.
The paper makes a brief introduction into multiple classifier systems and describes a particular algorithm which improves classification accuracy by making a recommendation of an algorithm to an object. This recommendation is done under a hypothesis that a classifier is likely to predict the label of the object correctly if it has correctly classified its neighbors. The process of assigning a classifier to each object involves here the apparatus of Formal Concept Analysis. We explain the principle of the algorithm on a toy example and describe experiments with real-world datasets.
Creature questions of the mathematical, informational and programming support for the practice network community control are considered. Community mathematical model is proposed. This model utilization as basis for realization of the control system principal functions in the context of the participant interaction improvement and object community regions forming is considered.
This book constitutes the second part of the refereed proceedings of the 10th International Conference on Formal Concept Analysis, ICFCA 2012, held in Leuven, Belgium in May 2012. The topics covered in this volume range from recent advances in machine learning and data mining; mining terrorist networks and revealing criminals; concept-based process mining; to scalability issues in FCA and rough sets.
In 2015-2016 the Department of Communication, Media and Design of the National Research University “Higher School of Economics” in collaboration with non-profit organization ROCIT conducted research aimed to construct the Index of Digital Literacy in Russian Regions. This research was the priority and remain unmatched for the momentIn 2015-2016 the Department of Communication, Media and Design of the National Research University “Higher School of Economics” in collaboration with non-profit organization ROCIT conducted research aimed to construct the Index of Digital Literacy in Russian Regions. This research was the priority and remain unmatched for the moment
Formal Concept Analysis (FCA) is an unsupervised clustering technique and many scientific papers are devoted to applying FCA in Information Retrieval (IR) research. We collected 103 papers published between 2003-2009 which mention FCA and information retrieval in the abstract, title or keywords. Using a prototype of our FCA-based toolset CORDIET, we converted the pdf-files containing the papers to plain text, indexed them with Lucene using a thesaurus containing terms related to FCA research and then created the concept lattice shown in this paper. We visualized, analyzed and explored the literature with concept lattices and discovered multiple interesting research streams in IR of which we give an extensive overview. The core contributions of this paper are the innovative application of FCA to the text mining of scientific papers and the survey of the FCA-based IR research.
Concept discovery is a Knowledge Discovery in Databases (KDD) research field that uses human-centered techniques such as Formal Concept Analysis (FCA), Biclustering, Triclustering, Conceptual Graphs etc. for gaining insight into the underlying conceptual structure of the data. Traditional machine learning techniques are mainly focusing on structured data whereas most data available resides in unstructured, often textual, form. Compared to traditional data mining techniques, human-centered instruments actively engage the domain expert in the discovery process. This volume contains the contributions to CDUD 2011, the International Workshop on Concept Discovery in Unstructured Data (CDUD) held in Moscow. The main goal of this workshop was to provide a forum for researchers and developers of data mining instruments working on issues with analyzing unstructured data. We are proud that we could welcome 13 valuable contributions to this volume. The majority of the accepted papers described innovative research on data discovery in unstructured texts. Authors worked on issues such as transforming unstructured into structured information by amongst others extracting keywords and opinion words from texts with Natural Language Processing methods. Multiple authors who participated in the workshop used methods from the conceptual structures field including Formal Concept Analysis and Conceptual Graphs. Applications include but are not limited to text mining police reports, sociological definitions, movie reviews, etc.