Proceedings of the Workshop Formal Concept Analysis Meets Information Retrieval
We describe the FCART software system, a universal integrated environment for knowledge and data engineers with a set of research tools based on Formal Concept Analysis. The system is intended for knowledge discovery from large dynamic data collections, including text collections. FCART allows the user to load structured and unstructured data (texts and various meta-information) from heterogeneous data sources, build data snapshots, compose queries, and generate and visualize concept lattices, clusters, attribute dependencies, and other useful analytical artifacts. A full preprocessing scenario is considered.
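As a rough illustration of what generating the concepts of a lattice involves, the sketch below enumerates all formal concepts of a tiny context by closing every attribute subset. It is a naive, exponential-time toy and not FCART's implementation; the names `formal_concepts`, `derive_objects`, `derive_attributes` and the `example` context are hypothetical and chosen only for this illustration.

```python
from itertools import combinations

def derive_objects(context, attrs):
    """Return the objects that have every attribute in attrs."""
    return frozenset(o for o, a in context.items() if attrs <= a)

def derive_attributes(context, objs, all_attrs):
    """Return the attributes shared by every object in objs."""
    shared = set(all_attrs)
    for o in objs:
        shared &= context[o]
    return frozenset(shared)

def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing each attribute subset.

    Exponential in the number of attributes; suitable for tiny contexts only.
    """
    all_attrs = frozenset().union(*context.values())
    concepts = set()
    for size in range(len(all_attrs) + 1):
        for subset in combinations(sorted(all_attrs), size):
            extent = derive_objects(context, frozenset(subset))
            intent = derive_attributes(context, extent, all_attrs)
            concepts.add((extent, intent))
    return concepts

# Hypothetical context: documents (objects) described by index terms (attributes).
example = {
    "doc1": {"fca", "lattice"},
    "doc2": {"fca", "retrieval"},
    "doc3": {"retrieval"},
}

if __name__ == "__main__":
    for extent, intent in sorted(formal_concepts(example),
                                 key=lambda c: (len(c[1]), sorted(c[1]))):
        print(sorted(extent), sorted(intent))
```

Practical tools rely on far more efficient algorithms than this subset enumeration; the sketch only makes the definition of a concept (a closed pair of an object set and an attribute set) concrete.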
A new approach for detecting duplicates in an ontology built on real redundant data is considered. This approach is based on transforming the initial ontology into a formal context and processing this context using methods of Formal Concept Analysis (FCA). As part of the new method, we also introduce a new index for measuring similarity between objects in a formal concept. We study the new approach on randomly generated contexts and on a real ontology built for a collection of political news and documents.
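The abstract above does not define the new similarity index, so the following sketch substitutes the standard Jaccard coefficient purely as a stand-in. It shows the general shape of the idea: ontology instances become objects of a formal context, their property values become attributes, and pairs of objects with highly overlapping attribute sets are flagged as candidate duplicates. The function names, the threshold, and the sample data are assumptions made for illustration.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard coefficient of two attribute sets (stand-in for the paper's index)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def candidate_duplicates(context, threshold=0.75):
    """Return (object, object, similarity) for pairs at or above the threshold."""
    pairs = []
    for o1, o2 in combinations(sorted(context), 2):
        score = jaccard(context[o1], context[o2])
        if score >= threshold:
            pairs.append((o1, o2, score))
    return pairs

# Hypothetical context derived from ontology instances: each attribute is a
# "property=value" string attached to the instance.
example = {
    "p1": {"name=ivanov", "born=1970", "party=x", "city=moscow"},
    "p2": {"name=ivanov", "born=1970", "party=x"},
    "p3": {"name=petrov", "born=1980"},
}

if __name__ == "__main__":
    print(candidate_duplicates(example))
```

Comparing all pairs is quadratic in the number of objects; restricting the comparison to objects that share a formal concept, as the abstract suggests, would narrow the candidate set.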
We develop a graph representation and learning technique for parse structures of paragraphs of text. We introduce the Parse Thicket (PT) as a sum of syntactic parse trees augmented by a number of arcs for inter-sentence word-word relations such as co-reference and taxonomic relations. These arcs are also derived from other sources, including Speech Act and Rhetoric Structure theories. The operation of generalizing logical formulas is extended to parse trees and then to parse thickets in order to compute similarity between texts. We provide a detailed illustration of how PTs are built from parse trees and then generalized. The proposed approach undergoes a preliminary evaluation in the product search domain of eBay.com, where user queries include product names, features and expressions for user needs, and query keywords occur in different sentences of an answer. We demonstrate that search relevance is improved by PT generalization.
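To give a feel for the generalization operation mentioned above, here is a heavily simplified sketch that works on flat phrases rather than on parse trees or thickets. Each phrase is a list of (part-of-speech, lemma) pairs; the generalization is the longest common subsequence of part-of-speech tags, keeping the lemma where both phrases agree and a `*` wildcard otherwise. The function name `generalize`, the tag set, and the sample phrases are assumptions, and the approach in the abstract operates on much richer structures.

```python
def generalize(phrase_a, phrase_b):
    """Generalize two phrases given as lists of (pos, lemma) pairs.

    Aligns the phrases by the longest common subsequence of POS tags and
    keeps the lemma where both agree, otherwise a '*' wildcard.
    """
    n, m = len(phrase_a), len(phrase_b)
    # table[i][j] = LCS length (by POS tag) of phrase_a[i:] and phrase_b[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if phrase_a[i][0] == phrase_b[j][0]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the start to recover one optimal alignment.
    result = []
    i = j = 0
    while i < n and j < m:
        pos_a, lemma_a = phrase_a[i]
        pos_b, lemma_b = phrase_b[j]
        if pos_a == pos_b:
            result.append((pos_a, lemma_a if lemma_a == lemma_b else "*"))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return result

# Hypothetical query phrase and answer phrase from a product search setting.
query = [("JJ", "digital"), ("NN", "camera"), ("IN", "with"), ("NN", "zoom")]
answer = [("NN", "camera"), ("IN", "with"), ("JJ", "good"), ("NN", "lens")]

if __name__ == "__main__":
    print(generalize(query, answer))
```

The size of the generalization (here, the number of retained pairs, with exact lemma matches counting for more than wildcards) can serve as a crude similarity score between a query and a candidate answer.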