A novel method for evaluating classification reliability is proposed, based on the discernibility of a pattern's class from other classes at the pattern's location. The use of three discernibility measures is experimentally compared with conventional techniques based on classification scores for class labels. Classification accuracy can be drastically enhanced through discernibility measures by using only the most reliable, "elite", patterns. It can be further boosted by amalgamating the elites of different classifiers. The improved performance comes at the price of rejecting many patterns. There are situations where this price is worth paying, for instance when unreliable accuracy rates would necessitate manual testing of very complex technical devices or manual diagnostics of human diseases. In contrast to conventional techniques for estimating reliability, the proposed measures are applicable to small datasets as well as to datasets with complex class structures on which conventional classifiers show low accuracy rates.
Formal Concept Analysis (FCA) is a mathematical technique that has been extensively applied to Boolean data in applications such as knowledge discovery, information retrieval, and web mining. In recent years, research on extending FCA theory to cope with imprecise and incomplete information has made significant progress. In this paper, we give a systematic overview of the more than 120 papers published between 2003 and 2011 on FCA with fuzzy attributes and on rough FCA. We applied traditional FCA as a text-mining instrument to 1072 papers mentioning FCA in the abstract. These papers were available as PDF files; using a thesaurus with terms referring to research topics, we transformed them into concept lattices. These lattices were used to analyze and explore the most prominent research topics within the FCA-with-fuzzy-attributes and rough-FCA research communities. FCA turned out to be an ideal meta-technique for representing large volumes of unstructured text.
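The core FCA step the abstract relies on, deriving formal concepts (extent, intent) from a Boolean object-attribute context, can be sketched as follows. This is a minimal, naive enumeration for illustration only; the papers and thesaurus terms are hypothetical placeholders, not data from the study.

```python
# Minimal sketch of FCA on a toy Boolean context (papers x thesaurus terms).
# A formal concept is a pair (extent, intent): the objects sharing exactly
# the attributes of the intent, with the intent closed under derivation.
from itertools import combinations

context = {
    "paper1": {"fuzzy", "lattice"},
    "paper2": {"fuzzy", "rough"},
    "paper3": {"lattice", "rough"},
}
attributes = {"fuzzy", "lattice", "rough"}

def extent(intent_set):
    """Objects possessing every attribute of the given intent."""
    return {g for g, attrs in context.items() if intent_set <= attrs}

def intent(objs):
    """Attributes shared by all given objects (all attributes if objs is empty)."""
    return set.intersection(*(context[g] for g in objs)) if objs else set(attributes)

def concepts():
    """Enumerate all formal concepts by closing every attribute subset."""
    seen, result = set(), []
    for r in range(len(attributes) + 1):
        for combo in combinations(sorted(attributes), r):
            ext = extent(set(combo))
            if frozenset(ext) not in seen:
                seen.add(frozenset(ext))
                result.append((ext, intent(ext)))  # intent(ext) is the closure
    return result
```

Ordering the resulting concepts by inclusion of extents yields the concept lattice used for topic exploration; this brute-force closure loop is exponential in the number of attributes, so real tools use dedicated algorithms.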
The paper is devoted to the investigation of imprecision indices, which are used for evaluating the imprecision (or non-specificity) contained in information described by monotone (non-additive) measures. These indices can be considered generalizations of the generalized Hartley measure. We argue that in some cases, for example in approximation problems, the application of imprecision indices is well justified compared with well-known uncertainty measures because of their good sensitivity. In the paper, we investigate the properties of so-called linear imprecision indices; in particular, we introduce their various representations and describe connections to the theory of imprecise probabilities. We also study the algebraic structure of imprecision indices in the linear space and describe the extreme points of the convex set of all possible imprecision indices, in particular of imprecision indices with symmetry properties. We also show how to measure inconsistency in information by means of imprecision indices. At the end of the paper, we consider the application of imprecision indices in analysing the applicability of different aggregation rules in evidence theory.
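The baseline that these indices generalize, the generalized Hartley measure GH(m) = Σ_A m(A) · log₂|A| over the focal elements A of a basic probability assignment m, can be sketched as follows. The body of evidence used here is a hypothetical example, not one from the paper.

```python
# Minimal sketch of the generalized Hartley measure of non-specificity:
# GH(m) = sum over focal elements A of m(A) * log2(|A|).
from math import log2

def generalized_hartley(m):
    """m maps focal elements (nonempty frozensets) to basic probability masses."""
    return sum(mass * log2(len(focal)) for focal, mass in m.items())

m = {
    frozenset({"a"}): 0.5,                 # singleton: log2(1) = 0, fully specific
    frozenset({"a", "b"}): 0.3,            # contributes 0.3 * log2(2) = 0.3
    frozenset({"a", "b", "c", "d"}): 0.2,  # contributes 0.2 * log2(4) = 0.4
}
# generalized_hartley(m) is approximately 0.7
```

A probability measure (all focal elements singletons) scores 0, while mass spread over larger sets scores higher, which is exactly the non-specificity behaviour the imprecision indices above are designed to capture more sensitively.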
To model conflict, non-specificity and contradiction in information, upper and lower generalized credal sets are introduced. An upper generalized credal set is a convex subset of plausibility measures, interpreted as lower probabilities, whose bodies of evidence consist of singletons and a certain event. Analogously, contradiction is modelled in the theory of evidence by a belief function that is greater than zero at the empty set. Based on generalized credal sets, we extend the conjunctive rule to contradictory sources of information, introduce constructions like the natural extension in the theory of imprecise probabilities, and show that the model of generalized credal sets coincides with the model of imprecise probabilities when the profile of a generalized credal set consists of probability measures. We also show how the introduced model can be applied to decision problems.
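The evidence-theoretic background the abstract assumes, belief and plausibility functions derived from a basic assignment, with contradiction represented by positive mass on the empty set, can be sketched as follows. This illustrates only the standard Dempster-Shafer notions, not the generalized credal set construction itself; the assignment below is a hypothetical example.

```python
# Minimal sketch of belief and plausibility in evidence theory.
# A basic assignment with m(emptyset) > 0 models contradiction, as noted above.
def belief(m, event):
    """Bel(A): total mass of nonempty focal sets contained in A."""
    return sum(mass for focal, mass in m.items() if focal and focal <= event)

def plausibility(m, event):
    """Pl(A): total mass of focal sets intersecting A."""
    return sum(mass for focal, mass in m.items() if focal & event)

m = {
    frozenset(): 0.1,           # contradictory mass assigned to the empty set
    frozenset({"x"}): 0.5,
    frozenset({"x", "y"}): 0.4,
}
A = frozenset({"x"})
# belief(m, A) is 0.5 and plausibility(m, A) is 0.9
```

Note that with contradictory mass the masses of nonempty focal sets no longer sum to 1, which is precisely the situation the generalized credal set model is built to handle.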
Nowadays, datasets are available in very complex and heterogeneous forms. Mining such data collections is essential to support many real-world applications, ranging from healthcare to marketing. In this work, we focus on the analysis of "complex" sequential data by means of interesting sequential patterns. We approach the problem using the elegant mathematical framework of formal concept analysis and its extension based on "pattern structures". Pattern structures are used for mining complex data (such as sequences or graphs) and are based on a subsumption operation, which in our case is defined with respect to the partial order on sequences. We show how pattern structures, along with projections (i.e., data reductions of sequential structures), enable the enumeration of more meaningful patterns and increase the computational efficiency of the approach. Finally, we show the applicability of the presented method for discovering and analysing interesting patient patterns in a French healthcare dataset on cancer. The quantitative and qualitative results (with annotations and analysis from a physician) are reported for this use case, which is the main motivation for this work.
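The partial order on sequences underlying the subsumption operation can be sketched as a subsequence test: a pattern is below a sequence when its items occur in the sequence in the same order. The event names below are hypothetical illustrations, not items from the French healthcare dataset.

```python
# Minimal sketch of the partial order on sequences used for subsumption in
# sequential pattern structures: s <= t iff s is a subsequence of t.
def is_subsequence(s, t):
    """True iff the items of s occur in t in order (not necessarily contiguously)."""
    it = iter(t)
    # 'item in it' advances the iterator, so order is preserved across checks
    return all(item in it for item in s)

# A pattern subsumes a patient trajectory when it is a subsequence of it:
# is_subsequence(["diagnosis", "surgery"], ["diagnosis", "chemo", "surgery"]) -> True
# is_subsequence(["surgery", "diagnosis"], ["diagnosis", "chemo", "surgery"]) -> False
```

Projections then coarsen this order (for instance by bounding pattern length), shrinking the search space while keeping the patterns that remain meaningful.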
Order and lattice theory provides convenient mathematical tools for pattern mining, in particular for condensed irredundant representations of pattern spaces and their efficient generation. Formal Concept Analysis (FCA) offers a generic framework, called pattern structures, to formalize many types of patterns, such as itemsets, intervals, graphs, and sequence sets. Moreover, FCA provides generic algorithms for irredundantly generating all closed patterns, the only condition being that the pattern space is a meet-semilattice. This does not always hold, e.g. for sequential and graph patterns. Here, we discuss pattern setups, consisting of descriptions that form just a partial order. Such a framework can be too broad and causes several problems, so we propose a new model, dubbed pattern multistructures, lying between pattern setups and pattern structures, which relies on multilattices. Finally, we consider some techniques, namely completions, that transform pattern setups into pattern structures using sets/antichains of patterns.
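The failure of the meet-semilattice condition for sequences can be demonstrated concretely: two sequences may share several incomparable maximal common subsequences, so no unique greatest lower bound exists. The brute-force sketch below is an illustration only, not an algorithm from the paper.

```python
# Minimal sketch showing that sequence patterns need not form a
# meet-semilattice: "ab" and "ba" have two incomparable maximal common
# subsequences ("a" and "b"), hence no unique meet.
from itertools import combinations

def subsequences(s):
    """All nonempty subsequences of the string s."""
    return {"".join(c) for r in range(1, len(s) + 1) for c in combinations(s, r)}

def maximal_common_subsequences(s, t):
    """The antichain of maximal elements among common subsequences of s and t."""
    common = subsequences(s) & subsequences(t)
    def is_sub(a, b):
        it = iter(b)
        return all(ch in it for ch in a)
    return {c for c in common
            if not any(c != d and is_sub(c, d) for d in common)}

# maximal_common_subsequences("ab", "ba") == {"a", "b"}  -> two maximal elements
```

Taking this antichain of maximal common elements as the "meet" is exactly the kind of antichain completion that turns a pattern setup into a pattern structure.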
We propose an iterative and human-centred knowledge discovery methodology based on formal concept analysis. The proposed approach recognizes the important role of the domain expert in mining real-world enterprise applications and makes use of specific domain knowledge, including human intelligence and domain-specific constraints. Our approach was empirically validated at the Amsterdam-Amstelland police to identify suspects and victims of human trafficking in 266,157 suspicious activity reports. Based on guidelines of the Attorneys General of the Netherlands, we first defined multiple early-warning indicators that were used to index the police reports. Using concept lattices, we revealed numerous previously unknown human trafficking and loverboy suspects. In-depth investigation by the police confirmed their involvement in illegal activities, resulting in actual arrests. Our human-centred approach has been embedded into operational policing practice and is now successfully used on a daily basis to cope with the rapidly growing amount of unstructured information.