Modeling Generalization in Domain Taxonomies Using a Maximum Likelihood Criterion
We define a most specific generalization of a fuzzy set of topics assigned to the leaves of the rooted tree of a domain taxonomy. This generalization lifts the set to its “head subject” node in the higher ranks of the taxonomy tree. The head subject is intended to cover the query set “tightly”, possibly at the cost of some errors, referred to as “gaps” and “offshoots”. We develop a method that globally maximizes the likelihood of a scenario involving gains and losses of the general concept manifested in a fuzzy cluster of leaf nodes of the taxonomy. The probabilities of the gain and loss events are derived from multiple runs of our earlier maximum parsimony method, each started from randomly generated values of its two parameters. Supplemented with fuzzy c-means clustering, this allows us to obtain meaningful generalizations for six fuzzy thematic clusters of Data Science topics using over 17,000 abstracts from 17 research journals published by Springer.
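To make the terms “head subject”, “gap” and “offshoot” concrete, the following minimal Python sketch evaluates one candidate head subject on a toy taxonomy. It is an illustration under stated assumptions, not the method of the paper: the tree, the topic names, the membership values and the helper functions `leaves_under`, `gaps_under` and `evaluate_head` are all hypothetical. A gap is read here as a maximal subtree below the head subject that contains no query topic, and an offshoot as a query topic lying outside the head subject's subtree.

```python
# Hypothetical toy taxonomy (parent -> children); all names are illustrative only.
TREE = {
    "root": ["A", "B"],
    "A": ["A1", "A2", "A3"],
    "B": ["B1", "B2"],
}

# Hypothetical fuzzy query set: leaf topic -> membership value in (0, 1].
QUERY = {"A1": 0.8, "A2": 0.5, "B1": 0.1}


def leaves_under(node, tree):
    """Return the leaves of the subtree rooted at `node`."""
    children = tree.get(node, [])
    if not children:
        return [node]
    found = []
    for child in children:
        found.extend(leaves_under(child, tree))
    return found


def gaps_under(node, tree, query):
    """Return the maximal subtrees below `node` that contain no query leaf.

    Assumption: this is one reading of "gap"; the source does not spell out
    the formal definition in this section.
    """
    if not any(leaf in query for leaf in leaves_under(node, tree)):
        return [node]
    gaps = []
    for child in tree.get(node, []):
        gaps.extend(gaps_under(child, tree, query))
    return gaps


def evaluate_head(head, tree, query):
    """Return (gaps, offshoots) if `head` is taken as the head subject."""
    covered = set(leaves_under(head, tree))
    offshoots = sorted(topic for topic in query if topic not in covered)
    return gaps_under(head, tree, query), offshoots


print(evaluate_head("A", TREE, QUERY))     # (['A3'], ['B1'])
print(evaluate_head("root", TREE, QUERY))  # (['A3', 'B2'], [])
```

In this toy case, lifting to node "A" leaves one gap and one offshoot with a low membership value, whereas lifting to "root" removes the offshoot but adds a second gap. A criterion such as parsimony or likelihood is what decides between such alternatives.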