Method for Generalization of Fuzzy Sets
We define and find a most specific generalization of a fuzzy set of topics assigned to leaves of the rooted tree of a taxonomy. This generalization lifts the set to a “head subject” in the higher ranks of the taxonomy that is supposed to “tightly” cover the query set, possibly at the cost of some errors, both “gaps” and “offshoots”. The method globally minimizes a penalty that combines head subjects, gaps, and offshoots. We apply the method to extract research tendencies from a collection of about 18,000 research papers published in Springer journals on data science. We use a taxonomy of Data Science based on the Association for Computing Machinery Computing Classification System 2012 (ACM-CCS). We find fuzzy clusters of leaf topics over the text collection and use the head subjects of the thematic clusters to comment on the tendencies of research.
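To make the lifting idea concrete, the following is a minimal sketch of a bottom-up penalty minimization on a toy taxonomy, not the method as specified in the paper. Everything in it is an assumption made for illustration: the names `Node`, `Lift`, and `lift`; the weights `head`, `gap`, and `offshoot`; the reading of a gap as a maximal subtree without query leaves under a head subject; the reading of an offshoot as a query leaf left outside every internal head subject; and the rule that an internal node takes the maximum membership of its leaves.

```python
from dataclasses import dataclass, field
from typing import NamedTuple


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)
    u: float = 0.0  # leaf membership in the fuzzy query set (unused on internal nodes)


class Lift(NamedTuple):
    penalty: float   # minimum penalty found for this subtree
    heads: list      # internal nodes chosen as head subjects
    offshoots: list  # query leaves not covered by any internal head subject
    gaps: list       # gaps incurred by the chosen head subjects
    empty: list      # maximal query-free subtrees (gaps if an ancestor becomes a head)
    u: float         # assumed node membership: maximum over descendant leaves


def lift(node, head=1.0, gap=0.2, offshoot=0.8):
    """Hypothetical penalty model: head * u per head subject,
    gap per gap, offshoot * u per offshoot."""
    if not node.children:
        if node.u > 0:  # query leaf: on its own it counts as an offshoot
            return Lift(offshoot * node.u, [], [node.name], [], [], node.u)
        return Lift(0.0, [], [], [], [node.name], 0.0)

    parts = [lift(c, head, gap, offshoot) for c in node.children]
    u = max(p.u for p in parts)
    if u == 0:  # no query leaf below: the whole subtree is one potential gap
        return Lift(0.0, [], [], [], [node.name], 0.0)

    empty = [g for p in parts for g in p.empty]
    keep = sum(p.penalty for p in parts)   # leave the children's decisions as they are
    raised = head * u + gap * len(empty)   # make this node the head subject
    if raised < keep:
        return Lift(raised, [node.name], [], empty, empty, u)
    return Lift(
        keep,
        [h for p in parts for h in p.heads],
        [o for p in parts for o in p.offshoots],
        [g for p in parts for g in p.gaps],
        empty,
        u,
    )


# Toy taxonomy: two query leaves under A, one weak query leaf under B.
tree = Node("root", [
    Node("A", [Node("a1", u=0.8), Node("a2", u=0.6), Node("a3")]),
    Node("B", [Node("b1"), Node("b2"), Node("b3", u=0.1)]),
])
result = lift(tree)
print(result.heads, result.offshoots, result.gaps, round(result.penalty, 2))
# → ['A'] ['b3'] ['a3'] 1.08
```

In this toy run, lifting to A is cheaper than keeping a1 and a2 as separate offshoots, even at the price of the gap a3. Under B, the weakly supported leaf b3 stays an offshoot, because covering it would introduce two gaps. Because each node compares only "lift here" against the sum of its children's best results, the recursion finds the minimum of this assumed penalty over the whole tree.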