Using Domain Taxonomy to Model Generalization of Thematic Fuzzy Clusters
We define a most specific generalization of a fuzzy set of topics assigned to leaves of the rooted tree of a domain taxonomy. This generalization lifts the set to its 'head subject' in the higher ranks of the taxonomy tree. The head subject is meant to cover the query set 'tightly', although it may introduce two kinds of error: 'gaps' and 'offshoots'. Our method globally minimizes a penalty function that combines the numbers of head subjects, gaps, and offshoots, each with its own weight. We apply this method to a collection of about 18,000 research papers published in Springer journals on Data Science over the past 20 years. We extract a taxonomy of Data Science from the international Association for Computing Machinery Computing Classification System 2012 (ACM-CCS). We find fuzzy clusters of leaf topics over the text collection and use the lifted head subjects of these thematic clusters to comment on the tendencies of current research in the corresponding aspects of the domain.
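
To make the penalty minimization concrete, the following is a minimal sketch rather than the method itself. It assumes a crisp (non-fuzzy) query set, a weight of 1 per head subject, and hypothetical weights `lam` per gap and `gamma` per offshoot. It also assumes that a gap is a maximal subtree under a head subject containing no query leaf, and that an offshoot is a query leaf covered by no head subject. The toy taxonomy `TREE` and the function name `lift` are illustrative only.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy taxonomy: parent -> children; nodes without an entry are leaves.
TREE: Dict[str, List[str]] = {
    "root": ["A", "B"],
    "A": ["a1", "a2", "a3"],
    "B": ["b1", "b2", "b3"],
}


def lift(
    node: str,
    tree: Dict[str, List[str]],
    query: Set[str],
    lam: float,
    gamma: float,
) -> Tuple[float, List[str], int, bool]:
    """Bottom-up recursion over the subtree rooted at `node`.

    Returns (minimum penalty, chosen head subjects,
             number of gaps if `node` were covered by a head subject,
             whether the subtree contains any query leaf).
    """
    children = tree.get(node, [])
    if not children:  # leaf
        if node in query:
            # A query leaf not covered by any head subject counts as an offshoot.
            return gamma, [], 0, True
        # A non-query leaf costs nothing alone, but is one gap if covered.
        return 0.0, [], 1, False

    results = [lift(child, tree, query, lam, gamma) for child in children]
    if not any(r[3] for r in results):
        # No query leaf below: the whole subtree is one maximal gap if covered.
        return 0.0, [], 1, False

    gaps = sum(r[2] for r in results)
    # Option 1: do not lift here; keep the best decisions made in the children.
    split_penalty = sum(r[0] for r in results)
    split_heads = [h for r in results for h in r[1]]
    # Option 2: make this node a head subject and pay for every gap below it.
    head_penalty = 1.0 + lam * gaps

    if head_penalty < split_penalty:
        return head_penalty, [node], gaps, True
    return split_penalty, split_heads, gaps, True


# Query set: all of A's leaves plus a single leaf of B.
penalty, heads, _, _ = lift("root", TREE, {"a1", "a2", "a3", "b1"}, lam=0.5, gamma=0.9)
print(heads, penalty)
```

With these assumed weights, the sketch lifts the three `A` leaves to the single head subject `A` and leaves `b1` as an offshoot, because making `B` or `root` a head subject would incur two gaps. A fuzzy version would additionally weight each term by membership values, which this sketch omits.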