A Hybrid Approach to Interpretable Analysis of Research Paper Collections

Mirkin B., Frolov D., Vlasov A., Nascimento S., Fenner T.

doi:10.1145/3405962.3405976

P. 184–189.

We define and find a most specific generalization of a fuzzy set of topics assigned to leaves of the rooted tree of a taxonomy. This generalization lifts the set to a “head subject” in the higher ranks of the taxonomy, which is supposed to “tightly” cover the query set, possibly bringing in some errors, both “gaps” and “offshoots”. Our method involves two further automated analysis techniques: a fuzzy clustering method, FADDIS, which has both additive and spectral properties, and a purely structural string-to-text relevance measure based on suffix trees annotated with frequencies. We apply this approach to extract research tendencies from two collections of research papers: (a) about 18000 research papers on data science published in Springer journals over 20 years, and (b) about 27000 research papers retrieved from Springer and Elsevier journals in response to data-science-related queries. We use a taxonomy of Data Science based on the Association for Computing Machinery Computing Classification System (ACM-CCS 2012). Our findings allow us to comment on research tendencies that cannot be derived using more conventional techniques.
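To make the notions of a head subject, “gaps” and “offshoots” concrete, the following minimal Python sketch evaluates single candidate head subjects on a toy taxonomy. It is an illustration under stated assumptions, not the authors' method: the taxonomy, the query memberships, the functions `leaves_under`, `gaps_and_offshoots`, `penalty` and `best_single_head`, and the weights `lam` and `gamma` are all hypothetical. Here a gap is a leaf under the head subject that is outside the query set, and an offshoot is a query leaf that the head subject does not cover. The penalty (1 for the head subject, plus `lam` per gap, plus `gamma` times the membership of each offshoot) is a simplified stand-in for the penalty function of the published algorithm, and the search tries only one head subject at a time.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy taxonomy: each internal node maps to its children;
# nodes without an entry are leaves (the topics that carry memberships).
TAXONOMY: Dict[str, List[str]] = {
    "root": ["A", "B"],
    "A": ["a1", "a2", "a3"],
    "B": ["b1", "b2"],
}


def leaves_under(node: str, tree: Dict[str, List[str]]) -> Set[str]:
    """Return the set of leaves in the subtree rooted at `node`."""
    children = tree.get(node)
    if not children:
        return {node}
    leaves: Set[str] = set()
    for child in children:
        leaves |= leaves_under(child, tree)
    return leaves


def gaps_and_offshoots(
    head: str, query: Dict[str, float], tree: Dict[str, List[str]]
) -> Tuple[Set[str], Set[str]]:
    """Gaps: leaves under `head` outside the query set.
    Offshoots: query leaves (membership > 0) not under `head`."""
    covered = leaves_under(head, tree)
    members = {leaf for leaf, u in query.items() if u > 0}
    return covered - members, members - covered


def penalty(
    head: str,
    query: Dict[str, float],
    tree: Dict[str, List[str]],
    lam: float = 0.5,
    gamma: float = 1.0,
) -> float:
    """Simplified, hypothetical penalty: 1 for the head subject,
    `lam` per gap, and `gamma` times each offshoot's membership."""
    gaps, offshoots = gaps_and_offshoots(head, query, tree)
    return 1.0 + lam * len(gaps) + gamma * sum(query[o] for o in offshoots)


def best_single_head(
    query: Dict[str, float],
    tree: Dict[str, List[str]],
    lam: float = 0.5,
    gamma: float = 1.0,
) -> str:
    """Brute-force search over internal nodes for the cheapest single head subject."""
    return min(tree, key=lambda h: penalty(h, query, tree, lam, gamma))


if __name__ == "__main__":
    # Hypothetical fuzzy query set: leaf topic -> membership in (0, 1].
    query = {"a1": 0.9, "a2": 0.7, "b1": 0.2}
    for head in TAXONOMY:
        print(head, round(penalty(head, query, TAXONOMY), 2))
    print("best head subject:", best_single_head(query, TAXONOMY))
```

With this query set, head subject `A` has penalty 1.7 (one gap, `a3`, and one offshoot, `b1` with membership 0.2), against 2.0 for `root` (two gaps) and 3.1 for `B`. If `lam` is lowered below 0.2, lifting to `root` becomes cheaper than `A`, which shows how the gap weight controls how far the query set is lifted.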

Keywords: hybrid approach; annotated suffix tree; generalization; fuzzy cluster; research tendency

Publication based on the results of:

Decision making and data analysis in socio-economic and political systems (2020)

In book

WIMS 2020: Proceedings of the 10th International Conference on Web Intelligence, Mining and Semantics

Association for Computing Machinery (ACM), 2020.

The Benefits of Query-Based KGQA Systems for Complex and Temporal Questions in LLM Era

Alekseev A., Chaichuk M., Butko M. et al., in: 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Kanazawa, Japan, July 4–6, 2025, Proceedings, Part I. Natural Language Processing and Information Systems (LNCS, Vol. 15836). Springer, 2025. P. 426–441.

Large language models excel in question answering (QA) but struggle with multi-hop reasoning and temporal questions. Query-based knowledge graph QA (KGQA) offers a modular alternative by generating executable queries instead of direct answers. We explore a multi-stage query-based framework for WikiData QA, proposing a multi-stage approach that enhances performance on challenging multi-hop and temporal benchmarks. Through generalization and ...

Added: February 3, 2026

Modeling Generalization in Domain Taxonomies Using a Maximum Likelihood Criterion

Hayrapetyan Z., Nascimento S., Trevor F. et al., in: Information Systems and Technologies: WorldCIST 2022, Vol. 2, Issue 469. Springer, 2022. P. 141–147.

We define a most specific generalization of a fuzzy set of topics assigned to leaves of the rooted tree of a domain taxonomy. This generalization lifts the set to its “head subject” node in the higher ranks of the taxonomy tree. The head subject is supposed to “tightly” cover the query set, possibly involving some ...

Added: November 18, 2022

A hybrid lemmatiser for Old Church Slavonic

Afanasev I. NRU HSE. Series WP BRP "Linguistics". 2021.

The article considers a lemmatiser developed specifically for Old Church Slavonic (OCS). The introduction highlights the lack of lemmatisers that can handle the various OCS datasets. The review gives a short description of previous attempts and current trends in lemmatisation. The lemmatiser is hybrid and uses the advantages ...

Added: December 28, 2021

A Hybrid Approach to the Analysis of a Collection of Research Papers

Mirkin B., Frolov D., Vlasov A. et al., in: Intelligent Data Engineering and Automated Learning – IDEAL 2020: 21st International Conference, Guimaraes, Portugal, November 4–6, 2020, Proceedings, Part II. Vol. 12490: Lecture Notes in Computer Science. Cham: Springer, 2020. P. 423–433.

Added: November 13, 2020

Computational Generalization in Taxonomies Applied to: (1) Analyze Tendencies of Research and (2) Extend User Audiences

Frolov D., Mirkin B., Nascimento S. et al., in: Intelligent Data Engineering and Automated Learning – IDEAL 2019, Vol. 2. Springer, 2019. P. 3–11.

Added: December 7, 2019

Intelligent Data Engineering and Automated Learning – IDEAL 2019

Springer, 2019.

Added: December 7, 2019

Using Taxonomy Tree to Generalize a Fuzzy Thematic Cluster

Frolov D., Mirkin B., Nascimento S. et al., in: Fuzzy Systems (FUZZ-IEEE), IEEE International Conference Proceedings. IEEE, 2019. P. 1–6.

This paper presents an algorithm, ParGenFS, for generalizing, or “lifting”, a fuzzy set of topics to higher ranks of a hierarchical taxonomy of a research domain. ParGenFS finds a globally optimal generalization of the topic set by minimizing a penalty function that balances the number of introduced “head subjects” and related errors, the ...

Added: October 30, 2019

Parsimonious Generalization of Fuzzy Thematic Sets in Taxonomies Applied to the Analysis of Tendencies of Research in Data Science

Frolov D., Nascimento S., Fenner T. et al., Information Sciences. 2020. Vol. 512. P. 595–615.

This paper proposes a novel method, referred to as ParGenFS, for finding a most specific generalization of a query set represented by a fuzzy set of topics assigned to leaves of the rooted tree of a taxonomy. The query set is generalized by “lifting” it to one or more “head subjects” in the higher ranks ...

Added: October 9, 2019

A Method for Audience Extending in Programmatic Advertising by Using Parsimonious Generalization of User Segments

Frolov D., Taran Z., Mirkin B., in: International Conference on Human Interaction and Emerging Technologies. Springer, 2020. P. 837–841.

We propose a novel method for efficient target audience augmentation in programmatic digital advertising. This method uses ParGenFS, a novel algorithm for the most adequate generalization in taxonomies, developed by the authors in joint work. ParGenFS extends user segments by parsimoniously lifting them off-line as a fuzzy set over the IAB content taxonomy ...

Added: July 31, 2019

Globally Optimal Parsimoniously Lifting a Fuzzy Query Set Over a Taxonomy Tree

Frolov D., Mirkin B., Nascimento S. et al., in: Optimization of Complex Systems: Theory, Models, Algorithms and Applications. Switzerland: Springer Publishing Company, 2020. P. 779–789.

This paper presents a relatively rare case of an optimization problem in data analysis that admits a globally optimal solution via a recursive algorithm. We are concerned with finding a most specific generalization of a fuzzy set of topics assigned to leaves of a domain taxonomy represented by a rooted tree. The idea is to “lift” ...

Added: June 25, 2019

Using Domain Taxonomy to Model Generalization of Thematic Fuzzy Clusters

Frolov D., Mirkin B., Nascimento S. et al., in: CONTENT 2019, The Eleventh International Conference on Creative Content Technologies. International Academy, Research, and Industry Association (IARIA), 2019. P. 20–25.

We define a most specific generalization of a fuzzy set of topics assigned to leaves of the rooted tree of a domain taxonomy. This generalization lifts the set to its 'head subject' in the higher ranks of the taxonomy tree. The head subject is supposed to 'tightly' cover the query set, possibly bringing in some ...

Added: June 4, 2019

CONTENT 2019, The Eleventh International Conference on Creative Content Technologies

International Academy, Research, and Industry Association (IARIA), 2019.

Added: June 4, 2019

Method for Generalization of Fuzzy Sets

Frolov D., Mirkin B., Nascimento S. et al., in: International Conference on Artificial Intelligence and Soft Computing. 18th International Conference, ICAISC 2019, Zakopane, Poland, June 16–20, 2019, Proceedings, Part I. Issue 11508. Cham: Springer, 2019. P. 273–286.

Added: June 3, 2019

International Conference on Artificial Intelligence and Soft Computing. 18th International Conference, ICAISC 2019, Zakopane, Poland, June 16–20, 2019, Proceedings

Cham: Springer, 2019.

The series Lecture Notes in Computer Science (LNCS), including its subseries Lecture Notes in Artificial Intelligence (LNAI) and Lecture Notes in Bioinformatics (LNBI), has established itself as a medium for the publication of new developments in computer science and information technology research and teaching: quickly, informally, and at a high level. The two-volume set LNCS ...

Added: June 3, 2019

Annotated Suffix Tree Method for German Compound Splitting

Shishkova A., Artemova E., in: CLLS 2016. Computational Linguistics and Language Science. Proceedings of the Workshop on Computational Linguistics and Language Science. Moscow, Russia, April 26, 2016. Vol. 1886. Aachen: CEUR Workshop Proceedings, 2017. P. 42–47.

The paper presents an unsupervised and knowledge-free approach to compound splitting. Although the research is focused on German compounds, the method is expected to be extensible to other compounding languages. The approach is based on the annotated suffix tree (AST) method proposed and modified by Mirkin et al. To the best of ...

Added: October 10, 2017
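The string-to-text relevance measure based on suffix trees annotated with frequencies, mentioned in the abstract of the main paper and underlying the compound-splitting method listed last, can be sketched in Python as follows. This is a minimal illustration under assumptions: it builds an uncompressed suffix trie (not a compressed suffix tree) from a list of text fragments, annotates each node with the number of occurrences of the substring it spells, sets the root frequency to the sum of its children's frequencies, and scores a string by averaging, over all its suffixes, the mean ratio f(node)/f(parent) along the matched path. This follows one commonly described AST scoring variant; the exact variant used in the papers above may differ, and the names `Node`, `build_ast`, `suffix_match_score` and `ast_score` are hypothetical.

```python
from typing import Dict, List


class Node:
    """A trie node annotated with the frequency of the substring it spells."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.freq = 0


def build_ast(fragments: List[str]) -> Node:
    """Insert every suffix of every fragment, counting visits on each node."""
    root = Node()
    for fragment in fragments:
        for start in range(len(fragment)):
            node = root
            for ch in fragment[start:]:
                node = node.children.setdefault(ch, Node())
                node.freq += 1
    # Assumed convention: root frequency = sum of its children's frequencies.
    root.freq = sum(child.freq for child in root.children.values())
    return root


def suffix_match_score(suffix: str, root: Node) -> float:
    """Average of f(node) / f(parent) along the longest path matching `suffix`."""
    node, total, matched = root, 0.0, 0
    for ch in suffix:
        child = node.children.get(ch)
        if child is None:
            break
        total += child.freq / node.freq
        node = child
        matched += 1
    return total / matched if matched else 0.0


def ast_score(query: str, root: Node) -> float:
    """Relevance of `query` to the text: mean suffix-match score over all suffixes."""
    if not query:
        return 0.0
    scores = [suffix_match_score(query[i:], root) for i in range(len(query))]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical text fragments (e.g. words of a document).
    tree = build_ast(["data", "science", "taxonomy"])
    for q in ["data", "scien", "qqq"]:
        print(q, round(ast_score(q, tree), 3))
```

For instance, with the single fragment `"ab"`, the query `"ab"` scores 0.625, the mean of 0.75 for the suffix `"ab"` and 0.5 for the suffix `"b"`, while a string sharing no characters with the fragments scores 0. Because the score depends only on substring frequencies, it needs no tokenization, stemming or external dictionary, which matches the “purely structural” character of the measure described in the main abstract.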