National Research University Higher School of Economics (HSE University)

Using Domain Taxonomy to Model Generalization of Thematic Fuzzy Clusters

P. 20–25.
Frolov D., Mirkin B., Nascimento S., Fenner T.

We define a most specific generalization of a fuzzy set of topics assigned to leaves of the rooted tree of a domain taxonomy. This generalization lifts the set to its 'head subject' in the higher ranks of the taxonomy tree. The head subject is supposed to cover the query set 'tightly', possibly at the cost of some errors, both 'gaps' and 'offshoots'. Our method globally minimizes a penalty function that combines the numbers of head subjects, gaps, and offshoots, each with its own weight. We apply the method to a collection of about 18,000 research papers on Data Science published in Springer journals over the past 20 years. We extract a taxonomy of Data Science from the international Association for Computing Machinery Computing Classification System 2012 (ACM-CCS). We find fuzzy clusters of leaf topics over the text collection and use the lifted head subjects of these thematic clusters to comment on the tendencies of current research in the corresponding aspects of the domain.
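The penalty minimization described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it assumes a crisp (non-fuzzy) query set, counts every non-query leaf under a head subject as one gap, and treats every query leaf that is not covered by a head subject as one offshoot. The names `Node`, `lift`, `lam`, and `gamma` are hypothetical and chosen for this illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A taxonomy node; a node without children is a leaf topic."""
    name: str
    children: list["Node"] = field(default_factory=list)


def lift(node, query, lam, gamma):
    """Return (penalty, heads, gaps, offshoots, leaves) for the subtree at node.

    Simplified penalty (an assumption of this sketch):
    1 per head subject + lam per gap + gamma per offshoot.
    """
    if not node.children:
        if node.name in query:
            # A query leaf left on its own is counted as an offshoot.
            return gamma, [], [], [node.name], [node.name]
        return 0.0, [], [], [], [node.name]

    parts = [lift(child, query, lam, gamma) for child in node.children]
    leaves = [leaf for part in parts for leaf in part[4]]

    # Option A: make this node a head subject; each non-query leaf below is a gap.
    gaps_here = [leaf for leaf in leaves if leaf not in query]
    cost_head = 1.0 + lam * len(gaps_here)

    # Option B: do not lift here; keep the best solutions found for the children.
    cost_split = sum(part[0] for part in parts)

    if cost_head < cost_split:
        return cost_head, [node.name], gaps_here, [], leaves
    heads = [h for part in parts for h in part[1]]
    gaps = [g for part in parts for g in part[2]]
    offshoots = [o for part in parts for o in part[3]]
    return cost_split, heads, gaps, offshoots, leaves


# Toy taxonomy and query set, invented for the example.
tree = Node("root", [
    Node("A", [Node("a1"), Node("a2"), Node("a3")]),
    Node("B", [Node("b1"), Node("b2"), Node("b3")]),
    Node("C", [Node("c1"), Node("c2")]),
])
query = {"a1", "a2", "b1"}
penalty, heads, gaps, offshoots, _ = lift(tree, query, lam=0.3, gamma=0.9)
print(heads, gaps, offshoots, round(penalty, 2))
```

With these toy weights the query is lifted to the head subject `A` at the price of one gap (`a3`), while `b1` stays an offshoot because lifting to `B` or to the root would introduce too many gaps. Because the choice at each node depends only on the best results for its children, one bottom-up pass gives the minimum of this simplified penalty.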

Language: English
Keywords: spectral clustering, annotated suffix trees, fuzzy clustering, generalization, gap-offshoot penalty
Publication based on the results of:
Development of methods for structuring and conceptualizing text data based on a domain taxonomy (2019)

In book

CONTENT 2019, The Eleventh International Conference on Creative Content Technologies
International Academy, Research, and Industry Association (IARIA), 2019.
Similar publications
The Benefits of Query-Based KGQA Systems for Complex and Temporal Questions in LLM Era
Alekseev A., Chaichuk M., Butko M. et al., in: 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Kanazawa, Japan, July 4–6, 2025, Proceedings, Part I. Natural Language Processing and Information Systems (LNCS, Vol. 15836). Springer, 2025. P. 426–441.
Large language models excel in question-answering (QA) but struggle with multi-hop reasoning and temporal questions. Query-based knowledge graph QA (KGQA) offers a modular alternative by generating executable queries instead of direct answers. We explore a multi-stage query-based framework for WikiData QA, proposing a multi-stage approach that enhances performance on challenging multi-hop and temporal benchmarks. Through generalization and ...
Added: February 3, 2026
Directions for modifying the artificial bee colony algorithm to optimize control parameters for complex systems
Bulygina O. V., Kulyasov N. S., Yartsev D. D., Прикладная информатика 2024 Vol. 19 No. 1 P. 28–37
In recent years, bioinspired algorithms based on the use of a population approach and a probabilistic search strategy have become especially popular among researchers involved in multidimensional and multicriteria optimization. Such algorithms are based on the principles of cooperative behavior of a decentralized self-organizing colony of living organisms (bees, ants, birds, etc.) to achieve certain ...
Added: September 26, 2024
Simulation model of an intelligent transportation system for a "smart city" with adaptive traffic light control based on fuzzy clustering
Beklaryan A., Beklaryan L. A., Akopov A. S., Бизнес-информатика 2023 Vol. 17 No. 3 P. 70–86
This article presents a new simulation model of an intelligent transportation system (ITS) for the “smart city” with adaptive traffic light control. The proposed transportation model, implemented in AnyLogic, allows us to study the behavior of interacting agents, vehicles (V) and pedestrians (P), within the framework of a multi-agent ITS of the “Manhattan Lattice” ...
Added: May 25, 2024
Modeling Generalization in Domain Taxonomies Using a Maximum Likelihood Criterion
Zhirayr Hayrapetyan, Nascimento S., Trevor F. et al., in: Information Systems and Technologies: WorldCIST 2022, Volume 2. Issue 469. Springer, 2022. P. 141–147.
We define a most specific generalization of a fuzzy set of topics assigned to leaves of the rooted tree of a domain taxonomy. This generalization lifts the set to its “head subject” node in the higher ranks of the taxonomy tree. The head subject is supposed to “tightly” cover the query set, possibly involving some ...
Added: November 18, 2022
A Hybrid Approach to the Analysis of a Collection of Research Papers
Mirkin B., Frolov D., Vlasov A. et al., in: Intelligent Data Engineering and Automated Learning – IDEAL 2020: 21st International Conference, Guimaraes, Portugal, November 4–6, 2020, Proceedings, Part II. Vol. 12490: Lecture Notes in Computer Science. Cham: Springer, 2020. P. 423–433.
We define and find a most specific generalization of a fuzzy set of topics assigned to leaves of the rooted tree of a taxonomy. This generalization lifts the set to a “head subject” in the higher ranks of the taxonomy, that is supposed to “tightly” cover the query set, possibly bringing in some errors, both ...
Added: November 13, 2020
Control system for unmanned vehicles based on fuzzy clustering. Part 2. Fuzzy clustering and software implementation
Akopov A. S., Khachatryan N., Beklaryan L. A. et al., Вестник компьютерных и информационных технологий 2020 Vol. 17 No. 10 P. 21–29
This article continues the description of the control system for ground unmanned vehicles as part of the integration of a phenomenological approach to modeling the behavior of agents and methods of fuzzy clustering in order to improve the quality of decisions. As a result, adaptive fuzzy clustering methods provide support for adaptive ground unmanned vehicles ...
Added: September 11, 2020
A Hybrid Approach to Interpretable Analysis of Research Paper Collections
Mirkin B., Frolov D., Vlasov A. et al., in: WIMS 2020: Proceedings of the 10th International Conference on Web Intelligence, Mining and Semantics. Association for Computing Machinery (ACM), 2020. P. 184–189.
We define and find a most specific generalization of a fuzzy set of topics assigned to leaves of the rooted tree of a taxonomy. This generalization lifts the set to a “head subject” in the higher ranks of the taxonomy, that is supposed to “tightly” cover the query set, possibly bringing in some errors, both ...
Added: August 28, 2020
Control system for unmanned vehicles based on fuzzy clustering. Part 1. Vehicle motion model
Akopov A. S., Khachatryan N., Beklaryan L. A. et al., Вестник компьютерных и информационных технологий 2020 Vol. 17 No. 9 P. 3–12
A control system for ground unmanned vehicles is presented, using fuzzy clustering methods for making decisions at an individual level. A new approach to the management of ground unmanned vehicles has been developed, taking into account the state of vehicles in dense traffic, in particular, the presence of road accidents, the appearance of traffic congestion ...
Added: August 26, 2020
Cluster-Based Optimization of an Evacuation Process Using a Parallel Bi-Objective Real-Coded Genetic Algorithm
Akopov A. S., Beklaryan L., Beklaryan A. L., Cybernetics and Information Technologies 2020 Vol. 20 No. 3 P. 45–63
This work presents a novel approach to the design of a decision-making system for the cluster-based optimization of an evacuation process using a Parallel bi-objective Real-Coded Genetic Algorithm (P-RCGA). The algorithm is based on the dynamic interaction of distributed processes with individual characteristics that exchange the best potential decisions among themselves through a global population. Such an approach allows the ...
Added: August 19, 2020
Fuzzy clustering in the problem of controlling unmanned vehicles
Beklaryan A., in: Proceedings of the 10th International School-Seminar "Multivariate Statistical Analysis, Econometrics and Modeling of Real Processes" named after S. A. Aivazian, Part 1. Moscow: CEMI RAS, 2020. P. 27–28.
This work is devoted to the development of an evolutionary algorithm for fuzzy clustering of an ensemble of interacting conventional and unmanned vehicles in order to identify the relationship between stable groups of agents and initial modeling parameters. ...
Added: June 24, 2020
Intelligent Data Engineering and Automated Learning – IDEAL 2019
Springer, 2019.
We define a most specific generalization of a fuzzy set of topics assigned to leaves of the rooted tree of a domain taxonomy. This generalization lifts the set to its “head subject” node in the higher ranks of the taxonomy tree. The head subject is supposed to “tightly” cover the query set, possibly bringing in some errors referred to ...
Added: December 7, 2019
Using Taxonomy Tree to Generalize a Fuzzy Thematic Cluster
Frolov D., Mirkin B., Nascimento S. et al., in: Fuzzy Systems (FUZZ-IEEE), IEEE International Conference Proceedings. IEEE, 2019. P. 1–6.
This paper presents an algorithm, ParGenFS, for generalizing, or “lifting”, a fuzzy set of topics to higher ranks of a hierarchical taxonomy of a research domain. The algorithm ParGenFS finds a globally optimal generalization of the topic set to minimize a penalty function, by balancing the number of introduced “head subjects” and related errors, the ...
Added: October 30, 2019
Parsimonious Generalization of Fuzzy Thematic Sets in Taxonomies Applied to the Analysis of Tendencies of Research in Data Science
Frolov D., Nascimento S., Fenner T. et al., Information Sciences 2020 Vol. 512 P. 595–615
This paper proposes a novel method, referred to as ParGenFS, for finding a most specific generalization of a query set represented by a fuzzy set of topics assigned to leaves of the rooted tree of a taxonomy. The query set is generalized by “lifting” it to one or more “head subjects” in the higher ranks ...
Added: October 9, 2019
A Method for Audience Extending in Programmatic Advertising by Using Parsimonious Generalization of User Segments
Frolov D., Taran Z., Mirkin B., in: International Conference on Human Interaction and Emerging Technologies. Springer, 2020. P. 837–841.
We propose a novel method for efficient target audience augmentation in programmatic digital advertising. This method utilizes a novel ParGenFS algorithm for most adequate generalization in taxonomies which was developed by the authors in a joint work. The ParGenFS extends user segments by parsimoniously lifting them off-line as a fuzzy set over IAB content taxonomy ...
Added: July 31, 2019
Globally Optimal Parsimoniously Lifting a Fuzzy Query Set Over a Taxonomy Tree
Frolov D., Mirkin B., Nascimento S. et al., in: Optimization of Complex Systems: Theory, Models, Algorithms and Applications. Switzerland: Springer Publishing Company, 2020. P. 779–789.
This paper presents a relatively rare case of an optimization problem in data analysis to admit a globally optimal solution by a recursive algorithm. We are concerned with finding a most specific generalization of a fuzzy set of topics assigned to leaves of domain taxonomy represented by a rooted tree. The idea is to “lift” ...
Added: June 25, 2019
CONTENT 2019, The Eleventh International Conference on Creative Content Technologies
International Academy, Research, and Industry Association (IARIA), 2019.
Added: June 4, 2019
Method for Generalization of Fuzzy Sets
Frolov D., Mirkin B., Nascimento S. et al., in: International Conference on Artificial Intelligence and Soft Computing. 18th International Conference, ICAISC 2019, Zakopane, Poland, June 16–20, 2019, Proceedings, Part 1. Issue 11508. Cham: Springer, 2019. P. 273–286.
We define and find a most specific generalization of a fuzzy set of topics assigned to leaves of the rooted tree of a taxonomy. This generalization lifts the set to a “head subject” in the higher ranks of the taxonomy, that is supposed to “tightly” cover the query set, possibly bringing in some errors, both ...
Added: June 3, 2019
International Conference on Artificial Intelligence and Soft Computing. 18th International Conference, ICAISC 2019, Zakopane, Poland, June 16–20, 2019, Proceedings
Cham: Springer, 2019.
The series Lecture Notes in Computer Science (LNCS), including its subseries Lecture Notes in Artificial Intelligence (LNAI) and Lecture Notes in Bioinformatics (LNBI), has established itself as a medium for the publication of new developments in computer science and information technology research and teaching - quickly, informally, and at a high level. The two-volume set LNCS ...
Added: June 3, 2019