Finding an appropriate generalization for a fuzzy thematic set in taxonomy

Working Paper

2018. No. 4.

Mirkin B., Frolov D., Fenner T., Nascimento S.

This paper proposes a novel method, referred to as ParGenFS, for finding a most specific generalization of a query set represented by a fuzzy set of topics assigned to leaves of the rooted tree of a taxonomy. The generalization lifts the query set to one or several head subjects in the higher ranks of the taxonomy. A head subject is supposed to cover the query set tightly, however dispersed the set may be over branches of the tree, possibly bringing in some gaps, that is, taxonomy nodes covered by the head subject but irrelevant to the set. To balance that, we admit some offshoots, that is, nodes belonging to the query set but not covered by the head subject. The method globally minimizes the total number of head subjects, gaps, and offshoots, each weighted differently. Our algorithm is applied to the structural analysis and description of a collection of 17685 abstracts of research papers published in 17 Springer journals on data science over the 20-year period 1998–2017. Our taxonomy of Data Science (DST) is extracted from the international Association for Computing Machinery Computing Classification System 2012 (ACM-CCS), a six-layer hierarchical taxonomy manually developed by a team of ACM experts. The DST also includes a number of additions of our own that detail the leaves of the ACM-CCS taxonomy. We find fuzzy clusters of leaf topics over the text collection using specially developed machinery. Three of the clusters are indeed thematic, relating to the Data Science sub-areas of (a) learning, (b) information retrieval, and (c) clustering. These three clusters are lifted with ParGenFS in the DST, which allows us to draw some conclusions about the tendencies of development in these areas.
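
The trade-off between head subjects, gaps, and offshoots described in the abstract can be illustrated with a small sketch. This is not the authors' ParGenFS implementation: it is a simplified bottom-up recursion written for illustration, assuming that every head subject costs 1, that a gap is a zero-membership leaf under a head subject (penalty `lam` each), and that an offshoot is a positive-membership leaf left uncovered (penalty `gamma` each). The membership values are used only to tell relevant leaves from irrelevant ones, and the toy taxonomy, the weights, and all names (`Lift`, `lift`, `leaves_under`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Lift:
    """Outcome of lifting a query set within one subtree (illustrative only)."""
    penalty: float
    heads: list = field(default_factory=list)
    gaps: list = field(default_factory=list)
    offshoots: list = field(default_factory=list)


def leaves_under(tree, node):
    """Return all leaves of the subtree rooted at `node`."""
    kids = tree.get(node, [])
    if not kids:
        return [node]
    return [leaf for kid in kids for leaf in leaves_under(tree, kid)]


def lift(tree, u, node, lam, gamma):
    """Choose, bottom-up, between making `node` a head subject and not lifting."""
    kids = tree.get(node, [])
    if not kids:
        # An uncovered leaf with positive membership becomes an offshoot.
        if u.get(node, 0.0) > 0:
            return Lift(gamma, offshoots=[node])
        return Lift(0.0)

    # Option 1: do not lift to this node; combine the children's best results.
    parts = [lift(tree, u, kid, lam, gamma) for kid in kids]
    keep = Lift(
        sum(p.penalty for p in parts),
        [h for p in parts for h in p.heads],
        [g for p in parts for g in p.gaps],
        [o for p in parts for o in p.offshoots],
    )

    leaves = leaves_under(tree, node)
    if not any(u.get(leaf, 0.0) > 0 for leaf in leaves):
        return keep  # nothing relevant below this node, so no head is needed

    # Option 2: make this node a head subject; irrelevant leaves become gaps.
    gaps = [leaf for leaf in leaves if u.get(leaf, 0.0) == 0]
    as_head = Lift(1.0 + lam * len(gaps), heads=[node], gaps=gaps)

    # On a tie, prefer the more specific solution (no lift).
    return as_head if as_head.penalty < keep.penalty else keep


# Hypothetical toy taxonomy and fuzzy query set over its leaves.
tree = {"root": ["A", "B"], "A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3", "b4"]}
u = {"a1": 0.8, "a2": 0.5, "b1": 0.3}

result = lift(tree, u, "root", lam=0.4, gamma=0.8)
print(result.heads, result.gaps, result.offshoots, round(result.penalty, 2))
```

With these assumed weights, lifting the two relevant leaves under `A` to `A` at the cost of one gap is cheaper than keeping two offshoots, whereas the single relevant leaf under `B` is cheaper to leave as an offshoot than to cover with a head subject that brings in three gaps.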

Research target: Computer Science

Priority areas: IT and mathematics

Language: English

Publication based on the results of:

Modern context of decision making and data analysis methods: human factor, uncertainty, risks, network models, big data (2018)

Probably approximately correct learning of Horn envelopes from queries

Borchmann D., Hanika T., Obiedkov S., Discrete Applied Mathematics 2020 Vol. 273 P. 30–42.

We propose an algorithm for learning the Horn envelope of an arbitrary domain using an expert, or an oracle, capable of answering certain types of queries about this domain. Attribute exploration from formal concept analysis is a procedure that solves this problem, but the number of queries it may ask is exponential in the size ...

Added: October 29, 2019

Proceedings of 11th Industrial Conference on Data Mining (ICDM 2012)

Springer, 2012.

Added: January 29, 2013

On the Impact of Word Error Rate on Acoustic-Linguistic Speech Emotion Recognition: An Update for the Deep Learning Era

Shuranov E. / Series Computer Science "arxiv.org". 2021.

Text encodings from automatic speech recognition (ASR) transcripts and audio representations have shown promise in speech emotion recognition (SER) ever since. Yet, it is challenging to explain the effect of each information stream on the SER systems. Further, more clarification is required for analysing the impact of ASR's word error rate (WER) on linguistic emotion ...

Added: February 14, 2023

Sheath parameters for non-Debye plasmas: Simulations and arc damage

Morozov I., Norman G. E., Insepov Z. et al., Physical Review Special Topics - Accelerators and Beams 2012 Vol. 15 P. 053501.

This paper describes the surface environment of the dense plasma arcs that damage rf accelerators, tokamaks, and other high gradient structures. We simulate the dense, nonideal plasma sheath near a metallic surface using molecular dynamics (MD) to evaluate sheaths in the non-Debye region for high density, low temperature plasmas. We use direct two-component MD simulations ...

Added: October 28, 2013

Hardness of Approximation for H-free Edge Modification Problems

Bliznets Ivan, Cygan M., Komosa P. et al., ACM Transactions on Computation Theory 2018 Vol. 10 No. 2 P. 1–32.

The H-free Edge Deletion problem asks, for a given graph G and integer k, whether it is possible to delete at most k edges from G to make it H-free—that is, not containing H as an induced subgraph. The H-free Edge Completion problem is defined similarly, but we add edges instead of deleting them. The study of these two problem families has recently been the subject of intensive studies from the point of ...

Added: October 30, 2018

Priority Queueing for Packets with Two Characteristics

Chuprikov P., Nikolenko S. I., Davydow A. et al., IEEE Transactions on Networking 2018 Vol. 26 No. 1 P. 342–355.

Modern network elements are increasingly required to deal with heterogeneous traffic. Recent works consider processing policies for buffers that hold packets with different processing requirements (number of processing cycles needed before a packet can be transmitted out) but uniform value, aiming to maximize the throughput, i.e., the number of transmitted packets. Other developments deal with ...

Added: March 14, 2018

Proceedings of the NI Academic Days 2017 Conference, Moscow, April 13–14, 2017

Moscow: National Instruments Russia, 2017.

The collection consists of previously unpublished papers presenting the results of original research and technical solutions. We hope that it will prove useful to specialists working in various fields of science and technology, to a wide range of university lecturers, postgraduates, and students, and to teachers at secondary schools and technical colleges. ...

Added: May 10, 2017

The complexity of the 3-colorability problem in the absence of a pair of small forbidden induced subgraphs

Malyshev D., Discrete Mathematics 2015 Vol. 338 No. 11 P. 1860–1865.

We completely determine the complexity status of the 3-colorability problem for hereditary graph classes defined by two forbidden induced subgraphs with at most five vertices. ...

Added: April 7, 2014

Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference "Dialogue" (Moscow, May 29 to June 1, 2019). Issue 18 (25)

Moscow: Издательский центр «Российский государственный гуманитарный университет», 2019.

The collection includes 27 papers from the international conference on computational linguistics and intellectual technologies "Dialogue 2019" that were not included in the yearbook "Computational Linguistics and Intellectual Technologies" but were recommended by the Program Committee for presentation at the conference. Intended for specialists in theoretical and applied linguistics and intellectual technologies. ...

Added: December 10, 2019

Formation of Control Structures in Static Swarms

Karpov V. E., Karpova I. P., Procedia Engineering 2015 Vol. 100 P. 1459–1468.

Work solutions are proposed for problems of leader definition and role distribution in homogeneous groups of robots. It is shown that transition from a swarm to a collective of robots with hierarchical organization is possible using exclusively local interaction. The local revoting algorithm is central to the procedure for choice of leader while redistribution of roles can ...

Added: March 14, 2015

Improving quality of graph partitioning using multi-level optimization

S. D. Kuznetsov, Turdakov D. Y., Pastukhov R. K. et al., Programming and Computer Software 2015 Vol. 41 No. 5 P. 302–306.

Graph partitioning is required for solving tasks on graphs that need to be distributed over disks or computers. This problem is well studied, but the majority of the results on this subject are not suitable for processing graphs with billions of nodes on commodity clusters, since they require shared memory or low-latency messaging. One of ...

Added: January 23, 2018

On the choice of software tools for cognitive computer visualization

Baibikova T., Domoratsky E., Вестник Московского финансово-юридического университета 2017 No. 1 P. 200–206.

Some questions of scientific visualization are under consideration in this paper. This article also discusses the peculiarities of application of cognitive computer graphics, singles out a range of tasks of scientific visualization. The paper gives a brief overview of modern support tools for program visualization, tendencies of their development and their main characteristics. A module ...

Added: June 10, 2017

Parsimonious Generalization of Fuzzy Thematic Sets in Taxonomies Applied to the Analysis of Tendencies of Research in Data Science

Frolov D., Nascimento S., Fenner T. et al., Information Sciences 2020 Vol. 512 P. 595–615.

This paper proposes a novel method, referred to as ParGenFS, for finding a most specific generalization of a query set represented by a fuzzy set of topics assigned to leaves of the rooted tree of a taxonomy. The query set is generalized by “lifting” it to one or more “head subjects” in the higher ranks ...

Added: October 9, 2019

A Hybrid Cluster-Lift Method for the Analysis of Research Activities

Mirkin B., Fenner T., Nascimento S. et al., Lecture Notes in Computer Science 2010 Vol. 6076 No. 1 P. 152–161.

A hybrid of two novel methods - additive fuzzy spectral clustering and lifting method over a taxonomy - is applied to analyse the research activities of a department. To be specific, we concentrate on the Computer Sciences area represented by the ACM Computing Classification System (ACM-CCS), but the approach is applicable also to other taxonomies. ...

Added: November 14, 2012

Automating the counting of particles in nanoscale electron microscope images

Baidin G. S., Titov A. S., Биомедицинская радиоэлектроника 2020 Vol. 23 No. 5 P. 59–71.

Problem statement. As studies of chemical compounds in various media grow more complex and the processing of experimental results becomes more complicated, there is a need to automate this process in order to improve the accuracy and reliability of the results. Aim. To develop a technique for the automated counting of particles in a substance in electron microscope images. Results. A technique has been developed for the automated counting of particles in a substance in electron microscope images ...

Added: September 5, 2021

Microelectronics and Informatics – 2013. Abstracts of papers

Zelenograd: МИЭТ, 2013.

This collection of abstracts from the 20th All-Russian Interuniversity Scientific and Technical Conference "Microelectronics and Informatics 2013", held in the year of the 55th anniversary of the founding of Zelenograd, a center of microelectronics and nanotechnology recognized in Russia and worldwide, presents the research results of students, postgraduates, and young scientists from Zelenograd enterprises and Russian universities in the following priority areas of science and technology: micro- and nanoelectronics, ...

Added: May 31, 2013

Using webcams as a source of stereo pairs

Protasov S., Kurgalin S. D., Krylovetsky A. A., Вестник Воронежского государственного университета. Серия: Системный анализ и информационные технологии 2011 No. 2 P. 80–86.

Forming a stereo video stream is currently a task that must be solved in a wide range of practical applications. Besides the film industry, real-time acquisition and processing of stereo images is used in industry, communications, modeling, and so on. This article considers an approach to building a flexible stereo video capture system based on webcams that can be integrated into compact personal devices. ...

Added: February 11, 2013

Modeling networks-on-chip based on regular and quasi-optimal topologies using the OCNS simulator

Romanov A., Tumkovskiy S., Ivanova G. A., Вестник РГРТУ 2015 Vol. 2 No. 52 P. 61–66.

A review of networks-on-chip modeling methods is given. A high-level model of networks-on-chip is developed in the Java programming language; it accelerates the modeling process by several orders of magnitude compared to HDL models. The results of simulation of networks-on-chip based on regular and quasi-optimal topologies with the number of nodes up to 100 ...

Added: June 21, 2015

Programming in the UNIX operating environment: information exchange between parallel processes, file protection in the file system, interrupt handling

Istratov A., Moscow: РГУИТП, 2006.

Covers aspects of system programming in UNIX-like operating systems ...

Added: February 8, 2013

Particle Simulation for Predicting Effective Properties of Short Fiber Reinforced Composites

Skoptsov K. A., Sheshenin S., Galatenko V. V. et al., International Journal of Applied Mechanics 2016 Vol. 8 No. 2 P. 1650016-01–1650016-18.

We present a method for evaluating elastic properties of a composite material produced by molding a resin filled with short elastic fibers. A flow of the filled resin is simulated numerically using a mesh-free method. After that, assuming that spatial distribution and orientation of fibers are not significantly changed during polymerization, effective elastic moduli of ...

Added: May 21, 2016

Computer synthesis and modeling of nanostructures of bistable cells for memory arrays with increased information density.

Trubochkina N. K., Качество. Инновации. Образование 2014 No. 9 P. 43–53.

An approach is described to creating a memory array built on two different algorithms for providing the basic memory element, an R-trigger in transition circuitry. The results of successful computer simulations of two single-layer nanostructures for memory arrays with high information density are given. Of fundamental importance is the implementation of single-layer nanostructure storage elements, which ...

Added: March 2, 2015

CEUR Workshop Proceedings. Proceedings of the International Workshop on Social Network Analysis using Formal Concept Analysis (SNAFCA 2015)

Malaga: CEUR Workshop Proceedings, 2015.

Social network analysis (SNA) is a multidisciplinary research area that has attracted many researchers from different disciplines such as Physics, Mathematics, Sociology, Biology and Computer Science, and has been studied according to different approaches and techniques. A social network is a dynamic structure (generally represented as a graph) of a set of entities/actors (nodes) together ...

Added: October 19, 2015

Agent-based modelling of interactions between air pollutants and greenery using a case study of Yerevan, Armenia

Akopov A. S., Beklaryan L. A., Saghatelyan A. K., Environmental Modelling and Software 2019 Vol. 116 P. 7–25.

Urban greenery such as trees can effectively reduce air pollution in a natural and eco-friendly way. However, how to spatially locate and arrange greenery in an optimal way remains a challenging task. We developed an agent-based model of air pollution dynamics to support the optimal allocation and configuration of tree clusters in a city. The Pareto ...

Added: February 24, 2019

Intelligent Network Security Monitoring Based on Optimum-Path Forest Clustering

Guimarães R. R., Passos L., Filho R. H. et al., IEEE Network 2019 Vol. 33 No. 2 P. 126–131.

Distinguishing outliers from normal data in wireless sensor networks has been a big challenge in the anomaly detection domain, mostly due to the nature of the anomalies, such as software or hardware failures, reading errors or malicious attacks, just to name a few. In this article, we introduce an anomaly detection-based OPF classifier in the ...

Added: December 19, 2018