Global Optimization in Learning with Important Data: an FCA-Based Approach

Y. Kashnitsky; S. Kuznetsov



Ch. 19. P. 189–202.

Nowadays, decision tree learning is one of the most popular classification and regression techniques. Though decision trees are not accurate on their own, they make very good base learners for advanced tree-based methods such as random forests and gradient boosted trees. However, using ensembles of trees reduces the interpretability of the final model. Another problem is that decision tree learning can be seen as a greedy search for a good classification hypothesis in terms of some information-based criterion such as Gini impurity or information gain. For small data sets, however, a global search might be feasible. In this paper, we propose an FCA-based lazy classification technique in which each test instance is classified with a set of the best rules (in terms of some information-based criterion). In a set of benchmarking experiments, the proposed strategy is compared with decision tree and nearest neighbor learning.
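The abstract does not spell out the algorithm. What follows is only a minimal Python sketch of the general idea, built on our own assumptions:

- Attributes are binary, and each training object is given as a set of attributes.
- Rule premises are enumerated exhaustively as subsets of the test object's attributes, up to a hypothetical `max_len`.
- Information gain is the scoring criterion.
- The test object is labeled by majority vote over the `top_k` best rules.

Premises that have the same extent determine the same formal concept, so each extent is scored only once. The names `lazy_classify`, `max_len` and `top_k` are illustrative and do not come from the paper.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(labels, mask):
    """Entropy reduction from splitting objects into covered (mask True) and the rest."""
    inside = [y for y, m in zip(labels, mask) if m]
    outside = [y for y, m in zip(labels, mask) if not m]
    n = len(labels)
    remainder = len(inside) / n * entropy(inside) + len(outside) / n * entropy(outside)
    return entropy(labels) - remainder


def lazy_classify(train, labels, test, max_len=2, top_k=3):
    """Classify one test object (a set of attributes) by a vote of the top_k rules.

    Hypothetical sketch: a rule premise is a subset of the test object's attributes;
    its extent is the set of training objects containing every premise attribute.
    """
    scored, seen_extents = [], set()
    for size in range(1, max_len + 1):
        for premise in combinations(sorted(test), size):
            mask = tuple(set(premise) <= obj for obj in train)
            # Skip empty extents and premises whose extent (hence formal concept)
            # has already been scored.
            if not any(mask) or mask in seen_extents:
                continue
            seen_extents.add(mask)
            covered = [y for y, m in zip(labels, mask) if m]
            conclusion = Counter(covered).most_common(1)[0][0]
            scored.append((information_gain(labels, mask), premise, conclusion))
    if not scored:
        # No premise covers any training object: fall back to the majority class.
        return Counter(labels).most_common(1)[0][0]
    scored.sort(key=lambda rule: rule[0], reverse=True)
    votes = Counter(conclusion for _, _, conclusion in scored[:top_k])
    return votes.most_common(1)[0][0]


# Toy context: four training objects over binary attributes a, b, c.
train = [{"a", "b"}, {"a"}, {"b", "c"}, {"c"}]
labels = [1, 1, 0, 0]
print(lazy_classify(train, labels, {"a"}))  # -> 1
```

Exhaustive enumeration is practical here only because the setting is small data sets. For a test object with m attributes, `max_len` bounds the number of candidate premises at C(m, 1) + ... + C(m, max_len).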

Language: English


Keywords: Formal Concept Analysis, machine learning, classification, global optimization

Publication based on the results of:

Mining Data with Complex Structure and Semantic Technologies (2016)

In book

CLA 2016: Proceedings of the Thirteenth International Conference on Concept Lattices and Their Applications. CEUR Workshop Proceedings

Vol. 1624. M.: Higher School of Economics, National Research University, 2016.

Territorial Variation of the Occitan Language: Classification of the Northern Dialects

Бестолкова Г. В., Theory of Language and Intercultural Communication 2023 No. 3(50) P. 1–15

The variety of dialects, subdialects and colloquial speech plays a significant role in the development of modern Occitan, which determines the relevance of the study undertaken in this article. Since Occitan has a large number of dialects, only its northern dialects are considered in detail in this article. The material contained in the article allows one to form a ...

Added: February 15, 2026

Real-Bogus Classification for ZTF Data Releases: Two Approaches

Semenikhin n., Kornilov M., Lavrukhina A. et al., Communications in Computer and Information Science 2026 Vol. 2641 P. 211–219

We considered two fundamentally different approaches to real-bogus classification within the Zwicky Transient Facility survey data. The first approach is based on neural networks that take sequences of object images as input. The second approach uses features extracted from light curves and classical machine learning methods. Several models for both approaches were tested. Quality metrics ...

Added: February 12, 2026

Is Canfield Right? On the Asymptotic Coefficients for the Maximum Antichain of Partitions and Related Counting Inequalities

Ignatov D. I., in: 11th International Conference, AIST 2023, Yerevan, Armenia, September 28–30, 2023, Revised Selected Papers. Analysis of Images, Social Networks and Texts. Lecture Notes in Computer Science (LNCS, volume 14486). Cham: Springer, 2024. P. 349–361.

This paper goes back to the asymptotic solutions, by Canfield, Harper and others, of Rota’s problem on the size of the maximum antichain in the set partition lattice. Knowing the asymptotic coefficients could pave the way to asymptotic solutions of problems such as counting (maximal) antichains in partition lattices. In addition to our ...

Added: January 23, 2026

Classification Approach to Mapping Cultural Differences: An Illustration Using Survey Data from 60 Russian Regions

Nastina E., Sokolov B., / Series OSF "SocArXiv". 2025.

We argue that a classification-based approach to measuring cultural differences across countries or subnational regions is a promising complement, and sometimes an alternative, to the widely used dimensional method in cross-cultural research. The latter summarises cultural variation using continuous dimensions, for example, Hofstede’s famous individualism-collectivism dimension. However, this approach relies on strong parametric assumptions, which are ...

Added: December 23, 2025

Method of Automated Dataset Collection for Microwave Filters Synthesis

Arinin O. V., Bakhmach D. M., Katsnelson A. et al., in: 2025 Systems of Signals Generating and Processing in the Field of on Board Communications. IEEE, 2025. P. 1–5.

This research discusses a method for automating dataset collection for microwave filter synthesis by integrating machine learning techniques, thus reducing development time. Using the 3D electromagnetic analysis software package, the study simulates three variants of highly selective passband microstrip two-resonator combined filters with stepped impedance resonators and collects their geometric parameters and amplitude-frequency characteristics. ...

Added: December 6, 2025

Tracking Fracture Development by Clustering Thermally Stimulated Acoustic Emission Pulses in the Absence of Source Location

Индаков Г. С., Казначеев П. А., Майбук З. Я. et al., Geophysical Research 2025 Vol. 26 No. 2 P. 99–124

The paper studies the clusterability of acoustic emission pulses during high-temperature heating of a sandstone sample previously subjected to mechanical loading. The loading was applied in uniaxial mode up to a load close to failure, at which signs of large cracks appeared on the surface. After that, the samples were subjected to thermal treatment up to 650 °C ...

Added: September 19, 2025

Rewriting the Rules: LLMs Vs. Traditional ML in University Admissions

Chepikov I., Karpov I., in: 26th International Conference, AIED 2025, Palermo, Italy, July 22–26, 2025, Proceedings, Part I. Artificial Intelligence in Education. Posters and Late Breaking Results, Workshops and Tutorials, Industry and Innovation Tracks, Practitioners, Doctoral Consortium, Blue Sky, and WideAIED. Springer, 2025. P. 352–358.

Modern language models such as BERT, ChatGPT and DeepSeek have shown great potential in solving various tasks, including text classification, text generation, and document analysis and summarization. In this paper, we show that these models come close to classical ML approaches based on decision trees not only in text processing but also in processing classical tabular data ...

Added: September 4, 2025

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics

Wien: Association for Computational Linguistics, 2025.

Added: August 26, 2025

Analysis of a Company Model in Conditions of Unstable Demand Using Reinforcement Learning Methods

Delev A., Semakov S., in: 2025 8th International Conference on Artificial Intelligence and Big Data (ICAIBD). IEEE, 2025. P. 318–322.

Profit is one of the most important economic indicators of a company’s performance, and for every company it is necessary to allocate resources in such a way as to obtain the maximum possible profit. The profit maximization problem is usually a dynamic optimization problem. This article discusses an approach to solving the production expansion problem ...

Added: August 25, 2025

Abstract Logics as Structures and Classifications of Structures

Dragalina-Chernaya E., in: Fourteenth Smirnov Readings in Logic: Proceedings of the International Scientific Conference, Moscow, June 19–21, 2025. M.: Publisher Aleksandr Vorobyov, 2025. P. 80–82.

The talk compares interpretations of abstract logics as structures and as classifications of abstract structures. ...

Added: June 20, 2025

Economic and Social Aspects of Nuclear Power in the Context of Developing Artificial Intelligence Technologies

Podchufarov A., Galkina A. N., Ванина С. С. et al., Economics and Management: Problems, Solutions 2025 Vol. 5 No. 4 P. 61–74

Today, the introduction of artificial intelligence technologies is becoming a significant factor in the development of high-tech industries. The article presents the results of a study of the prospects for using intelligent analytical systems in nuclear energy. The experience of foreign countries is analyzed, and the features of successful projects using ...

Added: June 5, 2025

Periods of high uncertainty: How fertility intentions in Russia changed during 2022–2023

Vakulenko E., Gorskiy D., Kondrateva V. et al., Demographic Research 2025 Vol. 52 P. 939–970

BACKGROUND We study how fertility intentions in Russia changed during the socio-economic shocks of 2022–2023 that followed the Russia-Ukraine armed conflict. OBJECTIVE Our objective is to identify factors that influence decision-making in a low-fertility context during the crisis, including both objective characteristics and subjective assessments of the current situation. METHODS This paper is based on unique survey ...

Added: May 6, 2025

Prospects for Big Text Data Application in Technology Maturity Assessment (Publications Review)

Loginova I., Grozovskiy F., Aksenova A., Automatic Documentation and Mathematical Linguistics 2025 Vol. 59 No. 3 P. 145–153

The paper analyzes the limitations of conventional methods for assessing the maturity of technology, such as the S-curve, technology readiness level (TRL), Gartner’s hype cycle and their dependence on experts’ opinions. Current approaches to this task based on big text data analysis and machine learning algorithms are reviewed, and their advantages are demonstrated. As a ...

Added: April 28, 2025

Application of Physics-Informed Neural Networks for Solving the Inverse Advection-Diffusion Problem to Localize Pollution Sources

Derkach D., Efremenko D., Чупров И. А. et al., / Series Computer Science "arxiv.org". 2025. No. 2503.18849.

Added: March 25, 2025

Generative models and seq2seq techniques for the flash-simulation of the LHCb experiment

Derkach D., Anderlini L., Capelli S. et al., Proceedings of Science 2025 Vol. 476 P. 1032

Simulating detector and reconstruction effects on physics quantities is crucial for data analysis, but it is becoming unsustainably costly for the upcoming HEP experiments. The most radical approach to speeding up detector simulation is Flash Simulation, as proposed by the LHCb collaboration in Lamarr, a software package implementing a novel simulation paradigm relying on Deep Generative ...

Added: March 13, 2025

Real-bogus scores for active anomaly detection

Semenikhin T. A., Kornilov M., Pruzhinskaya M. et al., Astronomy and Computing 2025 Vol. 51 Article 100919

In the task of anomaly detection in modern time-domain photometric surveys, the primary goal is to identify astrophysically interesting, rare, and unusual objects among a large volume of data. Unfortunately, artifacts — such as plane or satellite tracks, bad columns on CCDs, and ghosts — often constitute significant contaminants in results from anomaly detection analysis. ...

Added: March 3, 2025

Exploring the Universe with SNAD: Anomaly Detection in Astronomy

Volnova A., Aleo P., Lavrukhina A. et al., Communications in Computer and Information Science 2024 Vol. 2086 P. 195–208

SNAD is an international project with a primary focus on detecting astronomical anomalies within large-scale surveys, using active learning and other machine learning algorithms. The work carried out by SNAD not only contributes to the discovery and classification of various astronomical phenomena but also enhances our understanding and implementation of machine learning techniques within the ...

Added: March 3, 2025

SNAD catalogue of M-dwarf flares from the Zwicky Transient Facility

Voloshina A., Lavrukhina A., Pruzhinskaya M. et al., Monthly Notices of the Royal Astronomical Society 2024 Vol. 533 No. 4 P. 4309–4323

Most of the stars in the Universe are M spectral class dwarfs, which are known to be the source of bright and frequent stellar flares. In this paper, we propose new approaches to discover M-dwarf flares in ground-based photometric surveys. We employ two approaches: a modification of a traditional method of parametric fit search and ...

Added: March 3, 2025

Comparative Analysis of Regional Inflation Forecasting Models

Габов М. А., Bukina T. V., Kashin D., Journal of the New Economic Association 2025 No. 4(69) P. 87–117

The study aims to compare approaches to forecasting the monthly level of the consumer price index (CPI y/y) in the regions of the Volga Federal District using time series models and machine learning methods. This study attempts to select the most appropriate and efficient models for predicting the regional general price level index. The paper also ...

Added: February 22, 2025

Performance Modeling of Data Storage Systems using Generative Models

Al-Maeeni A. R., Temirkhanov A., Ryzhikov A. et al., IEEE Access 2025 Vol. 13 P. 49643–49658

High-precision systems modeling is one of the main areas of industrial data analysis. Models of systems, their digital twins, are used to predict their behavior under various conditions. In this study, we developed several models of a storage system using machine learning-based generative models to predict performance metrics such as IOPS and latency. The models ...

Added: February 20, 2025

Classification of Long Gamma-Ray Transients from INTEGRAL Data Using Machine Learning Approach

Mozgunov G., Pozanenko A. S., Minaev P. Y. et al., in: Data Analytics and Management in Data Intensive Domains: 25th International Conference, DAMDID/RCDL 2023, Moscow, Russia, October 24–27, 2023, Revised Selected Papers. Vol. 2086: Communications in Computer and Information Science. Springer, 2024. P. 215–224.

Added: January 31, 2025

Model for Assessing the Liquidity of a Stock Market Trading Instrument

Sizykh D., Tregub K., Belyakov B. et al., in: 2024 17th International Conference on Management of Large-Scale System Development (MLSD). IEEE, 2024. P. 1–5.

Currently, a large number of studies are being conducted to improve the accuracy of the developed forecasting methods for the stock market. At the same time, multivariate models based on machine learning methods are increasingly used. Since liquidity indicators have a significant impact on asset pricing, taking them into account can improve the accuracy of ...

Added: January 15, 2025