Cancer Breakpoint Hotspots Versus Individual Breakpoints Prediction by Machine Learning Models

Cheloshkina K., Bzhikhatlov I., Poptsova M.

doi:10.1007/978-3-030-57821-3_19



P. 217–228.


Genome rearrangement is a hallmark of all cancers. Cancer breakpoint prediction has proved to be a difficult task, and various machine learning models have not achieved high predictive power. We investigated how well machine learning models predict breakpoint hotspots selected with different density thresholds and compared the prediction of hotspots with the prediction of individual breakpoints. We found that hotspots are predicted considerably better than individual breakpoints. When choosing a hotspot selection criterion, test ROC AUC alone is not enough to select the best model; the lift of recall and the lift of precision should also be taken into account. Investigation of the lift of recall and lift of precision showed that no single hotspot selection criterion suits all cancer types, but that cancer types fall into three to four distinct groups with similar properties. Overall, the presented results point to the need to choose different hotspot selection criteria for different cancer types.
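The abstract contrasts test ROC AUC with lift-based criteria. The pure-Python sketch below illustrates the idea under stated assumptions and is not the authors' pipeline: hotspots are assumed to be fixed-size genomic windows holding at least `min_breakpoints` breakpoints (the window size and threshold are hypothetical values, not the paper's), and lift is assumed to be measured on the top fraction of windows ranked by model score, relative to a random ranking. All function names are hypothetical. Under this assumed definition, recall lift and precision lift taken at the same cutoff are algebraically equal, so `lift_at_top` returns a single number; how the paper separates the two criteria is not specified in this abstract.

```python
import math
from collections import Counter


def label_hotspots(positions, chrom_length, window=100_000, min_breakpoints=2):
    """Hypothetical helper: split one chromosome into fixed-size windows,
    count breakpoints per window, and flag windows whose count reaches
    min_breakpoints (the assumed density threshold) as hotspots.
    Positions are assumed to be 0-based and smaller than chrom_length."""
    n_bins = math.ceil(chrom_length / window)
    per_bin = Counter(p // window for p in positions)
    counts = [per_bin[i] for i in range(n_bins)]
    labels = [int(c >= min_breakpoints) for c in counts]
    return labels, counts


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive window outscores
    a random negative one (ties count 1/2). O(P*N), fine for a demo."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def lift_at_top(y_true, scores, top_frac=0.01):
    """Assumed lift over a random ranking on the top_frac highest-scored
    windows. Recall lift (hits/P) / (k/N) and precision lift
    (hits/k) / (P/N) both reduce to hits*N / (P*k)."""
    n = len(scores)
    k = max(1, round(top_frac * n))
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits = sum(y_true[i] for i in order[:k])
    positives = sum(y_true)
    return hits * n / (positives * k)


if __name__ == "__main__":
    # Toy hotspot labels for 10 windows and two hypothetical models.
    y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    model_a = [0.9, 0.0, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
    model_b = [0.55, 0.52, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
    for name, s in (("A", model_a), ("B", model_b)):
        print(name, roc_auc(y, s), lift_at_top(y, s, top_frac=0.1))
```

In this toy example both models have a ROC AUC of 0.5, yet only model A places a hotspot among its top 10% of windows (lift 5.0 versus 0.0). This is the kind of difference the abstract says ROC AUC alone can miss.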

Keywords: machine learning; cancer genome rearrangements; cancer breakpoints; cancer breakpoint hotspots; random forest

In book

Bioinformatics Research and Applications: Proceedings of the 16th International Symposium, ISBRA 2020, Moscow, Russia, December 1–4, 2020. Lecture Notes in Computer Science

Vol. 12304. Springer Publishing Company, 2020.

Method of Automated Dataset Collection for Microwave Filters Synthesis

Arinin O. V., Bakhmach D. M., Katsnelson A. et al., in: 2025 Systems of Signals Generating and Processing in the Field of on Board Communications. IEEE, 2025. P. 1–5.

This research discusses a method for automating dataset collection for microwave filter synthesis by integrating machine learning techniques, thus reducing development time. Using a 3D electromagnetic analysis software package, the study simulates and collects geometric parameters and amplitude-frequency characteristics of three variants of highly selective passband microstrip two-resonator combined filters with stepped impedance resonators. ...

Added: December 6, 2025

Tracking Fracture Development by Clustering Thermally Stimulated Acoustic Emission Pulses in the Absence of Source Location

Индаков Г. С., Казначеев П. А., Майбук З. Я. et al., Geophysical Research 2025 Vol. 26 No. 2 P. 99–124

The paper studies the clusterability of acoustic emission pulses during high-temperature heating of a sandstone sample previously subjected to mechanical loading. Mechanical loading was applied in uniaxial mode up to a load close to failure, until signs of large cracks appeared on the surface. After that, the samples were subjected to thermal treatment up to 650 °C ...

Added: September 19, 2025

Rewriting the Rules: LLMs Vs. Traditional ML in University Admissions

Chepikov I., Karpov I., in: 26th International Conference, AIED 2025, Palermo, Italy, July 22–26, 2025, Proceedings, Part I. Artificial Intelligence in Education. Posters and Late Breaking Results, Workshops and Tutorials, Industry and Innovation Tracks, Practitioners, Doctoral Consortium, Blue Sky, and WideAIED. Springer, 2025. P. 352–358.

Modern LLMs such as BERT, ChatGPT and DeepSeek have shown great potential in solving various tasks, including text classification, text generation, and document analysis and summarization. In this paper, we show that these models come close to classical ML approaches based on decision trees not only in text processing but also in processing classical tabular data ...

Added: September 4, 2025

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics

Vienna: Association for Computational Linguistics, 2025.

Added: August 26, 2025

Analysis of a Company Model in Conditions of Unstable Demand Using Reinforcement Learning Methods

Delev A., Semakov S., in: 2025 8th International Conference on Artificial Intelligence and Big Data (ICAIBD). IEEE, 2025. P. 318–322.

Profit is one of the most important economic indicators of a company’s performance, and for every company it is necessary to allocate resources in such a way as to obtain the maximum possible profit. The profit maximization problem is usually a dynamic optimization problem. This article discusses an approach to solving the production expansion problem ...

Added: August 25, 2025

Forecasting Gold Prices Using Neural Network Algorithms

Soldatova A., Finance, Money, Investments 2023 No. 4 P. 9–15

The price of gold is the most important economic indicator. Expectations of rising inflation and higher key rates from central banks are driving investor interest in gold around the world. Given the increasing number of factors influencing the dynamics of the gold rate in the world, forecasting gold prices requires new methods and modern technological ...

Added: July 8, 2025

Predicting Systemic Risk in the Russian Financial Sector with Boosting Techniques

Shchepeleva M., Procedia Computer Science 2024 Vol. 242 P. 51–56

We test the predictive performance of different ensemble methods for forecasting systemic risk in Russia over the period 2008–2024. In contrast to existing research on machine learning ensemble techniques, we find that a conventional random forest performs better on the Russian data. Based on this model, we additionally conduct a variable importance analysis. We identify that ...

Added: June 17, 2025

Economic and Social Aspects of Nuclear Power in the Context of the Development of Artificial Intelligence Technologies

Podchufarov A., Galkina A. N., Ванина С. С. et al., Economics and Management: Problems, Solutions 2025 Vol. 5 No. 4 P. 61–74

Today, the introduction of artificial intelligence technologies is becoming a significant factor in the development of high-tech industries. The article presents the results of a study of the prospects for using intelligent analytical systems in nuclear energy. The experience of foreign countries is analyzed and the features of successful projects using ...

Added: June 5, 2025

Forecasting Stadium Attendance Using Machine Learning Models: A Case of the National Football League

Пан Ю., Wang F., Studia Sportiva 2024 Vol. 18 No. 2 P. 147–164

Added: May 16, 2025

Periods of high uncertainty: How fertility intentions in Russia changed during 2022–2023

Vakulenko E., Gorskiy D., Kondrateva V. et al., Demographic Research 2025 Vol. 52 P. 939–970

BACKGROUND We study changes in fertility intentions in Russia in response to the socio-economic shocks of 2022–2023 associated with the Russia-Ukraine armed conflict. OBJECTIVE Our objective is to identify factors that influence decision-making in a low-fertility context during the crisis, including both objective characteristics and subjective assessments of the current situation. METHODS This paper is based on unique survey ...

Added: May 6, 2025

Prospects for Big Text Data Application in Technology Maturity Assessment (Publications Review)

Loginova I., Grozovskiy F., Aksenova A., Automatic Documentation and Mathematical Linguistics 2025 Vol. 59 No. 3 P. 145–153

The paper analyzes the limitations of conventional methods for assessing the maturity of technology, such as the S-curve, technology readiness level (TRL), Gartner’s hype cycle and their dependence on experts’ opinions. Current approaches to this task based on big text data analysis and machine learning algorithms are reviewed, and their advantages are demonstrated. As a ...

Added: April 28, 2025

Application of Physics-Informed Neural Networks for Solving the Inverse Advection-Diffusion Problem to Localize Pollution Sources

Derkach D., Efremenko D., Чупров И. А. et al., Series Computer Science "arxiv.org". 2025. No. 2503.18849.

Added: March 25, 2025

Generative models and seq2seq techniques for the flash-simulation of the LHCb experiment

Derkach D., Anderlini L., Capelli S. et al., Proceedings of Science 2025 Vol. 476 P. 1032

Simulating detector and reconstruction effects on physics quantities is crucial for data analysis, but it is becoming unsustainably costly for the upcoming HEP experiments. The most radical approach to speeding up detector simulation is Flash Simulation, as proposed by the LHCb collaboration in Lamarr, a software package implementing a novel simulation paradigm relying on Deep Generative ...

Added: March 13, 2025

Real-bogus scores for active anomaly detection

Semenikhin T. A., Kornilov M., Pruzhinskaya M. et al., Astronomy and Computing 2025 Vol. 51 Article 100919

In the task of anomaly detection in modern time-domain photometric surveys, the primary goal is to identify astrophysically interesting, rare, and unusual objects among a large volume of data. Unfortunately, artifacts — such as plane or satellite tracks, bad columns on CCDs, and ghosts — often constitute significant contaminants in results from anomaly detection analysis. ...

Added: March 3, 2025

Exploring the Universe with SNAD: Anomaly Detection in Astronomy

Volnova A., Aleo P., Lavrukhina A. et al., Communications in Computer and Information Science 2024 Vol. 2086 P. 195–208

SNAD is an international project with a primary focus on detecting astronomical anomalies within large-scale surveys, using active learning and other machine learning algorithms. The work carried out by SNAD not only contributes to the discovery and classification of various astronomical phenomena but also enhances our understanding and implementation of machine learning techniques within the ...

Added: March 3, 2025

SNAD catalogue of M-dwarf flares from the Zwicky Transient Facility

Voloshina A., Lavrukhina A., Pruzhinskaya M. et al., Monthly Notices of the Royal Astronomical Society 2024 Vol. 533 No. 4 P. 4309–4323

Most of the stars in the Universe are M spectral class dwarfs, which are known to be the source of bright and frequent stellar flares. In this paper, we propose new approaches to discover M-dwarf flares in ground-based photometric surveys. We employ two approaches: a modification of a traditional method of parametric fit search and ...

Added: March 3, 2025

Comparative Analysis of Regional Inflation Forecasting Models

Габов М. А., Bukina T. V., Kashin D., Journal of the New Economic Association 2025 No. 4(69) P. 87–117

The study aims to compare approaches to forecasting the monthly level of consumer price index (CPI y/y) in the regions of the Volga Federal District using time series models and machine learning methods. This study attempts to select the most appropriate and efficient models for predicting the regional general price level index. The paper also ...

Added: February 22, 2025

Performance Modeling of Data Storage Systems using Generative Models

Al-Maeeni A. R., Temirkhanov A., Ryzhikov A. et al., IEEE Access 2025 Vol. 13 P. 49643–49658

High-precision systems modeling is one of the main areas of industrial data analysis. Models of systems, their digital twins, are used to predict their behavior under various conditions. In this study, we developed several models of a storage system using machine learning-based generative models to predict performance metrics such as IOPS and latency. The models ...

Added: February 20, 2025

Classification of Long Gamma-Ray Transients from INTEGRAL Data Using Machine Learning Approach

Mozgunov G., Pozanenko A. S., Minaev P. Y. et al., in: Data Analytics and Management in Data Intensive Domains: 25th International Conference, DAMDID/RCDL 2023, Moscow, Russia, October 24–27, 2023, Revised Selected Papers. Communications in Computer and Information Science, Vol. 2086. Springer, 2024. P. 215–224.

Added: January 31, 2025

Model for Assessing the Liquidity of a Stock Market Trading Instrument

Sizykh D., Tregub K., Belyakov B. et al., in: 2024 17th International Conference on Management of Large-Scale System Development (MLSD). IEEE, 2024. P. 1–5.

Currently, a large number of studies are being conducted to improve the accuracy of the developed forecasting methods for the stock market. At the same time, multivariate models based on machine learning methods are increasingly used. Since liquidity indicators have a significant impact on asset pricing, taking them into account can improve the accuracy of ...

Added: January 15, 2025

Artificial Neural Networks as a Natural Tool in Solution of Variational Problems in Hydrodynamics

Litvinenko N., IEEE Access 2024

Added: December 9, 2024

Can Artificial Intelligence Predict Court Decisions? A Systematic Review of International Research

Kazun A., Monitoring of Public Opinion: Economic and Social Changes 2024 No. 5 P. 100–122

Advancements in artificial intelligence technologies and the emergence of open databases containing judicial decisions have led to rapid improvements in algorithms capable of classifying legal documents and forecasting decisions made by judges. This article examines a body of international research dedicated to the question of how accurately AI can predict judges’ decisions, and consequently, whether ...

Added: November 29, 2024

Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track. European Conference, ECML PKDD 2024, Vilnius, Lithuania, September 9–13, 2024, Proceedings, Part X. LNCS, volume 14950

Cham: Springer, 2024.

This multi-volume set, LNAI 14941 to LNAI 14950, constitutes the refereed proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2024, held in Vilnius, Lithuania, in September 2024. ...

Added: November 22, 2024