Randomness in Cancer Breakpoint Prediction

Cheloshkina K., Bzhikhatlov I., Poptsova M.

doi:10.1089/cmb.2020.0551

Journal of Computational Biology. 2021. Vol. 28. No. 7. P. 716–731.

Cancer genomes are susceptible to multiple rearrangements through deletion, insertion, and translocation of genomic regions. Recently, the problem of finding determinants of breakpoint formation has been approached with machine learning methods; however, unlike the prediction of cancer point mutations, breakpoint prediction has proved to be a more difficult task, and various machine learning models have not achieved high predictive power, often only slightly exceeding the threshold of random guessing. This raised the question of whether breakpoints are random noise in cancer mutagenesis or whether determinants of structural mutagenesis exist. In the present study, we investigated randomness in the genomic distribution of cancer breakpoints through the power of machine learning models to predict breakpoint hot spots. We divided all cancer types into three groups by the degree of randomness in their breakpoint formation. We tested different density thresholds and explored the bias in hot spot definition. We also compared the prediction of hot spots with that of individual breakpoints. We found that hot spots are predicted considerably better than individual breakpoints; however, some individual breakpoints can also be predicted with satisfactory power, and thus it is not appropriate to filter them out of analyses. We demonstrated that positive-unlabeled learning can provide insights into the insufficiency of cancer data sets, which is not always reflected by data set size. Overall, the present results support the view that the cancer breakpoint landscape can be represented by predictable dense breakpoint regions and scattered individual breakpoints, not all of which are random noise; some are generated by a detectable mechanism.

Research target: Biology; Basic Medicine

Priority areas: IT and mathematics

Language: English

Keywords: machine learning; cancer; genomics; random forest; machine learning in bioinformatics; cancer breakpoints; cancer genome rearrangements; cancer breakpoint hotspots; genomic rearrangements; breakpoints in cancer genomes

Publication based on the results of:

Analysis of regulatory alternative DNA structures (2021)
