A Protein Classification Benchmark collection for Machine Learning

Publications

?

A Protein Classification Benchmark collection for Machine Learning

Nucleic Acids Research. 2006.

Kertesz-Farkas A., Pongor S.

Priority areas: IT and mathematics

Language: English

DOI

Text on another site

Keywords: bioinformatics

Multimodal graph, surface, and language-based model for protein protein interaction prediction

Arteaga Moreano B. D., Poptsova M., Scientific Reports 2026 No. 16 Article 4772

Accurate prediction of protein-protein interactions (PPIs) is fundamental to understanding biological processes and disease mechanisms. While deep learning offers a powerful alternative to costly experimental methods, existing approaches often overlook critical protein-surface information and rely on simplistic feature fusion techniques, thereby limiting performance. To address this, we introduce GSMFormer-PPI, a novel multimodal framework that integrates ...

Added: February 4, 2026

Iterative Ricci-Foster Curvature Flow with GMM-Based Edge Pruning: A Novel Approach to Community Detection

Sorokin K., Beketov M., Онучин А. et al., / arxiv.org. Серия cs.SI "Social and Information Networks ". 2025.

Community detection in complex networks is a fundamental problem, open to new approaches in various scientific settings. We introduce a novel community detection method, based on Ricci flow on graphs. Our technique iteratively updates edge weights (their metric lengths) according to their (combinatorial) Foster version of Ricci curvature computed from effective resistance distance between the ...

Added: January 15, 2026

Implementing Transport Coding in OMNeT++ for Message Delay Reduction

Petrovanov I., Sergeev A., / Series Computer Science "arxiv.org". 2025. No. 2512.18332.

Transport coding reduces message delay in packet-switched networks by introducing controlled redundancy at the transport layer: original packets are encoded into coded packets, and the message is reconstructed after the first successful deliveries, effectively shifting latency from the maximum packet delay to the -th order statistic. We present a concise, reproducible discrete-event implementation of transport coding in OMNeT++, including ...

Added: December 24, 2025

Prediction of protein-protein interactions using point transformer and spherical Convex Hull graphs

David Arteaga, Poptsova M., Computational and Structural Biotechnology Journal 2026 Vol. 31 P. 82–93

Accurate predictions and large-scale identification of protein-protein interactions (PPIs) are crucial for understanding their inherent biological mechanisms and protein functions in virtually all biological processes. Nowadays, graph-based deep learning models have made significant contributions in modeling proteins with physicochemical and geometric features. However, most of these models rely on conventional graph construction methods, such as ...

Added: December 22, 2025

Hessian-based lightweight neural network for brain vessel segmentation on a minimal training dataset

Меньшиков И. А., Бернадотт А. К., Elvimov N. S., / Series arXie "Statistical mechanics". 2025.

Accurate segmentation of blood vessels in brain magnetic resonance angiography (MRA) is essential for successful surgical procedures, such as aneurysm repair or bypass surgery. Currently, annotation is primarily performed through manual segmentation or classical methods, such as the Frangi filter, which often lack sufficient accuracy. Neural networks have emerged as powerful tools for medical image ...

Added: December 1, 2025

Determining the boundary of dynamical chaos in the generalized Chirikov map via machine learning

Чернышов Д. П., Satanin A., Shchur L., / Series arXiv "math". 2025.

We investigate the boundary separating regular and chaotic dynamics in the generalized Chirikov map, an extension of the standard map with phase-shifted secondary kicks. Lyapunov maps were computed across the parameter space (K,K(α, τ)) and used to train a convolutional neural network (ResNet18) for binary classification of dynamical regimes. The model reproduces the known critical ...

Added: November 21, 2025

Эффективный алгоритм торговли на фондовом рынке: ретроспективный анализ, основанный на данных по S&P-500.

Rubchinskiy A., Chubarova D., / Series WP7 "Математические методы анализа решений в экономике, бизнесе и политике". 2025. No. WP7/2025/01.

The article examines one of the most famous examples of socio-economic systems, characterized by significant uncertainty – the S&P-500 stock market, where shares of 500 largest US companies are traded. No assumptions are made about the probabilistic characteristics of the stock market. A flexible algorithm for daily trading has been developed, based on both known fixed data ...

Added: November 9, 2025

Diffusion on language model embeddings for protein sequence generation

Meshchaninov V., Strashnov, P., Shevtsov A. et al., / Cornell University. Серия CoRR, arXiv:2403.03726 "Computing Research Repository,". 2025.

Protein design requires a deep understanding of the inherent complexities of the protein universe. While many efforts lean towards conditional generation or focus on specific families of proteins, the foundational task of unconditional generation remains underexplored and undervalued. Here, we explore this pivotal domain, introducing DiMA, a model that leverages continuous diffusion on embeddings derived ...

Added: October 5, 2025

Smoothie: Smoothing Diffusion on Token Embeddings for Text Generation

Shabalin A., Meshchaninov V., Vetrov D., / Series cs.CL, arXiv:2505.18853 "Computation and Language". 2025.

Diffusion models have achieved state-of-the-art performance in generating images, audio, and video, but their adaptation to text remains challenging due to its discrete nature. Prior approaches either apply Gaussian diffusion in continuous latent spaces, which inherits semantic structure but struggles with token decoding, or operate in categorical simplex space, which respect discreteness but disregard semantic ...

Added: October 5, 2025

A Feature Engineering Framework for Computer Vision Based on Topological Data Analysis

Абрамов А. С., Chernyshev V. L., Mikhaylets E. et al., / Series Social Science Research Network "Social Science Research Network". 2025.

Computer vision is one of the most relevant modern research areas with broad practical applications. However, traditional solutions based on deep learning have signicant limitations and can be misleading. Topological data analysis, on the other hand, is a modern approach to solving similar problems using mathematically deterministic methods of algebraic topology that reduce the risk ...

Added: September 23, 2025

Utilizing the VirIdAl Pipeline to Search for Viruses in the Metagenomic Data of Bat Samples

Budkina A., Korneenko E., Kotov I. et al., Viruses 2021 No. 10 P. 2006

According to various estimates, only a small percentage of existing viruses have been discovered, naturally much less being represented in the genomic databases. High-throughput sequencing technologies develop rapidly, empowering large-scale screening of various biological samples for the presence of pathogen-associated nucleotide sequences, but many organisms are yet to be attributed specific loci for identification. This ...

Added: September 19, 2025

On the construction of frieze patterns from partitions of convex polygons by nonintersecting diagonals

Kochetkov Y., / Series arXiv.org e-print archive "arXiv.math". 2025. No. 07600.

We demonstrate in an elementary way how to construct a frieze pattern of width m-3 from a partition of a convex m-gon by not intersecting diagonals. ...

Added: September 17, 2025

On one property of Catalan numbers

Kochetkov Y., / Series arXiv.org e-print archive "arXiv.math". 2025. No. 20584.

We give a new proof of the following statement: the Catalan number C_n is divisible by n+2, if n is odd and n<> 3k+1. ...

Added: September 9, 2025

TabGraphs: A Benchmark and Strong Baselines for Learning on Graphs with Tabular Node Features

Bazhenov G., Платонов О. А., Prokhorenkova L., / Series arXiv:2409.14500 "arXiv:2409.14500 [cs.LG]". 2025.

Tabular machine learning is an important field for industry and science. In this f ield, table rows are typically treated as independent data samples, but additional information about the relations between these samples is sometimes available and can be used to improve predictive performance. Such information can be naturally modeled with a graph, hence tabular ...

Added: August 14, 2025

Low Sets and Closure Properties of Counting Function Classes

Ivanashev Y., / Series Computer Science "arxiv.org". 2025.

Added: July 29, 2025

ComputAgeBench: Epigenetic Aging Clocks Benchmark

Dudkovskaia Anastasiia, / Series 005140 "Biorxiv". 2025.

The success of clinical trials of longevity drugs relies heavily on identifying integrative health and aging biomarkers, such as biological age. Epigenetic aging clocks predict the biological age of an individual using their DNA methylation profiles, commonly retrieved from blood samples. However, there is no standardized methodology to validate and compare epigenetic clock models as ...

Added: July 18, 2025

Transcriptomic Maps of Colorectal Liver Metastasis: Machine Learning of Gene Activation Patterns and Epigenetic Trajectories in Support of Precision Medicine

KUDRYAVTSEVA A., Cancers 2023

Liver metastasis is a significant factor contributing to mortality associated with colorectal cancer. Establishing the biological mechanisms of metastasis is crucial for refining diagnostics and identifying therapeutic windows for interventions. Currently, little is known of the processes that govern the development of liver metastases, the role of the tumor microenvironment, the role of epigenetics, and ...

Added: July 1, 2025

An archaic reference-free method to jointly infer Neanderthal and Denisovan introgressed segments in modern human genomes

Planche L., Ilina A., Ávila-Arcos M. et al., / Series 005140 "Biorxiv". 2025.

Admixture between populations is a common feature of human history. Admixture events introduce new genetic variation that can fuel evolution. Characterizing the significance of admixture events on the evolution of a population across various species is of great interest to evolutionary geneticists. Local Ancestry Inference (LAI) methods infer genetic ancestry of an individual at a ...

Added: May 19, 2025

Genome-wide association studies of ischemic stroke based on interpretable machine learning

Stefan Nikolić, Ignatov D. I., Khvorykh G. et al., PeerJ Computer Science 2024 Vol. 10 Article e2454

Despite the identification of several dozen genetic loci associated with ischemic stroke (IS), the genetic bases of this disease remain largely unexplored. In this research we present the results of genome-wide association studies (GWAS) based on classical statistical testing and machine learning algorithms (logistic regression, gradient boosting on decision trees, and tabular deep learning model ...

Added: December 11, 2024

Proceedings of Science, volume 429. The 6th International Workshop on Deep Learning in Computational Physics

[б.и.], 2023.

The Workshop will be held in the Meshcheryakov Laboratory of Information Technologies (MLIT) of the Joint Institute for Nuclear Research (JINR) on July 6-8, 2022. The workshop primarily focuses on the use of machine learning in particle astrophysics and high energy physics, but is not limited to this area. Topics of interest are various applications of ...

Added: March 12, 2024

Unsupervised domain adaptation methods for cross-species transfer of regulatory code signals

Pavel Latyshev, Fedor Pavlov, Herbert A. et al., , in: Proceedings of 11th Moscow Conference on Computational Molecular Biology MCCMB'23.: IITP RAS, 2023.

Added: December 1, 2023

Proceedings of 11th Moscow Conference on Computational Molecular Biology MCCMB'23

IITP RAS, 2023.

В сборнике представлены тезисы работ участников 11-ой Московской конференции по вычислительной молекулярной биологии MCCMB'23. Работы посвящены актуальным вопросам анализа аминокислотных и нуклеотидных последовательностей, структур биополимеров, молекулярной эволюции, методов высокопроизводительного секвенирования, системной биологии и биоалгоритмов. ...

Added: November 30, 2023

10th International Conference, PReMI 2023, Kolkata, India, December 12–15, 2023, Proceedings. Pattern Recognition and Machine Intelligence. LNCS, volume 14301

Cham: Springer, 2023.

Added: November 29, 2023

Cross-Domain Limitations of Neural Models on Biomedical Relation Classification

Alimova I., Tutubalina E., Nikolenko S. I., IEEE Access 2022 Vol. 10 P. 1432–1439

Relation extraction (RE) aims to extract relational facts from plain text, which is essential to the biomedical research field with the rapid growth of biomedical literature and generally large volumes of biomedicine-related text coming from various sources. Numerous annotated corpora and state-of-the-art models have been introduced in the past five years. However, there are no ...

Added: April 10, 2023