Book chapter

Machine Learning Applications for Genomic Pattern Recognition Problem

P. 139-148.
Tevanyan E., Poptsova M.

DNA secondary structures are important functional elements that may influence cellular processes. One of their possible functions is the regulation of nucleosome positioning. Here, MNase-seq and ssDNA-seq data were used to define patterns of the positional relationship between DNA structures, such as Z-DNA, H-DNA and G-quadruplexes, and nucleosomes. Three types of patterns were found: a structure is flanked by nucleosomes on both sides, flanked on one side, or located in a nucleosome-free region. Machine-learning models based on the Random Forest algorithm and XGBoost were trained to recognize 500 bp DNA regions containing a nucleosome positioning pattern for three types of DNA structures (Z-DNA, H-DNA and G-quadruplexes) based on DNA sequence compositional properties. The best performance (more than 86% for ROC-AUC, accuracy, recall and precision scores) was reached for G-quadruplexes. The 500 bp regions containing G-quadruplexes have distinct compositional properties and point to the preferential locations of the defined patterns, whose regulatory functions require further investigation. For the other DNA structures, region composition is a less powerful predictive factor, and one should take into account other physical and structural DNA properties to improve recognition of nucleosome-DNA-structure patterns.

In book

Edited by: Irina Lomazova, Anna Kalenkova, Р. Яворский. Vol. 2478: CEUR Workshop Proceedings. CEUR-WS.org, 2019.