Recognizing Patterns of Nucleosome and DNA Structures Positioning
Non-B DNA structures have great potential to form and to influence various genomic processes, including transcription. One of the mechanisms of transcription regulation is nucleosome positioning. Even though only B-DNA can be wrapped around a nucleosome, non-B DNA structures can compete with a nucleosome for a genomic location. Here we used permanganate/S1 nuclease footprinting data on non-B DNA structures, such as Z-DNA, H-DNA, G-quadruplexes and stress-induced duplex destabilization (SIDD) sites, together with MNase-seq data on nucleosome positioning in the mouse genome. We found three types of nucleosome positioning patterns around non-B DNA structures: a structure flanked by nucleosomes on both sides, a structure flanked by a nucleosome on one side, and a structure located in a nucleosome-free region. Machine learning models based on the random forest and XGBoost algorithms were constructed to recognize 1 kb DNA regions containing a particular nucleosome positioning pattern for four types of DNA structures (Z-DNA, H-DNA, G-quadruplexes and SIDD sites), using di- and trinucleotide statistics as features. The best performance (94% accuracy) was reached for G-quadruplexes, while for the other types of structures the accuracy was below 70%. We conclude that 1 kb regions containing G-quadruplexes have distinct compositional properties, which points to preferential genomic locations for such patterns and requires further investigation. For the other DNA structures, the composition of a region is not a sufficient predictive factor, and other physical and structural DNA properties should be taken into account to improve recognition of nucleosome-DNA-structure patterns.
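
The compositional features mentioned above (statistics of di- and trinucleotides over a 1 kb region) could be computed roughly as in the sketch below. This is an illustration under stated assumptions, not the authors' implementation: the function names `kmer_frequencies` and `composition_features` are hypothetical, and the choices to use relative frequencies over overlapping windows and to skip windows containing ambiguous bases are assumptions, since the abstract does not specify them.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_frequencies(seq, k):
    """Return relative frequencies of all 4**k k-mers over A/C/G/T.

    Hypothetical helper: overlapping windows are counted, and windows
    containing any other symbol (for example N) are skipped. Both
    choices are assumptions made for this sketch.
    """
    seq = seq.upper()
    # Fixed lexicographic order so every region yields comparable vectors.
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


def composition_features(seq):
    """Concatenate di- and trinucleotide frequencies (16 + 64 = 80 values)."""
    return kmer_frequencies(seq, 2) + kmer_frequencies(seq, 3)


# Toy 1 kb sequence standing in for a genomic region.
features = composition_features("ACGT" * 250)
```

A vector of this kind, one per 1 kb region, is the sort of input that a random forest or XGBoost classifier could be trained on to separate the nucleosome positioning patterns. The model training itself is not shown here because the abstract gives no details about it.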