Computational methods to predict Z-DNA regions are in high demand to understand the functional role of Z-DNA. The previous state-of-the-art method Z-Hunt is based on statistical mechanical and energy considerations about B- to Z-DNA transition using sequence information. Z-DNA CHiP-seq experiment results showed little overlap with Z-Hunt predictions implying that sequence information only is not sufficient to explain emergence of Z-DNA at different genomic locations. Adding epigenetic and other functional genomic mark-ups to DNA sequence level can help revealing the functional Z-DNA sites. Here we take advantage of the deep learning approach that can analyze and extract information from large volumes of molecular biology data. We developed a machine learning approach DeepZ that aggregates information from genome-wide maps of epigenetic markers, transcription factor and RNA polymerase binding sites, and chromosome accessibility maps. With the developed model we not only verify the experimental Z-DNA predictions, but also generate the whole-genome annotation, introducing new possible Z-DNA regions, which have not yet been found in experiments and can be of interest to the researchers from various fields.
In this paper, we present a new Python library called mPyPl, which is intended to simplify complex data processing tasks using a functional approach. This library defines operations on lazy data streams of named dictionaries represented as generators (so-called multi-field datastreams), and allows enriching those data streams with more ’fields’ in the process of data preparation and feature extraction. Thus, most data preparation tasks can be expressed in the form of a neat linear ’pipeline’, similar in syntax to UNIX pipes, or |> functional composition operator in F#. We define basic operations on multi-field data streams, which resemble classical monadic operations, and show similarity of the proposed approach to monads in functional programming. We also show how the library was used in complex deep learning tasks of event detection in video, and discuss different evaluation strategies that allow for different compromises in terms of memory and performance.