Sequence matching algorithms and pairing of noncoding RNAs
A new statistical approach to the alignment of two random RNA-type sequences (that is, to finding their longest common subsequence) is proposed. We construct a generalized ‘dynamic programming’ algorithm for finding the extreme value of the free energy of two noncoding RNAs. The procedure takes into account the binding free energy of two random heteropolymer chains capable of forming the cloverleaf-like spatial structures typical of RNA molecules. The algorithm rests on two observations: (i) the standard alignment problem can be regarded as the zero-temperature limit of a more general statistical problem, namely the binding of two associating heteropolymer chains; (ii) this latter problem generalizes naturally to sequences with hierarchical cloverleaf-like structures (i.e. of RNA type). The approach also permits ‘secondary structure recovery’: knowing only the primary sequences of the interacting RNAs, we can predict their optimal secondary structures in the zero-temperature limit.
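Observation (i) can be illustrated with a minimal sketch. It is not the generalized algorithm of the paper: it assumes two linear chains with no intra-chain (cloverleaf) structure, a unit binding energy `u` for each pair of identical letters, and no bonds between different letters. The function names `lcs_length` and `minus_free_energy` are hypothetical and chosen only for this example. Under these assumptions, the partition function of non-crossing inter-chain bonds obeys a local recursion, and the quantity T ln Z tends to `u` times the longest-common-subsequence length as the temperature T goes to zero.

```python
import math


def lcs_length(a: str, b: str) -> int:
    """Standard dynamic programming for the longest common subsequence."""
    m, n = len(a), len(b)
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            match = 1 if a[i - 1] == b[j - 1] else 0
            L[i][j] = max(L[i - 1][j], L[i][j - 1], L[i - 1][j - 1] + match)
    return L[m][n]


def minus_free_energy(a: str, b: str, temperature: float, u: float = 1.0) -> float:
    """Return T * ln Z for two chains joined by non-crossing bonds.

    Assumption for this sketch: a bond between identical letters carries
    Boltzmann weight exp(u / T); bonds between different letters are forbidden.
    """
    m, n = len(a), len(b)
    # Z[i][j] sums over all bond configurations between a[:i] and b[:j];
    # with an empty prefix only the bond-free configuration remains, so Z = 1.
    Z = [[1.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            weight = math.exp(u / temperature) if a[i - 1] == b[j - 1] else 0.0
            # Inclusion-exclusion over configurations that do not use bond (i, j),
            # plus configurations in which (i, j) is the last bond.
            Z[i][j] = (Z[i - 1][j] + Z[i][j - 1] - Z[i - 1][j - 1]
                       + weight * Z[i - 1][j - 1])
    return temperature * math.log(Z[m][n])


if __name__ == "__main__":
    s1, s2 = "ACGGU", "AGCGU"
    print("LCS length:", lcs_length(s1, s2))
    for T in (1.0, 0.2, 0.05):
        print(f"T = {T}: T ln Z = {minus_free_energy(s1, s2, T):.3f}")
```

As T decreases, the printed values of T ln Z approach the LCS length from above; the remaining gap is T times the logarithm of the number of optimal bond configurations. Plain floating-point arithmetic is used here, so very low temperatures or long sequences would overflow; a log-domain version of the recursion would be needed in that regime.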