Recent advances in single-cell genomics provide an alternative to largely gene-centric metagenomics studies, enabling whole-genome sequencing of uncultivated bacteria. However, single-cell assembly projects are challenging due to (i) the highly nonuniform read coverage and (ii) a greatly elevated number of chimeric reads and read pairs. While recently developed single-cell assemblers have addressed the former challenge, methods for assembling highly chimeric reads remain poorly explored. We present algorithms for identifying chimeric edges and resolving complex bulges in de Bruijn graphs, which significantly improve single-cell assemblies. We further describe applications of the single-cell assembler SPAdes to a new approach for capturing and sequencing "microbial dark matter" that forms small pools of randomly selected single cells (called a mini-metagenome) and further sequences all genomes from the mini-metagenome at once. On single-cell bacterial datasets, SPAdes improves on the recently developed E+V-SC and IDBA-UD assemblers specifically designed for single-cell sequencing. For standard (cultivated monostrain) datasets, SPAdes also improves on A5, ABySS, CLC, EULER-SR, Ray, SOAPdenovo, and Velvet. Thus, recently developed single-cell assemblers not only enable single-cell sequencing, but also improve on conventional assemblers on their own turf. SPAdes is available for free online download under a GPLv2 license
Transitions between different conformational states are ubiquitous in proteins. A vast class of conformation-changing proteins includes evolutionary switches, which vary their conformation as an effect of few mutations or weak environmental variations. However, modeling those processes is extremely difficult due to the need of efficiently exploring a vast conformational space to look for the actual transition path. In this study, we report a strategy that simplifies this task attacking the complexity on several sides. We first apply a minimalist coarse-grained model to the protein, based on an empirical force field with a partial structural bias toward one or both the reference structures. We then explore the transition paths by means of stochastic molecular dynamics and select representative structures by means of a principal path-based clustering algorithm. We finally compare this trajectory with that produced by independent methods adopting a morphing-oriented approach. Our analysis indicates that the minimalist model returns trajectories capable of exploring intermediate states with physical meaning, retaining a very low computational cost, which can allow systematic and extensive exploration of the multistable proteins transition pathways.
One of the key advances in genome assembly that has led to a significant improvement in contig lengths has been improved algorithms for utilization of paired reads (mate-pairs). While in most assemblers, mate-pair information is used in a post-processing step, the recently proposed Paired de Bruijn Graph (PDBG) approach incorporates the mate-pair information directly in the assembly graph structure. However, the PDBG approach faces difficulties when the variation in the insert sizes is high. To address this problem, we first transform mate-pairs into edge-pair histograms that allow one to better estimate the distance between edges in the assembly graph that represent regions linked by multiple mate-pairs. Further, we combine the ideas of mate-pair transformation and PDBGs to construct new data structures for genome assembly: pathsets and pathset graphs.
Bacteria are known to exchange genetic information by horizontal gene transfer. Since the frequency of homologous recombination depends on the similarity between the recombining segments, several studies examined whether this could lead to the emergence of subspecies. Most of them simulated fixed-size Wright-Fisher populations, in which the genetic drift should be taken into account. Here, we use nonlinear Markov processes to describe a bacterial population evolving under mutation and recombination. We consider a population structure as a probability measure on the space of genomes. This approach implies the infinite population size limit, and thus, the genetic drift is not assumed. We prove that under these conditions, the emergence of subspecies is impossible.
The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V−SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online (http://bioinf.spbau.ru/spades). It is distributed as open source software.