Working paper

FAST AND SCALABLE GENOME-WIDE INFERENCE OF LOCAL TREE TOPOLOGIES FROM LARGE NUMBER OF HAPLOTYPES BASED ON TREE CONSISTENT PBWT DATA STRUCTURE

Shchur V. L., Ziganurova L., Durbin R.
Estimation of the relationships between DNA sequences is one of the most important problems in genomics. Understanding these relationships is central to demographic inference, correction for population structure in GWAS, identifying signals of selection, etc. The data structure containing the full information about sample genealogy is called the ancestral recombination graph (ARG). However, ARG inference is a very difficult problem, not least due to a very complex state space. In this work we describe a new approach for fast and scalable generation of local tree topologies relating large numbers of haplotypes. Our method is closely related to the estimation of the ARG, and captures both local and global properties of an ARG. It is based on a data structure which we call tree consistent PBWT, a modification of the PBWT data structure introduced by R. Durbin (2014). We also explore some methods to estimate the quality of the generated tree topologies and to make inferences based on them. At the end we discuss a probabilistic model which could potentially lead to the estimation of ARG node times.
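For readers unfamiliar with the PBWT mentioned above, the following is a minimal sketch of the standard positional prefix array update from Durbin (2014), not the tree consistent modification proposed in this paper. The function name `pbwt_prefix_arrays` and the toy input are assumptions made for illustration. At each site, haplotypes are stably partitioned by allele, so that haplotypes sharing a long matching segment ending at the current site become adjacent in the ordering.

```python
def pbwt_prefix_arrays(haplotypes):
    """Return the positional prefix arrays a_0 .. a_N for binary haplotypes.

    haplotypes: list of equal-length lists of 0/1 alleles (one list per haplotype).
    arrays[k] lists haplotype indices sorted by their reversed prefixes
    (sites k-1, k-2, ..., 0), as in the standard PBWT.
    """
    m = len(haplotypes)
    n_sites = len(haplotypes[0]) if m else 0
    order = list(range(m))          # a_0: the original haplotype order
    arrays = [order]
    for k in range(n_sites):
        # Stable partition by the allele at site k: zeros first, then ones.
        zeros = [i for i in order if haplotypes[i][k] == 0]
        ones = [i for i in order if haplotypes[i][k] == 1]
        order = zeros + ones
        arrays.append(order)
    return arrays


# Hypothetical toy data: 4 haplotypes, 3 biallelic sites.
toy = [
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
]
arrays = pbwt_prefix_arrays(toy)
for k, a in enumerate(arrays):
    print(k, a)
```

Each update costs time linear in the number of haplotypes, which is the property that makes PBWT-based methods attractive for large samples. This sketch keeps every intermediate array for clarity; a practical implementation would typically keep only the current one.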