Coevolution of genes and languages and high levels of population structure among the highland populations of Daghestan
Because of their great linguistic and cultural diversity, the highland populations of Daghestan present an excellent opportunity to test the hypothesis of language-gene coevolution at a fine geographic scale. However, previous genetic studies have generally been restricted to uniparental markers and have not included many of the key populations of the region. To improve our understanding of the genetic structure of Daghestani populations and to investigate possible correlations between genetic and linguistic variation, we analyzed ~550,000 autosomal single nucleotide polymorphisms, phylogenetically informative Y chromosome markers, and mtDNA haplotypes in 21 ethnic Daghestani groups. We found high levels of population structure in Daghestan, consistent with the hypothesis of long-term isolation among populations of the highland Caucasus. Highland Daghestani populations exhibit extremely high levels of between-population diversity for all genetic systems tested, leading to some of the highest FST values observed for any region of the world. In addition, we found a significant positive correlation between gene and language diversity, suggesting that these two aspects of human diversity have coevolved as a result of historical patterns of social interaction among highland farmers at the community level. Finally, our data are consistent with the hypothesis that most Daghestanian-speaking groups descend from a common ancestral population (~6000-6500 years ago) that spread to the Caucasus by demic diffusion, followed by population fragmentation and low levels of gene flow.