New Publication: Quantification of the Hunter-Gatherer Oral Microbiome through Analysis of Exome Sequences

Teaching a new course (Microbial Genomics and Bioinformatics) has kept me pretty busy this quarter, so I’m a bit behind on announcing lab developments. With the bulk of teaching behind me, I’m now going to play a little catch-up with the next few posts.

First up, we published a paper in BMC Genomics entitled “Exome capture from saliva produces high quality genomic and metagenomic data” in collaboration with several other research groups, most notably the labs of Brenna Henn (Stony Brook) and Jeff Kidd (U Michigan).

We collected saliva samples from several Khoesan individuals, including some who live traditional hunter-gatherer lifestyles, and subjected DNA extracted from these samples to human exome capture and sequencing. This targeted sequencing approach facilitated population genetic analysis of human coding sequences. However, a fraction of the sequences showed no discernible similarity to the human genome. We compared this fraction to the genomes of known oral microbiota and determined that these data can be used to quantify the diversity of the oral microbiome. We then used these data to characterize, for the first time, the Khoesan oral microbiome and how it differs from the healthy American oral microbiome.
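To give a feel for what "quantifying diversity" can mean once non-human reads have been matched to oral microbial genomes, here is a minimal sketch. It is not the pipeline from the paper: the function names are hypothetical, the read assignments are made-up toy data, and the Shannon index is just one common diversity measure chosen for illustration.

```python
import math
from collections import Counter


def relative_abundance(assignments):
    """Map each taxon to its share of the assigned (non-human) reads."""
    counts = Counter(assignments)
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def shannon_diversity(abundances):
    """Shannon index H = -sum(p * ln p) over taxa with nonzero abundance."""
    return -sum(p * math.log(p) for p in abundances.values() if p > 0)


# Toy input (an assumption for illustration): one taxon label per non-human
# read, as might come out of comparing reads against reference genomes.
reads = ["Streptococcus"] * 5 + ["Prevotella"] * 3 + ["Neisseria"] * 2

profile = relative_abundance(reads)
print(profile)
print(round(shannon_diversity(profile), 3))
```

A sample dominated by a single taxon gives an index of zero, while a sample spread evenly over many taxa gives a higher value, which is what makes a measure like this useful for comparing samples or populations.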

We are excited about this work because it (1) clarifies how microbiome composition varies across diverse human populations and (2) indicates that exome-capture sequencing of saliva DNA, which is increasingly used to study human populations, has the added benefit of resolving oral microbiome structure and diversity. Our hope is that we will be able to use this platform to study how microbiome composition covaries with human genomic variation.

The abstract for the manuscript follows.

Background

Targeted capture of genomic regions reduces sequencing cost while generating higher coverage by allowing biomedical researchers to focus on specific loci of interest, such as exons. Targeted capture also has the potential to facilitate the generation of genomic data from DNA collected via saliva or buccal cells. DNA samples derived from these cell types tend to have a lower human DNA yield, may be degraded from age and/or have contamination from bacteria or other ambient oral microbiota. However, thousands of samples have been previously collected from these cell types, and saliva collection has the advantage that it is non-invasive and appropriate for a wide variety of research.

Results

We demonstrate successful enrichment and sequencing of 15 South African KhoeSan exomes and 2 full genomes with samples initially derived from saliva. The expanded exome dataset enables us to characterize genetic diversity free from ascertainment bias for multiple KhoeSan populations, including new exome data from six HGDP Namibian San, revealing substantial population structure across the Kalahari Desert region. Additionally, we discover and independently verify thirty-one previously unknown KIR alleles using methods we developed to accurately map and call the highly polymorphic HLA and KIR loci from exome capture data. Finally, we show that exome capture of saliva-derived DNA yields sufficient non-human sequences to characterize oral microbial communities, including detection of bacteria linked to oral disease (e.g. Prevotella melaninogenica). For comparison, two samples were sequenced using standard full genome library preparation without exome capture and we found no systematic bias of metagenomic information between exome-captured and non-captured data.

Conclusions

DNA from human saliva samples, collected and extracted using standard procedures, can be used to successfully sequence high quality human exomes, and metagenomic data can be derived from non-human reads. We find that individuals from the Kalahari carry a higher oral pathogenic microbial load than samples surveyed in the Human Microbiome Project. Additionally, rare variants present in the exomes suggest strong population structure across different KhoeSan populations.