TY - JOUR
T1 - Primer, pipelines, parameters
T2 - Issues in 16S rRNA gene sequencing
AU - Abellan-Schneyder, Isabel
AU - Matchado, Monica S.
AU - Reitmeier, Sandra
AU - Sommer, Alina
AU - Sewald, Zeno
AU - Baumbach, Jan
AU - List, Markus
AU - Neuhaus, Klaus
N1 - Publisher Copyright:
© 2021 American Society for Microbiology. All rights reserved.
PY - 2021/2
Y1 - 2021/2
N2 - Short-amplicon 16S rRNA gene sequencing is currently the method of choice for studies investigating microbiomes. However, comparative studies on differences in procedures are scarce. We sequenced human stool samples and mock communities with increasing complexity using a variety of commonly used protocols. Short amplicons targeting different variable regions (V-regions) or ranges thereof (V1-V2, V1-V3, V3-V4, V4, V4-V5, V6-V8, and V7-V9) were investigated for differences in the composition outcome due to primer choices. Next, the influence of clustering (operational taxonomic units [OTUs], zero-radius OTUs [zOTUs], and amplicon sequence variants [ASVs]), different databases (GreenGenes, the Ribosomal Database Project, Silva, the genomic-based 16S rRNA Database, and The All-Species Living Tree), and bioinformatic settings on taxonomic assignment were also investigated. We present a systematic comparison across all typically used V-regions using well-established primers. While it is known that the primer choice has a significant influence on the resulting microbial composition, we show that microbial profiles generated using different primer pairs need independent validation of performance. Further, comparing data sets across V-regions using different databases might be misleading due to differences in nomenclature (e.g., Enterorhabdus versus Adlercreutzia) and varying precisions in classification down to genus level. Overall, specific but important taxa are not picked up by certain primer pairs (e.g., Bacteroidetes is missed using primers 515F-944R) or due to the database used (e.g., Acetatifactor in GreenGenes and the genomic-based 16S rRNA Database). We found that appropriate truncation of amplicons is essential and different truncated-length combinations should be tested for each study. Finally, specific mock communities of sufficient and adequate complexity are highly recommended.
AB - Short-amplicon 16S rRNA gene sequencing is currently the method of choice for studies investigating microbiomes. However, comparative studies on differences in procedures are scarce. We sequenced human stool samples and mock communities with increasing complexity using a variety of commonly used protocols. Short amplicons targeting different variable regions (V-regions) or ranges thereof (V1-V2, V1-V3, V3-V4, V4, V4-V5, V6-V8, and V7-V9) were investigated for differences in the composition outcome due to primer choices. Next, the influence of clustering (operational taxonomic units [OTUs], zero-radius OTUs [zOTUs], and amplicon sequence variants [ASVs]), different databases (GreenGenes, the Ribosomal Database Project, Silva, the genomic-based 16S rRNA Database, and The All-Species Living Tree), and bioinformatic settings on taxonomic assignment were also investigated. We present a systematic comparison across all typically used V-regions using well-established primers. While it is known that the primer choice has a significant influence on the resulting microbial composition, we show that microbial profiles generated using different primer pairs need independent validation of performance. Further, comparing data sets across V-regions using different databases might be misleading due to differences in nomenclature (e.g., Enterorhabdus versus Adlercreutzia) and varying precisions in classification down to genus level. Overall, specific but important taxa are not picked up by certain primer pairs (e.g., Bacteroidetes is missed using primers 515F-944R) or due to the database used (e.g., Acetatifactor in GreenGenes and the genomic-based 16S rRNA Database). We found that appropriate truncation of amplicons is essential and different truncated-length combinations should be tested for each study. Finally, specific mock communities of sufficient and adequate complexity are highly recommended.
KW - 16S rRNA gene sequencing
KW - Amplicon sequencing
KW - Bioinformatic settings
KW - Clustering
KW - Databases
KW - Microbiome
KW - Mock communities
KW - Variable regions
UR - http://www.scopus.com/inward/record.url?scp=85102096790&partnerID=8YFLogxK
U2 - 10.1128/mSphere.01202-20
DO - 10.1128/mSphere.01202-20
M3 - Article
C2 - 33627512
AN - SCOPUS:85102096790
SN - 2379-5042
VL - 6
JO - mSphere
JF - mSphere
IS - 1
M1 - e01202-20
ER -