TY - JOUR
T1 - The unresolved struggle of 16S rRNA amplicon sequencing
T2 - a benchmarking analysis of clustering and denoising methods
AU - Fares, Mohamed
AU - Tharwat, Engy K.
AU - Cleenwerck, Ilse
AU - Monsieurs, Pieter
AU - Van Houdt, Rob
AU - Vandamme, Peter
AU - El-Hadidi, Mohamed
AU - Mysara, Mohamed
N1 - Score=10
Publisher Copyright:
© The Author(s) 2025.
PY - 2025/12
Y1 - 2025/12
N2 - Background: Although 16S rRNA gene amplicon sequencing has become an indispensable method for microbiome studies, it remains prone to several biases and errors. Numerous algorithms have been developed to eliminate these errors and consolidate the output into distance-based Operational Taxonomic Units (OTUs) or denoising-based Amplicon Sequence Variants (ASVs). An objective comparison between them has been obscured by differing experimental setups and parameters. In the present study, we conducted a comprehensive benchmarking analysis of error rates, microbial composition, over-merging/over-splitting of reference sequences, and diversity analyses using the most complex mock community, comprising 227 bacterial strains, and the Mockrobiota database. Using unified preprocessing steps, we were able to compare DADA2, Deblur, MED, UNOISE3, UPARSE, DGC (Distance-based Greedy Clustering), AN (Average Neighborhood), and Opticlust objectively. Results: ASV algorithms, led by DADA2, produced consistent output yet suffered from over-splitting, while OTU algorithms, led by UPARSE, achieved clusters with lower errors yet with more over-merging. Notably, UPARSE and DADA2 showed the closest resemblance to the intended microbial community, especially when considering measures of alpha and beta diversity. Conclusion: Our unbiased comparative evaluation examined the performance of eight algorithms dedicated to the analysis of 16S rRNA amplicon sequences across a wide range of mock datasets. Our analysis shed light on the pros and cons of each algorithm and the accuracy of the produced OTUs or ASVs. The use of the most complex mock community and the benchmarking comparison presented here offer a framework for comparing OTU/ASV algorithms and an objective method for assessing new tools and algorithms.
KW - 16S rRNA amplicon sequencing
KW - Amplicon sequence variants (ASVs)
KW - Denoising
KW - Operational taxonomical units (OTUs)
UR - http://www.scopus.com/inward/record.url?scp=105005075671&partnerID=8YFLogxK
U2 - 10.1186/s40793-025-00705-6
DO - 10.1186/s40793-025-00705-6
M3 - Article
AN - SCOPUS:105005075671
SN - 2524-6372
VL - 20
JO - Environmental Microbiome
JF - Environmental Microbiome
IS - 1
M1 - 51
ER -