TY - JOUR
T1 - CATCh, an Ensemble Classifier for Chimera Detection in 16S rRNA Sequencing Studies
AU - Ahmed, Mohamed
AU - Saeys, Yvan
AU - Leys, Natalie
AU - Raes, Jeroen
AU - Monsieurs, Pieter
N1 - Score = 10
PY - 2015/3/1
Y1 - 2015/3/1
N2 - In ecological studies, microbial diversity is nowadays mostly assessed via the detection of phylogenetic marker genes, such as 16S rRNA. However, PCR amplification of these marker genes produces a significant amount of artificial sequences, often referred to as chimeras. Different algorithms have been developed to remove these chimeras, but efforts to combine different methodologies are limited. Therefore, two machine learning classifiers (reference-based and de novo CATCh) were developed by integrating the output of existing chimera detection tools into a new, more powerful method. When comparing our classifiers with existing tools in either the reference-based or de novo mode, a higher performance of our ensemble method was observed on a wide range of sequencing data, including simulated, 454 pyrosequencing, and Illumina MiSeq data sets. Since our algorithm combines the advantages of different individual chimera detection tools, our approach produces more robust results when challenged with chimeric sequences having a low parent divergence, short length of the chimeric range, and various numbers of parents. Additionally, it could be shown that integrating CATCh in the preprocessing pipeline has a beneficial effect on the quality of the clustering in operational taxonomic units.
AB - In ecological studies, microbial diversity is nowadays mostly assessed via the detection of phylogenetic marker genes, such as 16S rRNA. However, PCR amplification of these marker genes produces a significant amount of artificial sequences, often referred to as chimeras. Different algorithms have been developed to remove these chimeras, but efforts to combine different methodologies are limited. Therefore, two machine learning classifiers (reference-based and de novo CATCh) were developed by integrating the output of existing chimera detection tools into a new, more powerful method. When comparing our classifiers with existing tools in either the reference-based or de novo mode, a higher performance of our ensemble method was observed on a wide range of sequencing data, including simulated, 454 pyrosequencing, and Illumina MiSeq data sets. Since our algorithm combines the advantages of different individual chimera detection tools, our approach produces more robust results when challenged with chimeric sequences having a low parent divergence, short length of the chimeric range, and various numbers of parents. Additionally, it could be shown that integrating CATCh in the preprocessing pipeline has a beneficial effect on the quality of the clustering in operational taxonomic units.
KW - 16S metagenomics
KW - chimera
KW - mothur
UR - http://ecm.sckcen.be/OTCS/llisapi.dll/open/ezp_138619
UR - http://knowledgecentre.sckcen.be/so2/bibref/12303
U2 - 10.1128/AEM.02896-14
DO - 10.1128/AEM.02896-14
M3 - Article
SN - 0099-2240
VL - 81
SP - 1573
EP - 1584
JO - Applied and Environmental Microbiology
JF - Applied and Environmental Microbiology
IS - 5
ER -