TY - JOUR
T1 - BacPipe: A rapid, user-friendly whole genome sequencing pipeline for clinical diagnostic bacteriology
AU - Xavier, Basil B.
AU - Ahmed, Mohamed
AU - Bolzan, Mattia
AU - Ribeiro-Gonçalves, Bruno
AU - Alako, Blaise T.F.
AU - Harrison, Peter
AU - Lammens, Christine
AU - Kumar-Singh, Samir
AU - Goossens, Herman
AU - Carriço, João André
AU - Cochrane, Guy R.
AU - Malhotra-Kumar, Surbhi
N1 - Score=10
PY - 2019/12/9
Y1 - 2019/12/9
AB - Objectives: Despite rapid advances in whole genome sequencing (WGS) technologies, their integration into routine microbiological diagnostics has been hampered by the need for standardised downstream bioinformatics analysis. Here we developed a comprehensive and computationally low-resource bioinformatics pipeline (BacPipe) enabling direct analyses of bacterial whole-genome sequences (raw reads or contigs) obtained from second- or third-generation sequencing technologies.
Methods: Open-access tools for quality verification, de novo assembly (SPAdes), annotation (Prokka), bacterial typing (MLST, emm typing), and identification of resistance genes (Resfams), plasmids, virulence genes, single nucleotide polymorphisms (SNPs) and core genome phylogeny were integrated into a single Python script. A graphical user interface (GUI) was developed to allow real-time monitoring of analysis progress. The scalability and speed of BacPipe in handling large datasets were further demonstrated using 4139 Illumina paired-end sequence files of publicly available bacterial genomes (2.9–5.4 Mb) from the European Nucleotide Archive (ENA).
Results: Computational time on BacPipe, demonstrated on a personal computer with 8 GB RAM, was 21, 25, 28 and 30 minutes for 50-, 70-, 100- and 120-fold sequencing coverage of a 5.1 Mb bacterial genome, respectively. Compiled results for every individual genome/strain are saved as an Excel file. Up to 56% reduction in analysis time was achieved by a unique parallelization of post-assembly and post-annotation tools in BacPipe compared to running these tools in succession. On the 4139 Illumina paired-end sequence files, the average running time was 50 minutes per strain. BacPipe is integrated in EBI-SELECTA, a project-specific portal (H2020 COMPARE), and is also available as an independent Docker image that can be used on both Windows- and Unix-based systems.
Conclusion: BacPipe offers a fully automated ‘one-stop’ bacterial WGS analysis pipeline with a user-friendly GUI, which can help overcome the major hurdle of WGS data analysis in hospital, public-health and infection-control settings.
KW - Biological Sciences Research Methodologies
KW - Microbiology
KW - Sequence Analysis
KW - BacPipe
UR - http://ecm.sckcen.be/OTCS/llisapi.dll/open/36629876
U2 - 10.1016/j.isci.2019.100769
DO - 10.1016/j.isci.2019.100769
M3 - Article
SN - 2589-0042
VL - 23
SP - 1
EP - 28
JO - iScience
JF - iScience
IS - 100769
M1 - 100767
ER -