BacPipe: A rapid, user-friendly whole genome sequencing pipeline for clinical diagnostic bacteriology

Basil B. Xavier, Mohamed Ahmed, Mattia Bolzan, Bruno Ribeiro-Gonçalves, Blaise T.F. Alako, Peter Harrison, Christine Lammens, Samir Kumar-Singh, Herman Goossens, João André Carriço, Guy R. Cochrane, Surbhi Malhotra-Kumar

    Research output: peer-review

    Abstract

    Objectives: Despite rapid advances in whole genome sequencing (WGS) technologies, their integration into routine microbiological diagnostics has been hampered by the need for standardised downstream bioinformatics analysis. Here we developed a comprehensive and computationally low-resource bioinformatics pipeline (BacPipe) enabling direct analysis of bacterial whole-genome sequences (raw reads or contigs) obtained from second- or third-generation sequencing technologies.
    Methods: Open-access tools for quality verification, de novo assembly (SPAdes), annotation (Prokka), bacterial typing (MLST, emm typing), and identification of resistance genes (Resfams), plasmids, virulence genes, single nucleotide polymorphisms (SNPs) and core genome phylogeny were integrated into a single Python script. A graphical user interface (GUI) was developed to allow the progress of the analysis to be followed in real time. The scalability and speed of BacPipe in handling large datasets were further demonstrated using 4139 Illumina paired-end sequence files of publicly available bacterial genomes (2.9−5.4 Mb) from the European Nucleotide Archive (ENA).
    Results: Computational time on BacPipe, demonstrated on a personal computer with 8 GB of RAM, was 21, 25, 28 and 30 minutes for sequencing coverage of 50-, 70-, 100- and 120-fold of a 5.1 Mb bacterial genome, respectively. Compiled results for every individual genome/strain are saved as an Excel file. A reduction in analysis time of up to 56% was achieved by a unique parallelization of post-assembly and post-annotation tools in BacPipe, compared with running these tools in succession. On the 4139 Illumina paired-end sequence files, running time was on average 50 minutes/strain. BacPipe is integrated in EBI-SELECTA, a project-specific portal (H2020 COMPARE), and is also available as an independent Docker image that can be used across Windows- and Unix-based systems.
    Conclusion: BacPipe offers a fully automated ‘one-stop’ bacterial WGS analysis pipeline with a user-friendly GUI that can help overcome the major hurdle of WGS data analysis in hospitals and public health, and for infection-control monitoring.
    Original language: English
    Article number: 100767
    Pages (from-to): 1-28
    Number of pages: 28
    Journal: iScience
    Volume: 23
    Issue number: 100769
    State: Published - 9 Dec 2019
