Short-read metagenomic assembly: finding the best fit in a world of options

  • Catarina Inês Mendes (Creator)
  • Pedro Vila-Cerqueira (Creator)
  • Yair Motro (Creator)
  • Jacob Moran-Gilad (Creator)
  • João André Carriço (Creator)
  • Mario Ramirez (Creator)

Dataset

Description

Background Short-read shotgun metagenomics can offer comprehensive microbial detection and characterisation of complex clinical samples. De novo assembly is a key step in analysing metagenomic data because it recovers draft genomes from a pool of mixed raw reads, yielding longer sequences that provide contextual information and a more complete picture of the microbial community than species composition alone. It is also a major bottleneck in obtaining trustworthy, reproducible results.

 

Materials/Methods We developed LMAS, an automated workflow for benchmarking traditional and metagenomic-dedicated prokaryotic de novo assembly software using defined mock communities. LMAS is implemented in Nextflow and uses Docker containers to provide flexibility. The results are presented in an interactive HTML report in which selected global and reference-specific performance metrics can be explored. Users can provide their own mock communities to better reflect the samples of interest. New assemblers can be added with minimal changes to the pipeline, so LMAS can be expanded as novel algorithms are developed.

 

Results The eight bacterial genomes and four plasmids of the ZymoBIOMICS Microbial Community Standards were used as references. Raw sequence data from the mock communities, with even and logarithmic distributions of species, together with a simulated sample of evenly distributed reads generated from the genomes in the ZymoBIOMICS standard, were used as input for 11 de novo assemblers (Figure 1). The resulting LMAS report is available at https://lmas-demo.herokuapp.com.

 

Conclusions Our results showed significant differences in breadth of coverage and in the number and accuracy of the contigs generated by each de novo assembler. The performance of each assembler varied with the species of interest and its abundance in the sample, and less abundant species presented a significant challenge for all assemblers. No sizable gains were obtained by using dedicated metagenomic assemblers, and no assembler stood out as an undisputed all-purpose choice for short-read metagenomic prokaryote genome assembly; instead, different assemblers showed specific strengths. Efforts are needed to further improve metagenomic assembly performance, and LMAS could underpin this development process. The LMAS workflow and documentation are available at https://github.com/cimendes/LMAS.
Date made available: 2022
Publisher: ZENODO
