LMAS: Last (meta)genomic assembler standing

  • Catarina Inês Mendes (Creator)
  • Vila-Cerqueira (Creator)
  • Yair Motro (Creator)
  • Jacob Moran-Gilad (Creator)
  • João André Carriço (Creator)
  • Mario Ramirez (Creator)

Dataset

Description

The de novo assembly of raw sequence data is a key process when analysing data from shotgun metagenomic sequencing. It allows recovering draft genomes from a pool of mixed raw reads, yielding longer sequences that offer contextual information and afford a more complete picture of the microbial community. It also represents one of the greatest bottlenecks when obtaining trustworthy, reproducible results.

 

LMAS is an automated workflow for benchmarking traditional and metagenomic prokaryotic de novo assembly software using defined mock communities. In its current form, LMAS implements 11 assemblers and includes several steps to ensure the transparency and reproducibility of the results. Each assembler runs in its own Docker container, which allows versions to be tracked, and Nextflow, a workflow manager, allows LMAS to be deployed effortlessly on any UNIX-based system with a container engine installed, from local machines to high-performance computing clusters. The results are presented in an interactive HTML report where selected global and reference-specific performance metrics can be explored. The mock communities can be provided by the user to better reflect the samples of interest. New assemblers can be added with minimal changes to the pipeline, so that LMAS can be expanded as novel algorithms are developed.

 

As a proof of concept, the eight bacterial genomes and four plasmids of the ZymoBIOMICS Microbial Community Standards were used as references. Raw sequence data from the mock communities, with even and logarithmic distributions of species, and a simulated sample of evenly distributed reads generated from the reference genomes were used as input for LMAS. The resulting report is available at https://lmas-demo.herokuapp.com.

 

Our results show that the choice of a de novo assembler depends greatly on the computational resources available and the species of interest, with assembly performance for each species varying greatly with its abundance in the sample. Overall, multiple k-mer de Bruijn graph assemblers outperform the alternatives, but at a greater cost in computational resources. No significant performance gains were obtained when using dedicated metagenomic assemblers, and no single assembler emerged as an undisputed ideal choice for short-read metagenomic prokaryote genome assembly, with the different assemblers showing specific strengths. This highlights that further improvement in metagenomic assembly performance is needed, and LMAS could underpin this effort.

 

The LMAS workflow and documentation are available at https://github.com/cimendes/LMAS.

Date made available: 9 Feb 2022
Publisher: ZENODO