How clear is our current view on microbial dark matter? (Re-)assessing public MAG & SAG datasets with MDMcleaner

Nucleic Acids Res. 2022 Jul 22;50(13):e76. doi: 10.1093/nar/gkac294.

ABSTRACT

As of today, the majority of environmental microorganisms remain uncultured and are therefore referred to as 'microbial dark matter' (MDM). Hence, genomic insights into these organisms are limited to cultivation-independent approaches such as single-cell genomics and metagenomics. However, without access to cultured representatives for verifying taxon assignments, MDM genomes may lead to misleading conclusions based on misclassified or contaminant contigs, thereby obscuring our view of the uncultured microbial majority. Moreover, gradual contamination of databases by past genome submissions can propagate errors that affect present as well as future comparative genome analyses. Consequently, strict contamination detection and filtering need to be applied, especially in the case of uncultured MDM genomes. Current genome reporting standards, however, emphasize completeness over purity, and the de facto gold standard genome assessment tool, CheckM, discriminates against uncultured taxa and fragmented genomes. To tackle these issues, we present a novel contig classification, screening, and filtering workflow and a corresponding open-source Python implementation called MDMcleaner, which was tested and compared to other tools on mock and real datasets. MDMcleaner revealed substantial contamination overlooked by current screening approaches and sensitively detected misattributed contigs in both novel genomes and the underlying reference databases, thereby greatly improving our view of 'microbial dark matter'.

PMID:35536293 | PMC:PMC9303271 | DOI:10.1093/nar/gkac294