PRIME: a database for 16S rRNA microbiome data with phenotypic reference and comprehensive metadata


Nucleic Acids Res. 2025 Oct 31:gkaf1057. doi: 10.1093/nar/gkaf1057. Online ahead of print.

ABSTRACT

PRIME (Phenotypic Reference for Integrated Microbiome Enrichment) is a curated and standardized database of human microbiome 16S rRNA amplicon sequencing data, designed to facilitate cross-study analysis, reproducibility, and phenotype-driven discovery. PRIME aggregates 53 449 samples from 111 public studies, covering 93 body sites and 101 phenotypic categories, with detailed harmonization of sample-level metadata such as disease status, demographics, body sites, sequencing protocols, and experimental design. Each sample includes taxonomic abundance profiles generated via a consistent pipeline using both SILVA (138.2) and Greengenes2 (2024.09) reference databases, with results reported at multiple taxonomic levels as observed abundances (read counts) and relative abundances (proportions). A major strength of PRIME is its extensive manual curation, which standardizes phenotypic and contextual metadata across studies, enabling precise querying and robust phenotype-based comparisons. Users can interactively explore the database through a modern web interface, filter and visualize data by metadata fields, and download customized subsets. Programmatic access is supported via RESTful APIs and an R package. PRIME aims to advance microbiome data integration and is continuously updated to incorporate new studies and features. The database is freely available at https://primedb.sjtu.edu.cn.
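
The relationship between the two reported abundance types can be illustrated with a minimal sketch. It assumes that relative abundance is each taxon's read count divided by the sample's total reads, which is the usual definition of a proportion. The function name `relative_abundance`, the taxon names, and the counts are hypothetical and are not taken from PRIME or its pipeline.

```python
def relative_abundance(counts):
    """Convert per-taxon read counts for one sample into proportions.

    Hypothetical helper for illustration; not part of PRIME's API or R package.
    """
    total = sum(counts.values())
    if total == 0:
        # An empty sample has no defined proportions; return zeros by convention.
        return {taxon: 0.0 for taxon in counts}
    return {taxon: n / total for taxon, n in counts.items()}


# Made-up genus-level read counts for a single sample (illustrative only).
sample_counts = {"Bacteroides": 600, "Prevotella": 300, "Faecalibacterium": 100}
proportions = relative_abundance(sample_counts)
```

Proportions computed this way sum to 1 within a sample, which makes samples with different sequencing depths comparable, while the raw read counts remain necessary for methods that model counts directly.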

PMID:41171140 | DOI:10.1093/nar/gkaf1057