Increasing metadata coverage of SRA BioSample entries using deep learning-based named entity recognition
Database (Oxford) 2022-01-31
Summary:
High-quality metadata annotations for data hosted in large public repositories are essential for research reproducibility and for conducting fast, powerful and scalable meta-analyses. Currently, a majority of sequencing samples in the National Center for Biotechnology Information's Sequence Read Archive (SRA) are missing metadata across several categories. In an effort to improve the metadata coverage of these samples, we leveraged almost 44 million attribute-value pairs from SRA BioSample to...