ImmuneData: an integrated data discovery system for immunology data repositories

Database (Oxford) 2022-09-26

Database (Oxford). 2022 Mar 9;2022:baac003. doi: 10.1093/database/baac003.

ABSTRACT

To meet the increasing demand for data sharing, data reuse and meta-analysis in the immunology research community, we have developed the data discovery system ImmuneData. The system provides integrated access to five immunology data repositories funded by the National Institute of Allergy and Infectious Diseases, Division of Allergy, Immunology and Transplantation: ImmPort, ImmuneSpace, ITN TrialShare, ImmGen and IEDB. ImmuneData restructures the repositories' metadata into a uniform schema using domain experts' knowledge and state-of-the-art Natural Language Processing (NLP) technologies. It comes with a user-friendly web interface, accessible at http://www.immunedata.org/, and a Google-like search engine that lets biological researchers find and access data easily. The vast number of synonyms used in biomedical research increases the likelihood of incomplete search results. Our search engine therefore converts user queries into ontology terms, which are then expanded by NLP technologies so that the search results include all synonyms for a particular concept. The system also includes an advanced search function for building customized queries to meet the needs of more advanced users. ImmuneData supports the FAIR principles (Findability, Accessibility, Interoperability and Reusability) for the five data repositories, benefiting data reuse in the immunology research community. The data pipeline used to construct our system can be extended to other data repositories to build a more comprehensive biological data discovery system.
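The synonym handling described in the abstract (mapping a query to an ontology term, then expanding it to the term's synonyms before searching) can be illustrated with a minimal Python sketch. It is not the ImmuneData implementation: the toy ontology, the term IDs, the example records and the function names `build_index`, `expand_query` and `search` are all assumptions made up for illustration, and a simple dictionary lookup stands in for the NLP-based expansion the abstract mentions.

```python
# Hypothetical toy ontology: term ID -> preferred label and synonyms.
# The IDs and entries are invented for this sketch.
ONTOLOGY = {
    "TERM:0001": {"label": "influenza", "synonyms": {"flu", "grippe"}},
    "TERM:0002": {"label": "t cell", "synonyms": {"t lymphocyte", "t-cell"}},
}

# Hypothetical metadata records, standing in for a uniform schema.
RECORDS = [
    {"id": "study-a", "description": "Antibody response to flu vaccination"},
    {"id": "study-b", "description": "Influenza challenge cohort"},
    {"id": "study-c", "description": "Peanut allergy epitopes"},
]


def build_index(ontology):
    """Map every label and synonym (lowercased) to its ontology term ID."""
    index = {}
    for term_id, entry in ontology.items():
        for name in {entry["label"], *entry["synonyms"]}:
            index[name.lower()] = term_id
    return index


def expand_query(query, ontology, index):
    """Return the set of strings to search for.

    If the query matches a known label or synonym, return the label plus
    all synonyms of that term; otherwise fall back to the query itself.
    """
    key = query.strip().lower()
    term_id = index.get(key)
    if term_id is None:
        return {key}
    entry = ontology[term_id]
    return {entry["label"], *entry["synonyms"]}


def search(query, records, ontology=ONTOLOGY):
    """Return records whose description contains any expanded term."""
    index = build_index(ontology)
    terms = expand_query(query, ontology, index)
    return [
        record
        for record in records
        if any(term in record["description"].lower() for term in terms)
    ]


if __name__ == "__main__":
    for hit in search("grippe", RECORDS):
        print(hit["id"], "-", hit["description"])
```

In this sketch a query for "grippe" retrieves records that mention only "flu" or "influenza", which is the incomplete-results problem the expansion step is meant to address. Substring matching is used purely for brevity; a real system would more likely match on tokens or indexed ontology annotations.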

DATABASE URL: http://www.immunedata.org/.

PMID:35262674 | PMC:PMC9216516 | DOI:10.1093/database/baac003