Automatic genetic phenotype normalization from dysmorphology physical examinations: an overview of the BioCreative VIII-Task 3 competition

Database (Oxford) 2025-11-25

Database (Oxford). 2025 Jan 18;2025:baaf051. doi: 10.1093/database/baaf051.

ABSTRACT

We present an overview of the BioCreative VIII Task 3 competition, which called for the development of state-of-the-art approaches to automatic normalization of observations noted by physicians in dysmorphology physical examinations to the Human Phenotype Ontology (HPO). For the task, we made available 3136 deidentified and manually annotated observations extracted from electronic health records of 1652 paediatric patients at the Children's Hospital of Philadelphia. The task is challenging because the mentions of observations corresponding to HPO terms are discontinuous, overlapping, and descriptive, which severely limits the performance of straightforward strict-matching approaches. Ultimately, an effective automated solution to the task will facilitate computational analysis that could uncover novel correlations and patterns of observations in patients with rare genetic diseases, enhance our understanding of known genetic conditions, and even identify previously unrecognized conditions. A total of 20 teams registered, and 5 teams submitted their predictions. We summarize the corpus, the competing systems' approaches, and their results. The top system used a pre-trained large language model and achieved a 0.82 F1 score, which is close to human performance, confirming the impact that recent advances in natural language processing can have on tasks such as this. The post-evaluation period of the challenge, at https://codalab.lisn.upsaclay.fr/competitions/11351, will be open for submissions for at least 18 months past the end of the competition. Database URL: https://codalab.lisn.upsaclay.fr/competitions/11351.
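For readers unfamiliar with the F1 score reported above, the following is a minimal sketch of how precision, recall, and F1 can be computed for a normalization task. It assumes that gold annotations and system predictions are each represented as a set of (observation ID, HPO ID) pairs and that scoring is micro-averaged over exact pair matches. The abstract does not specify the official scoring scheme, so this representation, the function name `f1_score`, and the observation and HPO identifiers below are illustrative placeholders only.

```python
from typing import Set, Tuple

# Assumed representation: one (observation_id, hpo_id) pair per normalized mention.
Pair = Tuple[str, str]


def f1_score(gold: Set[Pair], predicted: Set[Pair]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over exact pair matches."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Placeholder data: identifiers are illustrative, not taken from the task corpus.
gold = {
    ("obs1", "HP:0000316"),
    ("obs2", "HP:0000272"),
    ("obs3", "HP:0001250"),
}
predicted = {
    ("obs1", "HP:0000316"),  # correct
    ("obs2", "HP:0000272"),  # correct
    ("obs3", "HP:0000001"),  # wrong term for obs3
    ("obs4", "HP:0000002"),  # spurious prediction
}

precision, recall, f1 = f1_score(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

In this toy example two of four predictions are correct and two of three gold pairs are recovered, so precision is 0.50, recall is about 0.67, and F1 is about 0.57.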

PMID:40996704 | PMC:PMC12462374 | DOI:10.1093/database/baaf051