An Audio-Ultrasound Synchronized Database of Tongue Movement for Mandarin speech

database[Title] 2025-04-20

Sci Data. 2025 Apr 11;12(1):607. doi: 10.1038/s41597-025-04917-w.

ABSTRACT

Ultrasound imaging has been widely adopted in speech research to visualize dynamic tongue movements during speech production. These images are universally used as visual feedback in interventions for articulation disorders or visual cues in speech recognition. Nevertheless, the availability of high-quality audio-ultrasound datasets remains scarce. The present study, therefore, aims to construct a multimodal database designed for Mandarin speech. The dataset integrates synchronized ultrasound images of lingual movement, and the corresponding audio recordings and text annotations elicited from 43 healthy speakers and 11 patients with dysarthria through speech tasks (including vowels, monosyllables, and sentences), with a total duration of 22.31 hours. In addition, a customized helmet structure was employed to stabilize the ultrasound probe, precisely controlling for head movement and minimizing displacement interference, The proposed database carries apparent values in automatic speech recognition, silent interface development, and research in speech pathology and linguistics.

PMID:40216747 | PMC:PMC11992241 | DOI:10.1038/s41597-025-04917-w