Ultrax Speech Sound Disorders
A dataset of ultrasound and audio recordings from children with speech sound disorders
The UXSSD dataset contains 8 speakers (2 female and 6 male), aged 5-10 years.
The table below gives further details for each speaker. Ages were recorded at the first Assessment session and are given in years (AGE-Y) and months (AGE-M).
|BL||Baseline session before therapy (1-2 sessions)|
|Mid||Mid-point session, halfway through therapy|
|Post||Post-therapy session, immediately after therapy ended|
|Maint||Maintenance session, some time after therapy ended|
Core data types
|ult||raw ultrasound data|
|txt||prompt text with date/time of utterance recording|
|slt_labels||manual annotation from SLT, when available. See  for details|
|speaker_labels||speaker diarization identifying therapist (SLT) and child (CHILD) speech|
|word_labels||automatic word-level alignment|
|phone_labels||automatic phone-level alignment|
|reference_labels||manually-revised labels (see below for details)|
Labels are available in Praat's TextGrid format and HTK's lab format.
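As an illustration of reading the lab variant, the sketch below parses label text into timed segments. It assumes the standard HTK convention of one "start end label" entry per line, with integer times in units of 100 ns. Whether every label file in this dataset follows that layout exactly is an assumption, and the names Segment and parse_htk_lab are hypothetical, not part of any dataset tooling.

```python
from typing import List, NamedTuple

# Assumption: times follow the standard HTK convention (units of 100 ns),
# so 10,000,000 units correspond to one second.
HTK_UNITS_PER_SECOND = 10_000_000


class Segment(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    label: str


def parse_htk_lab(text: str) -> List[Segment]:
    """Parse HTK-style label text into a list of segments.

    Lines with fewer than three whitespace-separated fields are skipped.
    """
    segments = []
    for line in text.splitlines():
        fields = line.split(maxsplit=2)
        if len(fields) < 3:
            continue  # blank or malformed line
        start, end, label = fields
        segments.append(
            Segment(
                int(start) / HTK_UNITS_PER_SECOND,
                int(end) / HTK_UNITS_PER_SECOND,
                label.strip(),
            )
        )
    return segments


# Made-up example content, not taken from the dataset.
example = "0 2500000 SLT\n2500000 7000000 CHILD\n"
for segment in parse_htk_lab(example):
    print(segment.label, segment.start, segment.end)
```

The TextGrid variant carries the same information and is more convenient when the labels need to be inspected in Praat.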
Speaker, word, and phone labels were generated according to the methods described in Ribeiro et al. (2019).
Individual recordings are indexed for each session according to their recording times. See the prompt text file for recording date/time.
Each file ID also includes a prompt type identifier. See Data for details.
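A file ID can be split into its recording index and prompt type identifier along the following lines. This is only a sketch: it assumes IDs consist of a run of digits followed by one or more letters (for example, a hypothetical ID such as 012A), which is not stated on this page. The pattern and the name split_file_id are illustrative, so check them against the Data page before relying on them.

```python
import re
from typing import Tuple

# Assumed ID shape: digits (recording index) followed by letters
# (prompt type identifier). This shape is an assumption.
FILE_ID_PATTERN = re.compile(r"^(\d+)([A-Za-z]+)$")


def split_file_id(file_id: str) -> Tuple[int, str]:
    """Return (recording index, prompt type identifier) for a file ID."""
    match = FILE_ID_PATTERN.match(file_id)
    if match is None:
        raise ValueError(f"Unexpected file ID format: {file_id!r}")
    return int(match.group(1)), match.group(2)


# Sorting by the numeric index restores recording order within a session.
ids = ["010B", "002A", "001A"]  # hypothetical IDs
ordered = sorted(ids, key=lambda file_id: split_file_id(file_id)[0])
print(ordered)
```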
Reference labels are given for a few utterances of the UXTD and UXSSD datasets. These have been manually revised at the speaker level (60 utterances) and at the word level (199 utterances). The revision was done by a single annotator.
Note that phone labels are also provided, but these are not fully manually revised. This set of annotations is force-aligned at the phone level, constrained by the manually revised word boundaries.
Labels are available in Praat's TextGrid format (TG) and HTK's lab format (lab). The structure for the directory is as follows:
/uxssd
    /phone_labels
        /lab
        /TG
    /word_labels
        /lab
        /TG
    /speaker_labels
        /lab
        /TG
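The layout above can be navigated with a small helper such as the one below. It is a minimal sketch: the directory names come from the structure shown, while the root location, the function names label_dir and list_label_files, and the choice to return every file in the directory (file extensions are not specified here) are assumptions.

```python
from pathlib import Path
from typing import List

# Directory names taken from the structure listed above.
LABEL_TYPES = ("phone_labels", "word_labels", "speaker_labels")
FORMATS = ("lab", "TG")


def label_dir(root: Path, label_type: str, fmt: str) -> Path:
    """Build the path to one label directory, e.g. root/word_labels/TG."""
    if label_type not in LABEL_TYPES:
        raise ValueError(f"Unknown label type: {label_type!r}")
    if fmt not in FORMATS:
        raise ValueError(f"Unknown label format: {fmt!r}")
    return root / label_type / fmt


def list_label_files(root: Path, label_type: str, fmt: str) -> List[Path]:
    """Return the sorted files in a label directory, or [] if it is missing."""
    directory = label_dir(root, label_type, fmt)
    if not directory.is_dir():
        return []
    return sorted(path for path in directory.iterdir() if path.is_file())


# Hypothetical root: adjust to wherever the reference labels were unpacked.
root = Path("uxssd")
print(label_dir(root, "word_labels", "TG"))
```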
Speaker 05M received two rounds of therapy, each with corresponding Assessment sessions. The sessions from the second round are identified as *_round2 in the speaker directory. Therapy sessions for this speaker are indexed chronologically.
Eshky, A., Ribeiro, M. S., Cleland, J., Richmond, K., Roxburgh, Z., Scobbie, J., & Wrench, A. (2018). Ultrasuite: A repository of ultrasound and acoustic data from child speech therapy sessions. Proceedings of INTERSPEECH. Hyderabad, India.
Cleland, J., Scobbie, J. M., & Wrench, A. A. (2015). Using ultrasound visual biofeedback to treat persistent primary speech sound disorders. Clinical Linguistics & Phonetics, 29(8-10), 575-597.
Ribeiro, M. S., Eshky, A., Richmond, K., & Renals, S. (2019). Ultrasound tongue imaging for diarization and alignment of child speech therapy sessions. Proceedings of INTERSPEECH. Graz, Austria.