Ultrax Speech Sound Disorders

A dataset of ultrasound and audio recordings from children with speech sound disorders

Speakers

The UXSSD dataset contains 8 speakers (2 female and 6 male), aged 5-10 years.

The table below gives further details for each speaker. Ages were recorded at the first Assessment session and are given in years (AGE-Y) and months (AGE-M); AGE is the same age expressed in decimal years.

SPEAKER-ID	GENDER	AGE-Y	AGE-M	AGE
01M	M	6	0	6.0
02M	M	10	1	10.08
03F	F	8	7	8.58
04M	M	8	11	8.92
05M	M	6	5	6.42
06M	M	5	11	5.92
07F	F	7	6	7.5
08M	M	7	7	7.58
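
The AGE column is consistent with AGE-Y plus AGE-M divided by 12, rounded to two decimal places. The following is a minimal sketch of that conversion, assuming this is how the column was derived; the function name decimal_age is illustrative and not part of the dataset.

```python
def decimal_age(years: int, months: int) -> float:
    """Return an age in decimal years, rounded to two places."""
    return round(years + months / 12, 2)


# (SPEAKER-ID, AGE-Y, AGE-M) rows copied from the speaker table.
rows = [
    ("01M", 6, 0),
    ("02M", 10, 1),
    ("03F", 8, 7),
    ("04M", 8, 11),
    ("05M", 6, 5),
    ("06M", 5, 11),
    ("07F", 7, 6),
    ("08M", 7, 7),
]

# Map each speaker ID to its age in decimal years.
ages = {speaker: decimal_age(y, m) for speaker, y, m in rows}

for speaker, age in ages.items():
    print(speaker, age)
```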

Sessions

Session	Description
BL	Baseline session before therapy (1-2 sessions)
Mid	Mid-point session, halfway through therapy
Post	Post-therapy session, immediately after therapy ended
Maint	Maintenance session, some time after therapy ended
Therapy	Therapy sessions

Data Types

Core data types

Data type	Description
wav	speech waveform
ult	raw ultrasound data
param	ultrasound parameters
txt	prompt text with date/time of utterance recording
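
As a rough illustration, the four core files of one utterance can be collected with a small helper. This sketch assumes that the files of an utterance share a base name and use the data type as the file extension; the helper names and the example file ID are hypothetical.

```python
from pathlib import Path
from typing import Dict, List

# Core data types listed in the table above (assumed to be file extensions).
CORE_TYPES = ("wav", "ult", "param", "txt")


def core_files(utterance_base: str) -> Dict[str, Path]:
    """Map each core data type to the expected path for one utterance."""
    base = Path(utterance_base)
    return {t: base.with_name(f"{base.name}.{t}") for t in CORE_TYPES}


def missing_core_files(utterance_base: str) -> List[str]:
    """List the core data types whose file does not exist on disk."""
    return [t for t, p in core_files(utterance_base).items() if not p.exists()]


# Hypothetical speaker/session/file ID, used only for illustration.
paths = core_files("01M/BL/001A")
for data_type, path in paths.items():
    print(data_type, path)
```

Calling missing_core_files on each utterance base is one way to check a download for incomplete recordings before processing.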

Additional data

Data type	Description
slt_labels	manual annotations from the speech and language therapist (SLT), when available; see [2] for details
speaker_labels	speaker diarization identifying therapist (SLT) and child (CHILD) speech
word_labels	automatic word-level alignment
phone_labels	automatic phone-level alignment
reference_labels	manually-revised labels (see below for details)

Labels are available in Praat's TextGrid format and HTK's lab format.
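
The lab files can be read with a few lines of code. The sketch below assumes the usual HTK label convention, in which each line holds a start time, an end time, and a label, with times given as integers in units of 100 ns. The example text is made up for illustration and is not taken from the dataset.

```python
from typing import List, NamedTuple

# HTK label times are integers in units of 100 ns.
HTK_UNITS_PER_SECOND = 10_000_000


class Segment(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    label: str


def parse_htk_lab(text: str) -> List[Segment]:
    """Parse 'start end label' lines into segments with times in seconds."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        start, end, label = int(fields[0]), int(fields[1]), fields[2]
        segments.append(
            Segment(start / HTK_UNITS_PER_SECOND, end / HTK_UNITS_PER_SECOND, label)
        )
    return segments


# Made-up example in the assumed format (not real dataset content).
example = "0 2500000 SLT\n2500000 7000000 CHILD\n"
segments = parse_htk_lab(example)
for seg in segments:
    print(f"{seg.start:.2f} {seg.end:.2f} {seg.label}")
```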

Speaker, word, and phone labels were generated according to the methods described in [3].

File IDs

Individual recordings are indexed for each session according to their recording times. See the prompt text file for recording date/time.

Each file ID also includes a prompt type identifier. See Data for details.

Reference Labels

Reference labels are given for a small subset of utterances from the UXTD and UXSSD datasets. These have been manually revised at the speaker level (60 utterances) and at the word level (199 utterances) by a single annotator.

Note that phone labels are also provided, but these are not fully manually revised. They were force-aligned at the phone level, with the alignment constrained by the manually revised word boundaries.

Labels are available in Praat's TextGrid format (TG) and HTK's lab format (lab). The directory structure is as follows:

/uxssd
    /phone_labels
        /lab
        /TG
    /word_labels
        /lab
        /TG
    /speaker_labels
        /lab
        /TG
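
The layout above can be navigated with a small helper such as the following sketch. The root argument (the location where the reference labels were unpacked) and the function names are assumptions; only the directory names are taken from the structure shown.

```python
from pathlib import Path
from typing import List

LABEL_TYPES = ("phone_labels", "word_labels", "speaker_labels")
LABEL_FORMATS = ("lab", "TG")


def label_dir(root: str, label_type: str, label_format: str) -> Path:
    """Build the directory path for one label type and format under uxssd."""
    if label_type not in LABEL_TYPES:
        raise ValueError(f"unknown label type: {label_type}")
    if label_format not in LABEL_FORMATS:
        raise ValueError(f"unknown label format: {label_format}")
    return Path(root) / "uxssd" / label_type / label_format


def list_label_files(root: str, label_type: str, label_format: str) -> List[Path]:
    """Return the files in one label directory, sorted by name."""
    directory = label_dir(root, label_type, label_format)
    return sorted(p for p in directory.iterdir() if p.is_file())


# Hypothetical root directory, used only for illustration.
print(label_dir("reference_labels", "word_labels", "TG"))
```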

Additional Notes

Speaker 05M underwent two rounds of therapy, each with its own Assessment sessions. The second-round sessions are identified as *_round2 in the speaker directory. Therapy sessions for this speaker are indexed chronologically.

References

[1] Eshky, A., Ribeiro, M. S., Cleland, J., Richmond, K., Roxburgh, Z., Scobbie, J., & Wrench, A. (2018). Ultrasuite: A repository of ultrasound and acoustic data from child speech therapy sessions. Proceedings of INTERSPEECH. Hyderabad, India.

[2] Cleland, J., Scobbie, J. M., & Wrench, A. A. (2015). Using ultrasound visual biofeedback to treat persistent primary speech sound disorders. Clinical Linguistics & Phonetics, 29(8-10), 575-597.

[3] Ribeiro, M. S., Eshky, A., Richmond, K., & Renals, S. (2019). Ultrasound tongue imaging for diarization and alignment of child speech therapy sessions. Proceedings of INTERSPEECH. Graz, Austria.