Automatic Speech Recognition and Analysis of Children's Speech

TNT members involved in this project:
Christopher Gebauer, M.Sc.
Prof. Dr.-Ing. Jörn Ostermann
Lars Rumberg, M.Sc.

Analyzing natural speech and language samples of children is a well-established source of insight for research on speech and language acquisition. However, collecting, manually transcribing, and analyzing these data is extremely time-consuming and costly. As a result, the data available for speech and language development research is scarce.

Meanwhile, speech recognition and processing technology has matured to the point where its use for research in linguistics and speech-language pathology seems feasible. For adult speech, the technology has reached mainstream applications. Processing children's utterances, however, is much more challenging because of their acoustic and linguistic properties.

The project is an interdisciplinary collaboration with the Department for Speech and Language Therapy of the Institute of Special Education (IFS). By combining the IFS's domain knowledge about children's speech with our expertise in machine learning and signal processing, we aim to improve automatic speech recognition of children's speech to the point where it can be used in speech and language therapy applications.

The project is part of the interdisciplinary collaboration "Leibniz Lab for Relational Communication Research" (Project Website).

The kidsTALC corpus is a corpus of German children’s spontaneous speech. It is designed for training ASR systems, with the goal of facilitating research on speech development and assisting therapeutic applications. kidsTALC is the first German speech corpus that meets modern requirements for developing automatic tools to support language sample analysis in research and clinical applications.

The repository consists of multiple datasets, all containing connected speech, that represent different recording settings, language statuses, and ages. In its final version, the repository will contain recordings from about 300 children, ranging in age from kindergarten to elementary school. The elicitation contexts will cover various settings along the unstructured-structured continuum, such as free play, storytelling, conversational discourse, and read texts, with a focus on spontaneous language. The corpus will also include children with a range of oral and written language abilities, including typically developing children and children with developmental language disorder or speech sound disorder. Participants for the entire repository are being recruited from a network of collaborating preschools, kindergartens, and elementary schools. Eligibility criteria for the currently completed part of the dataset are: 3 ½–11 years of age, monolingual German speakers, typically developing.

For more information on the corpus please read our publication (pdf, BibTeX).

Access

To get access to the corpus, please send the signed end-user agreement to kidstalc@tnt.uni-hannover.de.

You will then receive a username and password to download the corpus here:

Version 1, October 2022: kidsTALC-v1

Recording Status

Date of Completion | Target Number of Speakers | Recorded Speakers | Type | Age (years;months)
2022 | 90 | 49 | Spontaneous: Typically Developed | 3;6-10;11
2023 | 40 | 0 | Spontaneous: Typically Developed | 3;0-7;0
2024 | 60 | 0 | Spontaneous: Developmental Language Disorders and Speech Sound Disorders | 3;0-7;0
2024 | 100 | 0 | Read: Typically Developed and Reading Difficulties | 8;0-10;0
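
The ages in the table appear to follow the years;months notation common in language acquisition research, so "3;6" would mean 3 years and 6 months. Under that assumption, the following minimal Python sketch shows how such ages and age ranges could be converted to months and compared. The function names are hypothetical and are not part of any kidsTALC tooling.

```python
def age_to_months(age: str) -> int:
    """Convert an age written as 'years;months' (e.g. '3;6') to months.

    Assumes the years;months convention; this is an interpretation of
    the table, not a format specified by the corpus itself.
    """
    years, months = age.split(";")
    return int(years) * 12 + int(months)


def parse_age_range(age_range: str) -> tuple[int, int]:
    """Split a range such as '3;6-10;11' into (min_months, max_months)."""
    low, high = age_range.split("-")
    return age_to_months(low), age_to_months(high)


def in_age_range(age: str, age_range: str) -> bool:
    """Check whether an age falls inside a range, bounds included."""
    low, high = parse_age_range(age_range)
    return low <= age_to_months(age) <= high


if __name__ == "__main__":
    print(age_to_months("3;6"))
    print(parse_age_range("3;6-10;11"))
    print(in_age_range("7;2", "3;0-7;0"))
```

Working in months avoids comparing the ages as strings, which would sort "10;11" before "3;6".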

 

Publications
  • Lars Rumberg, Christopher Gebauer, Hanna Ehlert, Maren Wallbaum, Ulrike Lüdtke, Jörn Ostermann
    Uncertainty Estimation for Connectionist Temporal Classification Based Automatic Speech Recognition
    Proc. INTERSPEECH 2023, pp. 4583–4587, August 2023
  • Christopher Gebauer, Lars Rumberg, Jörn Ostermann
    Pronunciation Modeling for Children’s Speech
    Elektronische Sprachsignalverarbeitung (ESSV), TUDpress, Dresden, pp. 79–86, March 2023
  • Hanna Ehlert, Edith Beaulac, Maren Wallbaum, Christopher Gebauer, Lars Rumberg, Jörn Ostermann, Ulrike Lüdtke
    Collecting and Annotating Natural Child Speech Data – Challenges and Interdisciplinary Perspectives
    Elektronische Sprachsignalverarbeitung (ESSV), TUDpress, Dresden, pp. 72–78, March 2023