Automatic Speech Recognition and Analysis of Children's Speech

TNT members involved in this project:
Christopher Gebauer, M.Sc.
Prof. Dr.-Ing. Jörn Ostermann
Lars Rumberg, M.Sc.

Analyzing natural speech and language samples of children is a well-known source of insights in research on speech and language acquisition. However, collecting, manually transcribing, and analyzing these data is extremely time-consuming and costly. As a result, the data basis for research on speech and language development is scarce.

Meanwhile, speech recognition and processing technology has developed to a point where its use for research in linguistics and speech-language pathology seems feasible. For adult speech, the technology has already reached mainstream applications. Processing children's utterances, however, is much more challenging because of their acoustic and linguistic properties.

The project is an interdisciplinary collaboration with the Department for Speech and Language Therapy of the Institute of Special Education (IFS). By combining the IFS's domain knowledge of children's speech with our expertise in machine learning and signal processing, we aim to improve automatic speech recognition of children's speech to the point where it can be used in speech and language therapy applications.

The project is part of the interdisciplinary collaboration "Leibniz Lab for Relational Communication Research" (Project Website).

The kidsTALC corpus is a speech corpus of German children's spontaneous speech. It is designed for training ASR systems, with the goal of facilitating research on speech development and assisting therapeutic applications. kidsTALC is the first German speech corpus designed to meet modern standards and the requirements for developing automatic tools that support language sample analysis in research and clinical applications.

The repository consists of multiple datasets (all containing connected speech) that represent different recording settings, language statuses, and ages. In its final version, the repository will contain recordings of about 300 children, with ages spanning kindergarten to elementary school. The elicitation contexts will cover various settings along the unstructured-to-structured continuum, such as free play, storytelling, conversational discourse, and read texts, with a focus on spontaneous language. Children with various oral and written language abilities will also be included in the corpus, such as typically developing children and children with developmental language disorder or speech sound disorder. Participants for the entire repository are being recruited from a network of collaborating preschools, kindergartens, and elementary schools. Eligibility criteria for the currently completed part of the dataset are: age 3 ½–11 years, monolingual German speakers, and typical development.

For more information on the corpus please read our publication (pdf, BibTeX).

Access

To get access to the corpus, please contact us: kidstalc@tnt.uni-hannover.de

Recording Status

Date of Completion | Target Number of Speakers | Recorded Speakers | Type | Age (years;months)
2022 | 90 | 49 | Spontaneous: Typically Developing | 3;6-10;11
2023 | 40 | 0 | Spontaneous: Typically Developing | 3;0-7;0
2024 | 60 | 0 | Spontaneous: Developmental Language Disorders and Speech Sound Disorders | 3;0-7;0
2024 | 100 | 0 | Read: Typically Developing and Reading Difficulties | 8;0-10;0
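The age ranges in the table use the years;months notation common in child language research, so "3;6-10;11" covers children from 3 years 6 months up to 10 years 11 months, corresponding to the 3 ½–11 year eligibility criterion of the completed dataset. The sketch below converts this notation to total months and checks whether an age falls inside a range. The helper names `age_to_months` and `in_age_range` are hypothetical illustrations and are not part of any kidsTALC tooling.

```python
import re


def age_to_months(age: str) -> int:
    """Convert an age in years;months notation (e.g. "3;6") to total months."""
    match = re.fullmatch(r"(\d+);(\d{1,2})", age.strip())
    if match is None:
        raise ValueError(f"not in years;months notation: {age!r}")
    years, months = int(match.group(1)), int(match.group(2))
    # The month component counts completed months within the current year.
    if months > 11:
        raise ValueError(f"month component must be 0-11: {age!r}")
    return 12 * years + months


def in_age_range(age: str, age_range: str) -> bool:
    """Check whether an age lies within an inclusive range such as "3;6-10;11"."""
    low, high = age_range.split("-")
    return age_to_months(low) <= age_to_months(age) <= age_to_months(high)


print(age_to_months("3;6"))              # 42
print(age_to_months("10;11"))            # 131
print(in_age_range("4;2", "3;6-10;11"))  # True
print(in_age_range("3;0", "3;6-10;11"))  # False
```

Working in whole months keeps comparisons exact and avoids the rounding that fractional years such as 3.5 would introduce.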

 

  • Conference Contributions
    • Lars Rumberg, Christopher Gebauer, Hanna Ehlert, Maren Wallbaum, Lena Bornholt, Jörn Ostermann, Ulrike Lüdtke
      kidsTALC: A Corpus of 3- to 11-year-old German Children’s Connected Natural Speech
      Proceedings INTERSPEECH 2022 – 23rd Annual Conference of the International Speech Communication Association, ISCA, September 2022
    • Lars Rumberg, Christopher Gebauer, Hanna Ehlert, Ulrike Lüdtke, Jörn Ostermann
      Improving Phonetic Transcriptions of Children’s Speech by Pronunciation Modelling with Constrained CTC-Decoding
      Proceedings INTERSPEECH 2022 – 23rd Annual Conference of the International Speech Communication Association, ISCA, September 2022
    • Lars Rumberg, Hanna Ehlert, Ulrike Lüdtke, Jörn Ostermann
      Age-Invariant Training for End-to-End Child Speech Recognition using Adversarial Multi-Task Learning
      Proceedings INTERSPEECH 2021 – 22nd Annual Conference of the International Speech Communication Association, August 2021