Automatic Speech Recognition and Analysis of Children's Speech

Successful language acquisition is essential for social inclusion, academic success, and career advancement. Given workforce shortages in education and therapy and a lack of standardized diagnostics, AI-driven automation holds great promise for assessing language development needs. Analyzing children’s natural speech has traditionally offered valuable insights into language acquisition, yet manual collection, transcription, and analysis are prohibitively time-consuming and costly. While speech recognition technology is now robust enough for adult language research, child speech, with its distinct acoustic and linguistic traits, remains a significant challenge for existing systems.

The project is an interdisciplinary collaboration with the Department for Speech and Language Therapy of the Institute of Special Education (IFS). By combining the IFS's domain knowledge of children's speech with our expertise in machine learning and signal processing, we aim to improve automatic speech recognition of children's speech to the point where it can be used for applications in speech and language therapy. The project is part of the interdisciplinary collaboration "Leibniz Lab for Relational Communication Research" (Project Website).

To make such AI-based tools practically usable—for instance, our solutions for the automated assessment of child language—sustainable technical support and infrastructure are essential. These requirements exceed the scope of traditional research projects. Therefore, we successfully secured funding last year to support a spin-off of our research. Over the next two years, we will translate our findings into real-world software products in close collaboration with practitioners in education and healthcare, while simultaneously building the necessary organizational and business structures. Our website, Phonomatics, will serve as the central hub for updates on product development, field collaboration, and progress on the spin-off initiative.

Approach

A key component of our system is the automatic transcription of children's speech, which forms the foundation for in-depth analysis of both oral language skills (e.g., vocabulary, morphosyntactic structures, articulatory features) and literacy-related abilities (e.g., reading speed, accuracy, and comprehension). Our system architecture integrates state-of-the-art models in speaker diarization, automatic speech recognition (ASR), and linguistic analysis. These components are domain-adapted and continuously optimized to ensure robustness, accuracy, and interpretability in diagnostic contexts. In addition to advancing technical infrastructure, we are also developing novel diagnostic metrics that are only feasible through AI-based processing—for example, prosodic patterns in both spoken and read language. These features are inaccessible through manual methods and illustrate the unique potential of AI in language assessment.
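
As an illustration of the literacy-related measures mentioned above (reading speed and accuracy), the following is a minimal sketch of how such values could be derived once an automatic transcript is available. It is not the project's actual implementation: the function names `word_edit_distance` and `reading_metrics`, the simple lowercase/whitespace tokenization, and the definition of accuracy as one minus the word error rate are all assumptions made for this example.

```python
from dataclasses import dataclass


def word_edit_distance(reference: list[str], hypothesis: list[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # reference word missing in transcript
                current[j - 1] + 1,      # extra word in transcript
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


@dataclass
class ReadingMetrics:
    words_per_minute: float  # reading speed based on transcribed words
    accuracy: float          # assumed here as 1 - word error rate, floored at 0


def reading_metrics(reference_text: str, transcript: str,
                    duration_seconds: float) -> ReadingMetrics:
    """Hypothetical helper: compare a read-aloud transcript to the target text."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    # Assumption: lowercase and split on whitespace; real text normalization
    # for children's speech would need to be considerably more careful.
    reference = reference_text.lower().split()
    hypothesis = transcript.lower().split()
    if not reference:
        raise ValueError("reference_text must contain at least one word")
    errors = word_edit_distance(reference, hypothesis)
    accuracy = max(0.0, 1.0 - errors / len(reference))
    words_per_minute = len(hypothesis) / (duration_seconds / 60.0)
    return ReadingMetrics(words_per_minute, accuracy)


if __name__ == "__main__":
    metrics = reading_metrics(
        "der Hund läuft schnell nach Hause",  # target text (invented example)
        "der hund lauft schnell hause",       # transcript (invented example)
        duration_seconds=30.0,
    )
    print(metrics)
```

In this invented example, the transcript contains one substituted and one omitted word relative to the six-word target text. A diagnostic system would additionally have to separate genuine reading errors from recognition errors, which this sketch does not attempt.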

kidsTALC Corpus

The kidsTALC corpus is a speech corpus of German children’s spontaneous speech. It is designed for training ASR systems, with the goal of facilitating research on speech development and assisting therapeutic applications. kidsTALC is the first German speech corpus that meets modern standards for developing automatic tools to support language sample analysis in research and clinical applications.

The repository consists of multiple datasets, all containing connected speech, that represent different recording settings, language statuses, and ages. In its final version, the repository will contain recordings from about 300 children ranging in age from kindergarten to elementary school. The elicitation contexts will cover various settings along the unstructured-structured continuum, such as free play, storytelling, conversational discourse, and read texts, with a focus on spontaneous language. The corpus will also include children with a range of oral and written language abilities, such as typically developing children and children with developmental language disorder or speech sound disorder. Participants for the entire repository are being recruited from a network of collaborating preschools, kindergartens, and elementary schools. Eligibility criteria for the currently finished part of the dataset are: age 3½ to 11 years, monolingual German speaker, typically developing.

For more information on the corpus please read our publication (pdf, BibTeX).

Access to kidsTALC Corpus

To get access to the corpus, please send the signed end-user agreement to kidstalc@tnt.uni-hannover.de.

You will then be provided with a username and password to download the corpus here:

Version 1, October 2022: kidsTALC-v1

Recent Publications

Lars Rumberg, Christopher Gebauer, Jörn Ostermann
Aggregation-Free Uncertainty Estimation for CTC-Based Automatic Speech Recognition
IEEE Transactions on Audio, Speech and Language Processing, IEEE, June 2025
(IEEEexplore) BibTeX

Christopher Gebauer, Lars Rumberg, Fabian Witt, Edith Beaulac, Hanna Ehlert, Jörn Ostermann
Rule-Based Grammatical Error Detection on Spontaneous Children’s Speech
Elektronische Sprachsignalverarbeitung (ESSV), pp. 117-124, 2025
(pdf) BibTeX

Christopher Gebauer, Lars Rumberg, Lars Köhn, Hanna Ehlert, Edith Beaulac, Jörn Ostermann
Grammatical Error Detection on Spontaneous Children’s Speech Using Iterative Pseudo Labeling
to appear in Proceedings INTERSPEECH 2025 – 26th Annual Conference of the International Speech Communication Association, ISCA, 2025
(pdf) BibTeX