Speech Recognition and Synthesis Models and Platforms for the Kazakh Language

by Abduali, Balzhan; Amirova, Dina; Karibayeva, Aidana; Karyukin, Vladislav

Subject: Accuracy / Acknowledgment / Acoustics / Adaptation / Algorithms / Artificial intelligence / ASR / Automatic speech recognition / Datasets / Intelligibility / Kazakh language / Language / Language modeling / Language shift / Languages / Lexical semantics / Machine learning / Morphology / Phonemics / Phonetics / Preservation / Semantics / Speech / Speech perception / Speech recognition / Speech recognition software / Speech synthesis / STT / Text-to-speech / Transformation / TTS / Turkic languages / Voice recognition / Vowels

2025

Journal Article

Overview

With the rapid development of artificial intelligence and machine learning, automatic speech recognition (ASR) and text-to-speech (TTS) have become key components of society's digital transformation. Kazakh, a member of the Turkic language family, remains a low-resource language with limited audio corpora, language models, and high-quality speech synthesis systems. This study provides a comprehensive analysis of existing speech recognition and synthesis models, with emphasis on their applicability and adaptation to Kazakh. Special attention is given to linguistic and technical barriers, including the language's agglutinative structure, rich vowel system, and phonemic variability.

Both open-source and commercial solutions were evaluated, including Whisper, GPT-4 Transcribe, ElevenLabs, OpenAI TTS, Voiser, KazakhTTS2, and TurkicTTS. Speech recognition systems were assessed with BLEU, WER, TER, chrF, and COMET, and speech synthesis with MCD, PESQ, STOI, and DNSMOS, covering both lexical–semantic and acoustic–perceptual characteristics.

For speech-to-text (STT), the strongest performance was achieved by Soyle on domain-specific data (BLEU 74.93, WER 18.61). Voiser showed balanced accuracy (WER 40.65–37.11, chrF 80.88–84.51), and GPT-4 Transcribe achieved robust semantic preservation (COMET up to 1.02). In contrast, Whisper performed worst (WER 77.10, BLEU 13.22) and requires further adaptation for Kazakh.

For text-to-speech (TTS), KazakhTTS2 delivered the most natural perceptual quality (DNSMOS 8.79–8.96), while OpenAI TTS achieved the best spectral accuracy (MCD 123.44–117.11, PESQ 1.14). TurkicTTS offered reliable intelligibility (STOI 0.15, PESQ 1.16), and ElevenLabs produced natural-sounding but less spectrally accurate speech.
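Word error rate (WER), one of the STT metrics listed in the overview, is conventionally defined as WER = (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions needed to turn the reference transcript of N words into the system output. The following is a minimal Python sketch of that conventional definition using word-level Levenshtein distance. It is not the authors' evaluation code. It assumes plain whitespace tokenization with no normalization of casing or punctuation, which real evaluation toolkits usually apply. The function name `word_error_rate` and the example sentences are hypothetical. The overview's WER figures appear to be percentages, so the returned fraction would be multiplied by 100 for comparison.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: (S + D + I) / N via word-level Levenshtein distance.

    Assumes whitespace tokenization and no text normalization.
    Returns a fraction (multiply by 100 for a percentage).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference
    # words and the first j hypothesis words (one DP row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one wrong word out of four reference words.
print(word_error_rate("сәлеметсіз бе қалыңыз қалай", "сәлеметсіз бе қалың қалай"))
```

Because WER compares whole words, a single wrong suffix on an agglutinative Kazakh word (here `қалыңыз` transcribed as `қалың`) counts as a full substitution. Character-level metrics such as chrF give partial credit for the shared characters, which is one reason they can be more lenient on the same output. Note also that WER is not capped at 1.0: a hypothesis with many inserted words can exceed 100%.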

Publisher

MDPI AG

Subject

/ Artificial intelligence

/ ASR