Search Results

3 results for "Amirova, Dina"
Speech Recognition and Synthesis Models and Platforms for the Kazakh Language
With the rapid development of artificial intelligence and machine learning technologies, automatic speech recognition (ASR) and text-to-speech (TTS) have become key components of the digital transformation of society. The Kazakh language, as a representative of the Turkic language family, remains a low-resource language with limited audio corpora, language models, and high-quality speech synthesis systems. This study provides a comprehensive analysis of existing speech recognition and synthesis models, emphasizing their applicability and adaptation to the Kazakh language. Special attention is given to linguistic and technical barriers, including the agglutinative structure, rich vowel system, and phonemic variability. Both open-source and commercial solutions were evaluated, including Whisper, GPT-4 Transcribe, ElevenLabs, OpenAI TTS, Voiser, KazakhTTS2, and TurkicTTS. Speech recognition systems were assessed using BLEU, WER, TER, chrF, and COMET, while speech synthesis was evaluated with MCD, PESQ, STOI, and DNSMOS, thus covering both lexical–semantic and acoustic–perceptual characteristics. The results demonstrate that, for speech-to-text (STT), the strongest performance was achieved by Soyle on domain-specific data (BLEU 74.93, WER 18.61), while Voiser showed balanced accuracy (WER 40.65–37.11, chrF 80.88–84.51) and GPT-4 Transcribe achieved robust semantic preservation (COMET up to 1.02). In contrast, Whisper performed weakest (WER 77.10, BLEU 13.22), requiring further adaptation for Kazakh. For text-to-speech (TTS), KazakhTTS2 delivered the most natural perceptual quality (DNSMOS 8.79–8.96), while OpenAI TTS achieved the best spectral accuracy (MCD 123.44–117.11, PESQ 1.14). TurkicTTS offered reliable intelligibility (STOI 0.15, PESQ 1.16), and ElevenLabs produced natural but less spectrally accurate speech.
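The evaluation scripts themselves are not part of this abstract; purely as an illustration of how the lexical ASR metrics named above (WER, BLEU, chrF) are typically computed, the following minimal Python sketch uses the jiwer and sacrebleu libraries. The reference and hypothesis sentences are invented placeholders, not data from the study.

import jiwer
import sacrebleu

# Reference transcripts and ASR hypotheses (placeholder sentences,
# not data from the study above).
references = [
    "bugin aua raiy zhaksy",
    "men kitap okyp otyrmyn",
]
hypotheses = [
    "bugin auar aiy zhaksy",
    "men kitap okyp otyrmyn",
]

# Word Error Rate over the whole set (0.0 is perfect; values can exceed 1.0).
wer = jiwer.wer(references, hypotheses)

# Corpus-level BLEU and chrF; sacrebleu expects a list of reference streams.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
chrf = sacrebleu.corpus_chrf(hypotheses, [references])

print(f"WER:  {wer * 100:.2f}")
print(f"BLEU: {bleu.score:.2f}")
print(f"chrF: {chrf.score:.2f}")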
The Spring Holiday Nauryz-Meiramy in the Kazakh Tradition
The present article is the author's first attempt to give a scholarly interpretation of the interesting yet under-investigated phenomenon of the Kazakh ‘Nauryz’. The ethno-cultural traditions of ‘Nauryz’, lost as a result of the Islamization and Sovietization of Kazakhstan, have been revived over the last 30 years. The article reviews the basic socio-historical prerequisites of ‘Nauryz’ in the system of Kazakh traditional culture, taking into account the typological features of the spring holiday (the idea of the cyclic revival of life) as well as specific aspects of Central Asian nomadism. The author regards ‘Nauryz-meiramy’ as a complete calendar-ceremonial complex of high social and spiritual significance, conditionally differentiates three interconnected components of the traditional ‘Nauryz-meiramy’ (ritual, competitive, and entertaining), and emphasizes their duplicative character. The article devotes special attention to the musical context of ‘Nauryz-meiramy’. The author argues for the existence of a currently lost song genre specific to Nauryz-meiramy (akin to carols) and, based on the available data, attempts to reconstruct its original model.
The Development and Experimental Evaluation of a Multilingual Speech Corpus for Low-Resource Turkic Languages
The development of parallel audio corpora for Turkic languages such as Kazakh, Uzbek, and Tatar remains a significant challenge for multilingual speech synthesis, recognition systems, and machine translation. These languages are low-resource in speech technologies, lacking the sufficiently large audio datasets with aligned transcriptions that modern recognition, synthesis, and understanding systems require. This article presents the development and experimental evaluation of a speech corpus focused on Turkic languages, intended for use in speech synthesis and automatic translation tasks. The primary objective is to create parallel audio corpora using a cascade generation method, which combines artificial intelligence and text-to-speech (TTS) technologies to generate both audio and text, and to evaluate the quality and suitability of the generated data. To evaluate the quality of the synthesized speech, metrics measuring naturalness, intonation, expressiveness, and linguistic adequacy were applied. As a result, a multimodal (Kazakh–Turkish, Kazakh–Tatar, Kazakh–Uzbek) corpus was created, combining high-quality natural Kazakh audio with transcription and translation, along with synthetic audio in Turkish, Tatar, and Uzbek. These corpora offer a unique resource for speech and text processing research, enabling the integration of ASR, MT, TTS, and speech-to-speech translation (STS).
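As context for the cascade generation method described above (Kazakh source text, machine translation into the target Turkic languages, then synthetic speech), a minimal Python sketch of such a pipeline follows. The helpers translate_text() and synthesize_speech() are hypothetical stand-ins for whatever MT and TTS backends a project actually uses; they are not the systems from the study.

from pathlib import Path

TARGET_LANGUAGES = ["tr", "tt", "uz"]  # Turkish, Tatar, Uzbek

def translate_text(text: str, src: str, tgt: str) -> str:
    """Hypothetical MT call; replace with a real translation backend."""
    raise NotImplementedError

def synthesize_speech(text: str, lang: str) -> bytes:
    """Hypothetical TTS call; replace with a real synthesis backend."""
    raise NotImplementedError

def build_parallel_entry(kazakh_text: str, utt_id: str, out_dir: Path) -> dict:
    """Generate translations and synthetic audio for one Kazakh sentence."""
    entry = {"id": utt_id, "kk": kazakh_text}
    for lang in TARGET_LANGUAGES:
        translated = translate_text(kazakh_text, src="kk", tgt=lang)
        audio = synthesize_speech(translated, lang=lang)
        audio_path = out_dir / f"{utt_id}_{lang}.wav"
        audio_path.write_bytes(audio)
        entry[lang] = {"text": translated, "audio": str(audio_path)}
    return entry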