A Malay-English Code-Switching Bilingual ASR System


Mumtaz Begum Mustafa
Ireen Yek Chung Mei
Miss Laiha Mat Kiah
Saravanan Muthaiyah
Farzana Parveen Tajudeen

Abstract

This paper examines the code-switching (CS) phenomenon among Malaysians, who use Malay and English interchangeably both between and within sentences (inter- and intra-sentential switching). This poses challenges for automatic speech recognition (ASR), since the system must handle input in a multilingual setting. Although monolingual ASR systems can recognize a few words from a foreign language, they are usually not robust enough to handle varied code-switching styles. In addition, the lack of large code-switched speech corpora that capture all of these styles makes developing CS speech recognition systems difficult. In this research, a bilingual CS ASR system for Malay-English CS speech was developed using two approaches: combining and merging. The effectiveness of the bilingual models built with these approaches is evaluated using the word error rate (WER) and compared against the performance of the monolingual models. The results show that merging English and Malay speech acoustics is an effective technique for recognizing Malay-English CS speech.
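Because the models are compared by word error rate, it may help to see how WER is usually computed. The sketch below assumes the standard definition, WER = (S + D + I) / N, where S, D and I are word substitutions, deletions and insertions in a minimum-edit alignment and N is the number of reference words. The function name `word_error_rate` and the Malay-English sentences are hypothetical illustrations, not taken from the paper's corpus or scoring tools, and no text normalization (case, punctuation) is applied.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (S + D + I) / N via word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )
    return dp[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # Made-up intra-sentential CS example: Malay sentence with an English word ("meeting")
    reference = "saya nak pergi meeting petang ini"
    hypothesis = "saya nak pergi mitting petang ini"  # English word misrecognized
    print(round(word_error_rate(reference, hypothesis), 4))
```

Here one of six reference words is substituted, so the printed WER is 0.1667. A monolingual Malay model that cannot recognize the English word "meeting" would typically incur exactly this kind of substitution error, which is the failure mode bilingual CS models aim to reduce.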
