Improving on the Limitations of the ASR Model in Low-Resourced Environments Using Parameter-Efficient Fine-Tuning
December 2024
Abstract
Modern general-purpose speech recognition systems are robust for high-resource languages. In contrast, achieving state-of-the-art accuracy for low-resource languages remains challenging. Fine-tuning a pre-trained model is a widely used practice that leverages existing knowledge while learning efficiently from a small amount of data to improve the accuracy and robustness of speech recognition. This work diagnoses the performance of a pre-trained model when transcribing audio from a low-resource language. We apply an adapter-based iterative parameter-efficient fine-tuning strategy on a limited dataset, aiming to improve the transcription quality of a previously fine-tuned model. For the experiments, we use Whisper's multilingual pre-trained speech model with Nepali as the test language. With this approach, we achieve a Word Error Rate (WER) of 27.9%, which is more than a 19% improvement over the pre-trained Whisper Large-V2.
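As a concrete illustration of the adapter-based parameter-efficient fine-tuning described above, the minimal sketch below wraps Whisper Large-V2 with LoRA adapters using the Hugging Face transformers and peft libraries. The model name, target modules, and hyperparameters are illustrative assumptions, not necessarily the exact configuration used in this work.

    from transformers import WhisperForConditionalGeneration, WhisperProcessor
    from peft import LoraConfig, get_peft_model

    # Load the multilingual pre-trained Whisper checkpoint and its processor,
    # configured here for Nepali transcription.
    model_name = "openai/whisper-large-v2"
    processor = WhisperProcessor.from_pretrained(
        model_name, language="Nepali", task="transcribe"
    )
    model = WhisperForConditionalGeneration.from_pretrained(model_name)

    # Inject low-rank adapters into the attention projections; the base weights
    # stay frozen, so only a small fraction of the parameters is trained.
    # Rank, alpha, and dropout values are assumptions chosen for illustration.
    lora_config = LoraConfig(
        r=32,
        lora_alpha=64,
        target_modules=["q_proj", "v_proj"],
        lora_dropout=0.05,
        bias="none",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # reports the trainable-parameter fraction

After fine-tuning, transcription quality can be measured with Word Error Rate, for example via jiwer.wer(references, hypotheses) from the jiwer package; the 27.9% WER reported above refers to the Nepali test data.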