Enhancing the Quality of Nepali Text-to-Speech Systems

August 2017

Domains

Text-to-Speech Systems, Natural Language Processing

Authors

Abstract

Text-to-speech (TTS) systems are a widely studied application in Computer Science. They are more popular for languages with a rich set of resources, such as English, and have not been taken up as rigorously for under-resourced languages such as Nepali. Nevertheless, TTS has a wide scope of application in areas including telephony, e-learning and telecommunication.

Developing a natural-sounding TTS system is difficult for under-resourced languages, primarily because of the linguistic resources such a system requires. Preparing these resources is costly and time-consuming, and requires the involvement of linguists and other experts. The general trend in this research domain is therefore to develop natural-sounding TTS from the limited resources available. Nepali, being an under-resourced language, has very few linguistic resources available for developing a TTS system.

In this work, we modified the existing TTS system [9] by adding computational units that process its input and output, which we call pre-processing and post-processing modules. We also pruned and added phonetic and normalization rules, and made the system available to the public through a desktop application and a Firefox plugin.
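As a rough illustration of how pre-processing and post-processing modules can wrap an existing synthesizer, the following Python sketch spells out Devanagari digits before synthesis and rescales the output samples afterwards. Everything in it is an assumption made for illustration: the names `preprocess`, `postprocess` and `speak`, the digit-by-digit expansion rule, and the peak-scaling step are hypothetical and are not taken from the system described here.

```python
import re

# Hypothetical digit-to-word table (an assumption for illustration only;
# the actual normalization rules of the system are not reproduced here).
DIGIT_WORDS = {
    "०": "शून्य", "१": "एक", "२": "दुई", "३": "तीन", "४": "चार",
    "५": "पाँच", "६": "छ", "७": "सात", "८": "आठ", "९": "नौ",
}


def preprocess(text):
    """Normalize raw input: spell out Devanagari digits, collapse whitespace."""
    spelled = re.sub(
        r"[०-९]", lambda m: " " + DIGIT_WORDS[m.group()] + " ", text
    )
    return re.sub(r"\s+", " ", spelled).strip()


def postprocess(samples, peak=0.9):
    """Scale samples so the loudest one has magnitude `peak` (assumed step)."""
    loudest = max((abs(s) for s in samples), default=0.0)
    if loudest == 0.0:
        return list(samples)
    return [s * peak / loudest for s in samples]


def speak(text, synthesize):
    """Run pre-processing, the wrapped synthesizer, then post-processing.

    `synthesize` stands in for the existing TTS engine: any callable that
    takes normalized text and returns a sequence of float samples.
    """
    return postprocess(synthesize(preprocess(text)))
```

Passing the synthesizer in as a callable keeps the two modules independent of the engine they wrap, so the same rules could sit behind a desktop application or a browser plugin. A real normalizer would read multi-digit numbers as whole quantities rather than digit by digit.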

We evaluated the existing and modified TTS systems using qualitative evaluation techniques, in which 30 users were asked to rate the systems on two parameters: intelligibility and naturalness. Our results show an overall improvement of 6% in naturalness and intelligibility, while the results of the comprehension test and the diagnostic rhyme test increased by 12% and 10%, respectively.

Keywords

NLP, Deep Learning