Efforts Towards Developing a Tamang Nepali Machine Translation System
December 2019 - October 2020
Researchers
Description
<p>The Tamang language is spoken mainly in Nepal, Sikkim, West Bengal, some parts of Assam, and the North East region of India. According to the 2011 census conducted by the Government of Nepal, there are about 1.35 million Tamang speakers in Nepal alone. A machine translation system for the Tamang-Nepali language pair is therefore significant both as research and in practice, because it enables communication between the Tamang and Nepali communities. In this work, we train an attention-based Transformer Neural Machine Translation (NMT) model on a small, manually aligned Tamang-Nepali parallel corpus (15K sentence pairs). Our preliminary results show BLEU scores of 27.74 in the Nepali-to-Tamang direction and 23.74 in the Tamang-to-Nepali direction. We are currently working on enlarging the dataset and improving the model to obtain better BLEU scores.</p>
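<p>For readers unfamiliar with the metric, the following is a minimal sketch of how a corpus-level BLEU score on a 0-100 scale can be computed. It is not the evaluation script used in this project, which the description does not specify. The function names <code>ngrams</code> and <code>corpus_bleu</code>, the whitespace tokenization, the single reference per sentence, and the absence of smoothing are all assumptions made for illustration, and the example sentences are placeholder tokens rather than real Tamang or Nepali data.</p>

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100) with one reference per hypothesis.

    Assumptions for this sketch: whitespace tokenization, uniform
    n-gram weights, and no smoothing.
    """
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # n-grams produced by the system, per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())

    # Without smoothing, any order with zero matches gives BLEU = 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output shorter than the reference.
    bp = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


# Placeholder tokens standing in for system output and reference translations.
system_output = ["a b c d e"]
reference = ["a b c d f"]
score = corpus_bleu(system_output, reference)
print(f"BLEU = {score:.2f}")
```

<p>In practice, reported scores depend on tokenization and smoothing choices, so numbers produced by a sketch like this are not directly comparable with the figures quoted above.</p>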