Efforts in the Development of an Augmented English–Nepali Parallel Corpus

December 2019

Domains

Low-Resource Languages, Natural Language Processing


Abstract

A crucial resource for Machine Translation between any two languages is a sufficient quantity of high-quality parallel data. High-resource language pairs have been studied extensively, but the same is not true of under-resourced languages. However, attention is now gradually shifting toward under-resourced languages as well, and efforts are underway to create more parallel data for them. In this paper, we describe the procedures we followed to develop an augmented English–Nepali parallel corpus. We also report new baseline scores for this language pair.

Keywords

NLP, AI, Deep Learning