Profanity and Offensiveness Detection in Nepali Language Using Bi-directional LSTM Models

December 2024



Abstract

Offensive and profane content has been on the rise on Nepali social media, which is disturbing to users. This is partly due to the absence of proper tools and mechanisms for dealing with profane and offensive text in the Nepali language. In this work, we develop a deep learning-based tool for detecting profane and offensive comments. We build a Bi-LSTM (Bidirectional Long Short-Term Memory) model for classifying profane and offensive comments and study different variations of the task. We use multilingual BERT embeddings and vocabulary embeddings, among others, to capture the intent and decency of posts more accurately. While previous studies on the Nepali language focus only on sentiment and offensiveness detection, our study treats profanity detection and offensiveness detection as two distinct tasks. Our Bi-LSTM model achieves 87.8% accuracy.


Keywords

NLP, AI, Deep Learning