Classifying sentiments in Nepali subjective texts

July 2016

Domains

Natural Language Processing, Sentiment Analysis

Authors

Abstract

With the advent of online social media such as Facebook, Twitter and blogs, the way people perceive things around them has changed dramatically. A simple example is how people buy a mobile phone today. Where shopping once meant moving from one store to another, buyers now care more about the opinions other people express in product reviews. There is a growing tendency to leave one's opinion on a product, service or any other entity on the web, which opens the door to the interesting yet challenging field of Sentiment Analysis. Much of the work on Sentiment Analysis in the last decade has, unsurprisingly, been done for English because of the large number of resources available, but other languages are gradually gathering pace, and some are already at an advanced stage, developing resources and applications that compete with those for English. Nepali opinionated content has also increased rapidly in the last few years. Nepal ranks among the countries with the highest number of Facebook subscribers. Maintaining an online presence and voicing one's opinion on the Internet through social media platforms has become the norm for both individuals and businesses. Nepali is a morphologically rich, under-resourced, free word order language. Although institutions such as Madan Puraskar Pustakalaya and Language Technology Kendra have done some work on language technology in the past, Nepali remains under-resourced, with very few tools and annotated corpora available. This makes any computational linguistic work in the language, including Sentiment Analysis, challenging. In this work, we apply three Machine Learning classifiers, namely Support Vector Machine, Multinomial Naive Bayes and Logistic Regression, to develop a model that classifies book and movie reviews written in Nepali as “Positive” or “Negative”. We evaluate and validate our model using 5-fold cross-validation. Experimental results show that the Multinomial Naive Bayes classifier achieves higher accuracy than the other two classifiers.
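To make the evaluation setup concrete, the following is a minimal sketch of a Multinomial Naive Bayes classifier scored with 5-fold cross-validation, using only the Python standard library. It is not the authors' implementation: the whitespace tokenization, bag-of-words counts, add-one smoothing, interleaved fold assignment, and the function names are all assumptions, and the ten English placeholder reviews are hypothetical stand-ins for real Nepali book and movie reviews. The Support Vector Machine and Logistic Regression baselines are omitted to keep the example short.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus; English placeholder tokens stand in for Nepali reviews.
REVIEWS = [
    ("good story good acting", "Positive"),
    ("bad story boring plot", "Negative"),
    ("great book good plot", "Positive"),
    ("boring book bad ending", "Negative"),
    ("good film great ending", "Positive"),
    ("bad film boring acting", "Negative"),
    ("great acting good book", "Positive"),
    ("boring story bad film", "Negative"),
    ("good ending great story", "Positive"),
    ("bad plot boring book", "Negative"),
]
DOCS = [text.split() for text, _ in REVIEWS]  # assumed whitespace tokenization
LABELS = [label for _, label in REVIEWS]


def train_mnb(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes with add-alpha (Laplace) smoothing."""
    vocab = {tok for doc in docs for tok in doc}
    token_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)
    log_prior, log_lik = {}, {}
    for label, n_docs in Counter(labels).items():
        total = sum(token_counts[label].values()) + alpha * len(vocab)
        log_prior[label] = math.log(n_docs / len(docs))
        log_lik[label] = {
            tok: math.log((token_counts[label][tok] + alpha) / total)
            for tok in vocab
        }
    return log_prior, log_lik


def predict_mnb(model, doc):
    """Return the label with the highest posterior log-probability."""
    log_prior, log_lik = model
    scores = {}
    for label, prior in log_prior.items():
        # Tokens never seen during training are skipped.
        scores[label] = prior + sum(
            log_lik[label][tok] for tok in doc if tok in log_lik[label]
        )
    return max(scores, key=scores.get)


def k_fold_accuracy(docs, labels, k=5):
    """Mean accuracy over k folds; fold i holds out items i, i+k, i+2k, ..."""
    accuracies = []
    for i in range(k):
        test_idx = set(range(i, len(docs), k))
        train_docs = [d for j, d in enumerate(docs) if j not in test_idx]
        train_labels = [l for j, l in enumerate(labels) if j not in test_idx]
        model = train_mnb(train_docs, train_labels)
        correct = sum(predict_mnb(model, docs[j]) == labels[j] for j in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


if __name__ == "__main__":
    print("mean 5-fold accuracy:", k_fold_accuracy(DOCS, LABELS, k=5))
```

On a real corpus the folds would normally be shuffled and stratified by class, and the same folds would be reused for every classifier so that the accuracy figures are directly comparable.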

Keywords

NLP, Machine Learning