Sometimes we fail to understand other people’s emotions. So how will machines fare when they try to understand ours? When writing programs we care about syntax and structure, but people do not communicate under such constraints. To process our language, machines have to understand not only what we say but also what we mean. Natural language processing is a fascinating subject to explore. But what makes it complicated?

Human communication isn’t just a group of words. It’s a mix of sentiments that need to be analyzed to understand what we really mean.

Why should I care?

The contemporary business world is a place where huge success and failure sit side by side. In traditional market research, businesses spend huge amounts to analyze customers’ opinions through continuous surveys and consultants. But nowadays social media gives businesses a lot of leverage. Most existing and potential customers are generating a treasure trove of data through Twitter, Facebook, LinkedIn and so on. Sentiment analysis is a powerful tool for mining the gold beneath the social media landslide.

The goal of sentiment analysis is to identify the opinions expressed in a text.

It seems easy, right?

  1. I’m happy to watch a new movie
  2. I hate war

Since “happy” is a positive word and “hate” is a negative one, we know the sentiment of each sentence. This is the simplest situation.

But what if a text contains two opinion words and one sentiment?

If we assign +1 to each positive word and -1 to each negative word,

  • I’m happy and excited to be going to watch a new movie

The above sentence is positive, with a score of +2.

  • I hate war and the violence it makes

The text contains two negative words, so it is negative, with a score of -2.

Now, let’s consider a text with mixed polarity.

That is, two opinion words and two sentiments.

Adding +1 and -1 gives 0, but here that doesn’t necessarily mean the text is neutral.

  • Ice cream shops are doing great even when the weather is bad

The statement is positive about ice cream shops but negative about the weather. In general, we would say that this is a positive statement about ice cream shops. We wouldn’t say it is neutral.
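The counting scheme above can be sketched in a few lines of Ruby. This is only an illustration: the word lists and the `polarity_score` method are hypothetical and exist just to reproduce the +2, -2 and 0 totals for the three example sentences.

```ruby
# Hypothetical word lists, chosen only to cover the example sentences.
POSITIVE_WORDS = %w[happy excited great].freeze
NEGATIVE_WORDS = %w[hate violence bad].freeze

# Adds +1 for every positive word and -1 for every negative word.
def polarity_score(text)
  tokens = text.downcase.scan(/[a-z']+/)
  tokens.sum do |token|
    if POSITIVE_WORDS.include?(token)
      1
    elsif NEGATIVE_WORDS.include?(token)
      -1
    else
      0
    end
  end
end

puts polarity_score("I'm happy and excited to be going to watch a new movie")
puts polarity_score("I hate war and the violence it makes")
puts polarity_score("Ice cream shops are doing great even when the weather is bad")
```

The last sentence sums to 0 even though a human reader would call it positive, which is exactly the weakness of plain word counting.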

When sentiment analysis turns hard

Sometimes it’s not possible to identify the opinion by just analyzing the polarity of words. Language usage and sarcasm are some of the reasons why sentiment analysis turns hard. It’s tough to analyze mixed sentiments in a text.

Sometimes it’s difficult to classify sarcastic statements as positive or negative. As human beings, we can recognize sarcasm and understand what it actually means. If you feed such a statement into a neural network or any machine learning framework that builds a simple classifier just to detect sentiment, it will fail miserably.

Another point to consider is local dialects. Even if you train a neural network on data from local dialects, it will often fail to understand what a speaker is trying to say, because some dialect words carry no obvious meaning on their own and it’s tough to train on anything and everything. So if you test a pre-trained model on such data, it is likely to fail completely. This is one of the reasons why it’s important to understand the local culture, and why some companies are setting up local data centers where local sentiment is captured.

Natural language has a lot of ambiguity. From the above examples, it’s clear that words in natural language take their meaning from context. Humans can comprehend and distinguish these meanings easily, but machines can’t. This makes Natural Language Processing one of the most difficult and interesting tasks in AI.

Using Natural Language Processing

  • Spell check and grammar check
  • Predictive text
  • Auto summarization
  • Machine translation
  • Sentiment analysis

Some common approaches to sentiment analysis

Machine learning and Natural Language Processing offer various methods for sentiment analysis. Some of the most effective approaches we have today rely on a human-in-the-loop approach: learning from user feedback. Combining machine-driven classification with human-in-the-loop feedback yields higher accuracy than purely machine learning based systems.

Tools & Libraries

Python’s scientific computing libraries, such as SciPy and NumPy, have strong support from the academic world. They are very well established, and Python is often chosen for its expressiveness and ease of use.

We too have tools……

Much of the data that machine learning algorithms need for NLP tasks such as sentiment analysis and spam filtering comes from the web. Ruby has a web framework that is quite popular and generates massive amounts of data. While Ruby doesn’t have the same vast academic network that Python or R has, it does have tools of its own, with the added benefit of being easy to learn and comprehend.

Sentimental gem

https://github.com/7compass/sentimental

The Sentimental gem was introduced for simple sentiment analysis with Ruby. It implements a lexicon-based approach to extracting sentiment, where the overall sentiment orientation is the sum of the sentiment orientations of the individual words (tokens). The overall sentiment of a sentence is output as positive, negative or neutral. The gem uses a dictionary of pre-tagged lexicon entries. The input text is converted to tokens by the tokenizer and then matched against the pre-tagged entries in the dictionary.

To classify sentiments we can set a threshold value. Values greater than the threshold are considered positive and values less than it are considered negative. The default threshold is 0.0. If a sentence has a score of 0, it is deemed “neutral”.
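The tokenize, look up, sum and threshold steps described above can be sketched in plain Ruby as follows. This is not the gem’s source code: the `LEXICON` entries, their weights, and the `score` and `sentiment` methods are hypothetical, and treating every score within plus or minus the threshold as neutral is a choice made for this sketch. With the default threshold of 0.0 it matches the behaviour described above.

```ruby
# Hypothetical mini-lexicon; the words and weights are made up for illustration.
LEXICON = {
  "love" => 0.9,
  "happy" => 0.8,
  "hate" => -0.9,
  "terrible" => -0.7
}.freeze

# Splits the text into lowercase tokens and sums the weight of each token.
# Tokens that are missing from the lexicon contribute 0.0.
def score(text, lexicon = LEXICON)
  text.downcase.scan(/[a-z']+/).sum(0.0) { |token| lexicon.fetch(token, 0.0) }
end

# Maps a score to a label using a symmetric threshold band (sketch assumption).
def sentiment(text, threshold: 0.0)
  value = score(text)
  if value > threshold
    :positive
  elsif value < -threshold
    :negative
  else
    :neutral
  end
end

puts sentiment("I love Ruby")
puts sentiment("I hate Mondays")
puts sentiment("The sky is blue")
```

Raising the threshold widens the neutral band, so weakly scored sentences stop being labelled positive or negative.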

Consider the following example

It outputs

It works well for a simple sentence. But consider another example with mixed polarity.

We get the output as

We expect a positive result here, but it failed.

The overall score is the sum of the scores of the individual opinion words. In the gem’s lexical dictionary, “good” is assigned a score of 0.6394, “bad” a score of -0.5588, and the token “weather” a score of -0.5. Hence the overall sentiment score is -0.4194.
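The arithmetic behind that total can be checked with a short Ruby sketch. Only the three scores come from the text above; the `TOKEN_SCORES` constant and the `overall_score` and `label_for` helpers are hypothetical names used for illustration.

```ruby
# The three token scores quoted in the text above.
TOKEN_SCORES = { "good" => 0.6394, "bad" => -0.5588, "weather" => -0.5 }.freeze

# Sums the scores and rounds to four decimals to hide floating-point noise.
def overall_score(scores)
  scores.values.sum.round(4)
end

# Labels a score against a threshold of 0.0.
def label_for(score)
  if score > 0.0
    :positive
  elsif score < 0.0
    :negative
  else
    :neutral
  end
end

total = overall_score(TOKEN_SCORES)
puts total            # -0.4194
puts label_for(total) # negative
```

The negative weight attached to “weather” is what tips the sum below zero, even though “good” slightly outweighs “bad”.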

The gem was found to work well for simple sentences, but failed to give accurate results for sentences with mixed polarity.

Sentimentalizer gem

https://github.com/malavbhavsar/sentimentalizer

The Sentimentalizer gem implements sentiment analysis in Ruby with machine learning. It basically trains a model that is then used in the application. Machine learning based analysis attracts growing interest from researchers because of its adaptability and accuracy. It overcomes the performance degradation that limits the lexical approach and works well even when the dictionary size grows rapidly.

We need to train the engine in order to use it.

This outputs as

The overall sentiment is positive, which is indicated by 🙂

But this method faces challenges in designing the classifier, in the availability of training data, and in correctly interpreting a new phrase that is not in the training dataset.

Classifying with Bayesian and SVM classifiers

At its core, sentiment analysis is a text classification problem. Ankusa, Eluka, Classifier, and Hoatzin are some Bayesian and SVM classifier gems that can be used for sentiment analysis. Among them, Hoatzin, Classifier, and Eluka use LibSVM, a library for Support Vector Machines. Simple models work best with all of them. The Ankusa gem provides a Naive Bayes classifier, which is more accurate than a baseline but less accurate than the Eluka gem, which implements an SVM classifier.
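To make the idea of sentiment analysis as text classification concrete, here is a minimal multinomial Naive Bayes sketch in plain Ruby with add-one smoothing. It does not reproduce the API of Ankusa or any other gem mentioned above; the `TinyNaiveBayes` class, its methods, and the four training sentences are all hypothetical.

```ruby
# A minimal multinomial Naive Bayes text classifier with add-one smoothing.
class TinyNaiveBayes
  def initialize
    @word_counts = Hash.new { |hash, label| hash[label] = Hash.new(0) } # label => word => count
    @doc_counts = Hash.new(0)                                           # label => number of documents
    @vocabulary = {}                                                    # every word seen in training
  end

  # Records one labelled training document.
  def train(label, text)
    @doc_counts[label] += 1
    tokenize(text).each do |word|
      @word_counts[label][word] += 1
      @vocabulary[word] = true
    end
  end

  # Returns the label with the highest log posterior for the given text.
  def classify(text)
    total_docs = @doc_counts.values.sum.to_f
    words = tokenize(text)
    @doc_counts.keys.max_by do |label|
      total_words = @word_counts[label].values.sum
      log_prior = Math.log(@doc_counts[label] / total_docs)
      log_likelihood = words.sum(0.0) do |word|
        Math.log((@word_counts[label][word] + 1.0) / (total_words + @vocabulary.size))
      end
      log_prior + log_likelihood
    end
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z']+/)
  end
end

# Hypothetical training data, far too small for real use.
classifier = TinyNaiveBayes.new
classifier.train(:positive, "I love this movie")
classifier.train(:positive, "What a great and happy day")
classifier.train(:negative, "I hate this weather")
classifier.train(:negative, "What a terrible and sad day")

puts classifier.classify("I love a happy day")
puts classifier.classify("I hate sad weather")
```

Unlike the lexicon approach, the word weights here are learned from labelled examples, so the quality of the result depends entirely on the training data.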

When we need more…

JRuby

Ruby is a very expressive language with excellent string processing capabilities. There are also excellent Java libraries for NLP, and the JVM is a high-performance platform with true multi-threading capabilities. JRuby allows us to leverage well-established, mature Java libraries from within our Ruby code.

Sentiment Analysis using TensorFlow Ruby API

https://github.com/somaticio/tensorflow.rb

TensorFlow is an open source software library for numerical computation using data flow graphs. It was developed by researchers on the Google Brain Team within Google’s Machine Intelligence research organization to conduct machine learning and deep neural network research. Although TensorFlow may be overkill for simpler tasks, it can be an alternative and more efficient way to analyze tweets if you have rich, high-volume data. It helps you create your own sentiment classifiers to understand the large amounts of natural language in the world.

In this article, we discussed various approaches to sentiment analysis, which is a part of Natural Language Processing. We have seen that sentiment extraction and analysis can be done using supervised or unsupervised learning, a sentiment lexicon-based approach, or a mix of these, and that any of these methods can be implemented in Ruby.

 
