This is the second part of a two-part blog series. Read the first part here.
In the previous part, we discussed the differences between syntactic and semantic applications of natural language processing. In both scenarios, word vectors seem like a promising way forward, given their recent popularity and success. As we outlined, word vectors work quite well on syntactic tasks, since they are built from the contexts in which words appear. For semantic applications, however, we argued that they are less suitable.
We were mainly looking to apply word vectors to sentiment analysis as a first semantic application. Our award-winning and patented RBEM algorithm is great on paper but has its flaws: it requires a vast amount of human expert input to work correctly, something we were hoping to overcome using word vectors.
Combining the Best of Both Worlds
Our idea was to combine the powerful RBEM algorithm with the strength of word vectors in a way that reduces the amount of human supervision the RBEM algorithm requires. We figured we could do this by feeding word-vector similarities into the RBEM algorithm's linguistic rules. The result was an unsupervised version of the RBEM algorithm, which we dubbed URBEM (find the source code here).
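To make the idea concrete, here is a minimal sketch of how word-vector similarity can relax exact-word matching in linguistic rules. The vectors, Dutch words, threshold, and function names are all illustrative stand-ins, not URBEM's actual implementation.

```python
import math

# Toy 3-dimensional vectors standing in for real 300-dimensional
# FastText embeddings; the values are invented for illustration.
VECTORS = {
    "goed":   [0.9, 0.1, 0.2],   # "good"
    "fijn":   [0.8, 0.2, 0.3],   # "fine"
    "slecht": [-0.7, 0.1, 0.2],  # "bad"
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def soft_match(rule_word, token, threshold=0.8):
    """Relaxed rule matching: a token matches a rule word when their
    embeddings are sufficiently similar, not only when the strings
    are equal. This lets one hand-written rule generalize to many
    similar words."""
    if rule_word == token:
        return True
    if rule_word in VECTORS and token in VECTORS:
        return cosine(VECTORS[rule_word], VECTORS[token]) >= threshold
    return False
```

With this in place, a rule written for "goed" also fires on "fijn" without an expert ever listing that word, which is exactly the reduction in supervision we were after.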
The success of the URBEM algorithm largely depends on the ability of word vectors to capture semantic similarities, or more particularly in our case, sentiment similarities. As discussed in the first part of this blog, this is not something we found word vectors to be particularly good at on their own, regardless of the many success stories out there suggesting otherwise.
We set out using FastText’s 300-dimensional pre-trained word vectors, which on their own made little headway. FastText does, however, have the ability to bias word vectors towards specific labels, such as sentiment. Since sentiment corpora in Dutch are rare compared to English ones, we had to come up with a way to obtain sentiment-labeled data, again without human expert input. We used a well-known method: for a couple of months we collected Twitter data containing emojis. Emojis are not strong identifiers of sentiment and sometimes even express the opposite, but they are weak indicators, and other work has shown this to be reasonably effective, especially for languages lacking labeled data.
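This kind of emoji-based weak labeling can be sketched as follows, assuming a small hand-picked emoji lexicon and FastText's `__label__` input convention for supervised training; the emoji sets and function names are illustrative, not the exact ones we used.

```python
# Hypothetical emoji lexicons; a real setup would use larger sets.
POSITIVE = {"😀", "😍", "👍"}
NEGATIVE = {"😢", "😡", "👎"}

def weak_label(tweet):
    """Return 'pos' or 'neg' when exactly one polarity of emoji occurs;
    ambiguous or emoji-free tweets are discarded (None)."""
    has_pos = any(e in tweet for e in POSITIVE)
    has_neg = any(e in tweet for e in NEGATIVE)
    if has_pos and not has_neg:
        return "pos"
    if has_neg and not has_pos:
        return "neg"
    return None

def build_corpus(tweets):
    """Turn raw tweets into FastText-style supervised training lines,
    one '__label__<polarity> <text>' line per usable tweet."""
    lines = []
    for t in tweets:
        label = weak_label(t)
        if label is not None:
            lines.append(f"__label__{label} {t}")
    return lines
```

The resulting file can then be fed to FastText's supervised training mode, so the vectors get nudged toward separating the two polarities without any human annotation.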
After fine-tuning for two weeks, we ended up with sentiment-biased (or rather, emoji-biased) word vectors and were eager to test out the URBEM algorithm. Much to our liking, words like good and well (their Dutch equivalents) were now closer to each other than either was to bad. This is a good thing, and something we did not have before!
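The kind of sanity check we ran can be sketched like this, with invented two-dimensional stand-ins for the tuned 300-dimensional vectors; the words and values are illustrative only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy stand-ins for the tuned vectors, invented to mirror the
# relationship we observed after biasing towards sentiment.
tuned = {
    "goed":   [0.9, 0.3],   # "good"
    "wel":    [0.8, 0.4],   # "well"
    "slecht": [-0.8, 0.5],  # "bad"
}

def passes_sanity_check(vectors, pos_a, pos_b, neg):
    """True when the two same-polarity words are closer to each other
    than either is to the opposite-polarity word."""
    pp = cosine(vectors[pos_a], vectors[pos_b])
    return (pp > cosine(vectors[pos_a], vectors[neg])
            and pp > cosine(vectors[pos_b], vectors[neg]))
```

The pre-trained vectors failed this check for us; the emoji-biased ones passed it.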
Back to Earth Once More
Given that our initial tests seemed far more positive than before, we started experimenting extensively with the URBEM method and all its parameters. We ran an extensive grid search for the optimal parameters, but ended up with accuracies that lagged far behind generic baseline approaches like Naive Bayes, SVMs, or decision trees, and definitely behind state-of-the-art methods like RBEM and Stanford’s Recursive Neural Nets.
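A grid search of this sort boils down to an exhaustive loop over parameter combinations; the parameter names and value ranges below are hypothetical, not the actual grid we searched.

```python
import itertools

# Hypothetical URBEM hyper-parameters for illustration.
GRID = {
    "similarity_threshold": [0.6, 0.7, 0.8],
    "max_pattern_length": [2, 3, 4],
    "neighbourhood_size": [5, 10],
}

def grid_search(evaluate, grid):
    """Evaluate every combination of parameter values and return the
    best (score, params) pair, where `evaluate` maps a parameter dict
    to a score such as held-out accuracy."""
    best_score, best_params = float("-inf"), None
    keys = sorted(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = evaluate(params)
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params
```

In practice `evaluate` would train and score URBEM on a held-out set; even the best cell of our grid did not close the gap to the baselines.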
We investigated a bit further and found that while common sentiment words like good and bad were now handled properly, only slightly less frequent words like awesome and terrible were much more dissimilar to good and bad, respectively, than good and bad were to each other, even though we spent quite a while fine-tuning the word vectors.
This concludes our rant on word vectors and how to apply them to semantic tasks like sentiment analysis in an unsupervised manner. Of course, methods like Stanford’s Recursive Neural Nets do work well using word vectors, but the input data required for that algorithm is a sentiment-annotated parse tree. This is pretty much the same input as is required for the original RBEM algorithm, except that for RBEM the order of the tree does not matter, whereas for the RNN it does. This kind of annotation only exists for English and is difficult, even for experts, to construct from scratch.
We ended up using a mixture of word vectors, deep learning, RBEM, and general machine-learning approaches to perform our semantic tasks, while still successfully utilizing word-vector-based deep learning methods for syntactic tasks like parsing.