In this two-part blog series, we investigate the use of word vectors for a variety of applications.
It is no secret that we at UnderstandLing love Deep Learning, and word vectors in particular. We are especially fond of FastText, one implementation of word vectors. We even maintain our own fork of a Java port of FastText, which contains some fixes and more up-to-date functionality.
As we deal first and foremost with multilingual data, we mainly use word vectors for their unsupervised power and for the abstract way in which they learn latent features of words and, in the case of FastText, phrases. Word vectors have some interesting mathematical properties that relate to our semantic understanding of language. For example, subtracting the vector for King from the vector for Man and then adding the vector for Queen results in a vector close to that of Woman, which is what we as humans would expect as well.
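The arithmetic behind this analogy can be sketched in a few lines of Python. This is only an illustration: the three-dimensional vectors below are invented by hand so that the numbers work out, and the `analogy` helper is a hypothetical name, not part of FastText. Real pre-trained vectors have hundreds of dimensions and the match is approximate rather than exact.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# They are NOT taken from any real FastText model.
toy_vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.2, 0.8],
    "apple": [0.5, 0.5, 0.5],
}


def cosine(a, b):
    """Cosine similarity between two equally long lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(vectors, start, minus, plus):
    """Return the word closest to start - minus + plus, skipping the inputs."""
    target = [
        s - m + p
        for s, m, p in zip(vectors[start], vectors[minus], vectors[plus])
    ]
    candidates = [w for w in vectors if w not in {start, minus, plus}]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Man - King + Queen, as described in the text.
result = analogy(toy_vectors, start="man", minus="king", plus="queen")
print(result)
```

The input words are excluded from the candidates because the resulting vector usually stays close to one of them, which would otherwise mask the interesting answer.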
Syntax or Semantics
The fact that word analogies are reflected in the mathematics of word vectors is interesting to us, since we mostly deal with semantic language challenges at UnderstandLing. Don’t get us wrong, we do have methods for syntactic parsing. For example, we are the only company in the world that can do unsupervised constituency and dependency parsing in multiple languages. Our main interest, however, lies in semantic tasks.
Given the promise of word vectors and the amount of research on them, we have lately been putting a lot of effort into getting them to work for our semantic tasks. For example, we use them with great success for topic classification. Other popular tasks we perform are sentiment analysis (more precisely, polarity detection: determining whether a text is positive, negative or neutral), emotion detection (for example, using Plutchik’s wheel of emotions), persuasion analysis and authorship profiling. Word vectors have been used extensively for such tasks as well, so we were confident we could get something decent out of them.
Snap Back to Reality
We started off by experimenting with all sorts of ways to classify sentiment using FastText’s pre-trained word vector models, in multiple languages of course, to see whether the approach would scale. Since the vector for a word is determined mostly by the contexts it occurs in, we knew that semantically contrasting words like good and bad would still have rather similar vectors. Some quick experiments showed this was indeed the case. Worse still, the Dutch words goed (good) and goede (the inflected form of good), which should be very similar for sentiment purposes, turned out to be far less similar than the Dutch words for good and bad, or even good and the for that matter. This fundamentally undermined the ideas we had for applying word vectors to semantic tasks.
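Why contrasting words end up with similar vectors can be shown on a toy scale. The sketch below does not use FastText. It swaps in plain co-occurrence counts as a stand-in for context-based vectors, and the corpus and the `context_vectors` helper are made up for this illustration.

```python
import math
from collections import Counter, defaultdict

# Tiny invented corpus; "good" and "bad" deliberately occur in the same contexts.
corpus = [
    "the movie was good",
    "the movie was bad",
    "the food was good",
    "the food was bad",
    "she reads the book",
]


def context_vectors(sentences, window=2):
    """For every word, count the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


vectors = context_vectors(corpus)
sim_good_bad = cosine(vectors["good"], vectors["bad"])
sim_good_book = cosine(vectors["good"], vectors["book"])
print(f"good vs bad:  {sim_good_bad:.2f}")
print(f"good vs book: {sim_good_book:.2f}")
```

In this corpus good and bad have identical context counts, so a context-only representation cannot tell them apart, while good and book share no context words at all. Pre-trained models are far less extreme, but the same mechanism is at work.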
Stay tuned for the follow-up blog to see what else we tried and how we overcame the issues intrinsic to word vectors.