Finding the Haters – Hate Speech Detection

If you haven’t been living under a rock, you know that there has recently been a surge of interest and buzz around topics like fake news and hate speech. While we’ll soon blog about fake news, in this post we focus on hate speech.

Detecting hate speech is important for a multitude of applications. Generally speaking, combatting hate speech creates a safer environment for people to discuss and share opinions. Projecting this onto troubled areas or topics is a straightforward application of hate speech detection: calming discussions in stressed regions can reduce unrest and displacement, and may even help prevent war – a powerful tool!

What is hate speech?

Of course, when we talk about hate speech, we need a working definition. We found this seemingly straightforward question difficult to answer, as the concept is subjective and not standardized. Many different definitions circulate, and the official terminology has not yet been fixed. What all the definitions seem to have in common is that hate speech is hateful, threatening and/or abusive speech or writing directed at a group of people on the basis of certain attributes. These attributes are properties like race, ethnicity, nationality, religion, sexual orientation, gender, gender identity or disability. Some of the definitions emphasize the incitement of violence against these groups. Our aim was to identify when a message contained explicit or implicit forms of hate speech such as these.

Before we started on our methodology, we needed a controlled set of messages to validate it on. We therefore set out to gather three kinds of data: clear messages containing hate speech, some random messages about the birds and the bees and, to make sure we challenged ourselves to the fullest, a set of very challenging messages that were hateful but did not contain hate speech. The latter would contain negative sentiment or curse words but no actual hate speech. We tested our algorithm on all these forms of data to make sure it does not flag hate speech unfairly.

The shortcoming of lexicons

While detecting hate speech is a relatively new topic that has not yet been studied extensively, there are some works in this area. The approaches followed by these works can be divided into two categories:

  1. Using lexicons (a list of words) that correlate with hate speech – texts containing these words are classified as containing hate speech
  2. Supervised classification methods where a large corpus of labelled data is used to train a classifier that can predict the classes

For the first approach, it is easy to see that our definition of hate speech causes an issue. We need to identify whether a message is hate-bearing or not, which we can indeed do in a very rudimentary way with a lexicon. This, however, would also classify all of our challenging data as hate speech – an undesired effect. The reason is that we also need to determine the subject and orientation of the hate in the message: if and only if it is aimed at one of the attributes like gender, race or religion can it rightfully be dubbed hate speech. Lexicons hence fall short, as the sketch below illustrates.
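To make the blind spot concrete, here is a minimal sketch of the lexicon approach. The word list and example are illustrative, not a real hate speech lexicon:

```python
# Illustrative toy lexicon -- not an actual hate speech word list.
HATE_LEXICON = {"hate", "despise", "loathe"}

def lexicon_flag(message: str) -> bool:
    # Flags any message containing a lexicon word, with no notion of
    # who or what the hostility is aimed at.
    return any(word in HATE_LEXICON for word in message.lower().split())

print(lexicon_flag("I hate Mondays"))  # True -- flagged, yet not hate speech
```

The false positive is exactly the failure mode our challenging dataset was built to expose: the word matches, but the hostility is not aimed at a protected group.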

For the second approach, just as with our topic detection algorithm, we do not like having to gather labelled data every time we want to tackle a new domain, context or language. Of course we did collect data to validate our algorithm, but our aim is to allow our algorithm to work without this data. Why? Well, in supervised machine learning, any new form, language or orientation of hate speech needs to be present in our labelled training data – a costly and not so scalable trait.
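For context, a typical supervised baseline might look like the sketch below. This is a generic bag-of-words classifier, not any specific system from the literature, and the training data is a hypothetical placeholder – precisely the labelled corpus we want to avoid having to collect:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labelled corpus; every new domain, context or language
# would require gathering and labelling more of this.
texts = ["example hateful message", "example harmless message"]
labels = [1, 0]  # 1 = hate speech, 0 = not

classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(texts, labels)
print(classifier.predict(["a new message to score"]))
```

The model itself is simple; the cost sits entirely in producing and maintaining the labels.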

The Eureka moment

We already hinted at it: the algorithm we used is a lot like our topic detection algorithm, which is nicely unsupervised! First, our topic detection is used to detect whether a text is hate-bearing, regardless of its orientation. If the text is found to be hate-bearing, we essentially run another instance of our topic detection algorithm to determine the orientation of the hate. These topics are roughly defined as the attributes of groups subject to hate speech – this way we can determine whether we are dealing with racism, religious hatred or other forms of hatred. This two-tiered approach combines hate-identification and orientation-identification without the need to manually craft full lexicons or curate labelled datasets.
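To give a feel for the two-tiered structure, here is a minimal sketch. It is emphatically not our actual topic detection algorithm; it stands in a simple TF-IDF similarity against seed descriptions, and the seed texts, labels and threshold are all illustrative assumptions:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative seed descriptions -- a real system would use a far
# richer topic representation.
HATE_SEED = "hateful threatening abusive attack despise"
ORIENTATION_SEEDS = {
    "race/ethnicity": "race racial ethnic nationality origin",
    "religion": "religion religious faith belief worship",
    "gender/sexuality": "gender woman man gay sexual orientation",
}
THRESHOLD = 0.1  # illustrative cut-off; would be tuned in practice

def detect(message: str):
    # Build comparable vectors over a shared vocabulary.
    texts = [message, HATE_SEED] + list(ORIENTATION_SEEDS.values())
    vectors = TfidfVectorizer().fit_transform(texts)
    # Tier 1: is the message hate-bearing at all?
    if cosine_similarity(vectors[0], vectors[1])[0, 0] < THRESHOLD:
        return None
    # Tier 2: which protected attribute is the hate aimed at?
    scores = {
        label: cosine_similarity(vectors[0], vectors[2 + i])[0, 0]
        for i, label in enumerate(ORIENTATION_SEEDS)
    }
    return max(scores, key=scores.get)
```

The key design point carries over to the real system: neither tier needs labelled training data, only a description of what "hate" and each orientation look like.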

We have found our algorithm to perform better than the current state of the art using lexicons. While it is unfair to compare our algorithm to supervised approaches – supervised learning always has more information to work with, namely the label of each text – we did find our algorithm to be competitive with supervised results, though not significantly outperforming those methods. The true gain of our method, though, is that we do not need the curated data the supervised methods require.

One critical note: hate speech is highly culture- and context-dependent. This makes it difficult to identify hate speech in a mixture of contexts and texts. For example, collecting generic English messages yields data that blends different countries, cultures and opinions. We found this a challenging setting for hate speech detection – one that supervised methods could not even tackle, due to differing interpretations of the labels – yet we also realized that when using hate speech detection in real scenarios, we would have a strong context and hence not run into this issue.
