With fake news playing a role in Trump’s election, Cambridge Analytica exploiting it to manipulate voters, and Mark Zuckerberg having to testify about it before Congress, fake news is a hotter topic than ever.
Fake news is basically nothing more than gossip spread online. Its creators do their utmost to make it look as authentic as possible and, as the Cambridge Analytica revelations have shown us, also focus on specific audiences that are likely to be more susceptible to it. In fact, zooming in on that whole ordeal a bit without repeating what all the major news outlets have reported, CA used psychometric profiling to proactively target groups of people, for whom they would then create a specific, tailor-made fake news item.
Being at the center of this conversation, Facebook has made it one of its top priorities to stop such fake news from spreading on its platform. The question, however, is whether and how well one can detect fake news without erroneously removing legitimate accounts.
How to detect fake news?
Within journalism and media, proper verification of sources and of news authenticity has always been a fundamental part of the business. With social media increasingly becoming the go-to place for news, and news travelling at an ever faster pace, verification of authenticity is jeopardized by time constraints and by the anonymous nature of the internet, even on social media.
That said, it makes sense to try to pinpoint the original author or source of the news being spread, even if we first see that news coming from a different account or person. This structural way of looking at fake news (or rather, at fake news spreaders) is an important way of detecting fake news and preventing it from spreading in the future. This kind of news source detection (NSD in the literature) is a rather new field in computational linguistics and is quickly gaining attention. It often requires context that reaches beyond the medium on which the news is being spread. For example, news source detection for news spread on Facebook can be tackled by looking at and correlating timestamps, the propagation of the news, network structure and external sources such as other social media or news sites.
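To make the idea of correlating timestamps and propagation a little more concrete, here is a minimal sketch in Python. It is not the method we use in practice, only an illustration under simple assumptions: the Share record, its fields and the candidate_sources function are hypothetical names invented for this example, and the ranking is a plain heuristic that prefers accounts with no known upstream account and, among those, the earliest poster.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Share:
    """One observed post or reshare of a news item (hypothetical record)."""
    account: str                 # hypothetical account identifier
    timestamp: datetime          # when this account posted or shared the item
    shared_from: Optional[str]   # account it was reshared from, None if unknown


def candidate_sources(shares: list[Share]) -> list[str]:
    """Rank accounts by how plausible they are as the origin of the item.

    Heuristic only (an assumption for illustration): accounts without a
    known upstream account come first, and ties are broken by the time
    the account was first seen sharing the item.
    """
    # Keep only the earliest share per account.
    first_seen: dict[str, Share] = {}
    for share in sorted(shares, key=lambda s: s.timestamp):
        first_seen.setdefault(share.account, share)

    def sort_key(share: Share) -> tuple[bool, datetime]:
        has_upstream = share.shared_from is not None
        return (has_upstream, share.timestamp)

    return [s.account for s in sorted(first_seen.values(), key=sort_key)]


if __name__ == "__main__":
    observed = [
        Share("carol", datetime(2018, 4, 1, 12, 30), "bob"),
        Share("bob", datetime(2018, 4, 1, 12, 10), "alice"),
        Share("alice", datetime(2018, 4, 1, 12, 0), None),
    ]
    print(candidate_sources(observed))
```

A real system would also have to deal with missing or misleading reshare information, and would bring in network structure and external sources, which this sketch leaves out entirely.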
As you can imagine, while news source detection is a decent way of detecting fake news, it is a slow and difficult process. But can we do better?
Psychometrics + News Source Detection = Fake News Detection
We started this blog post by mentioning the recent developments around the spread of fake news and the data breach involving Cambridge Analytica. As mentioned, a fundamental part of what they were doing was segmenting their audiences based on psychological aspects, something we call psychometrics. While Cambridge Analytica used this technique for dark practices, it turns out we can use the exact same techniques against them and against fake news in general!
In fact, we have been working with psychometrics a lot and have written about it before. In the linked blog post, we wrote about BIG5 personality traits and how they can help us better understand user behavior, but using them for fake news detection is not really straightforward.
It turns out, however, that the same linguistic cues used in BIG5 profiling can be used to assess the sincerity of someone’s writing. This means that we can use the same concepts but model a different outcome: fake news instead of BIG5 traits. There are some very interesting patterns that people tend to use when they write about something they do not fully support, intend or know about. Best of all, these patterns are latent, meaning that we human beings cannot really prevent them from showing up; it’s in our nature. This of course helps in fake news detection, but it also allows us to identify whether reviews of products or holidays are sincere or completely made up, an important area in its own right that we do not focus on in this post.
Of course, revealing all those patterns would help people with bad intentions do a better job, which is precisely why we will not give any of them away in this blog post. We did find, however, that the linguistic traits of sufficiently long news items can give away authenticity and sincerity with an accuracy of over 80%, where sufficiently long means about a dozen sentences or more. To put that into perspective, a randomly picked person asked to do the same will achieve an accuracy no better than guessing (of course, this can be greatly improved by training that person on what patterns to look for).
Putting linguistic profiling to work against fake news, especially in combination with News Source Detection, is a very effective and cost-efficient way of detecting fake news. We have worked to make these methods scalable to the extent that we can now offer them in our services.