Big data science (yes, I am merging two fields into one here) is a booming business and an exciting field for many companies to step into. Tech startups are popping up like mushrooms, and the big tech vendors and consultancies are rapidly integrating with all the popular big data and data science tools as well. Most of these big data science tools and consultancies are focusing on the wrong thing, though. Let me explain why.
Big Data Science Status Quo
It is safe to say that big data owes most of its success to the introduction of distributed compute platforms such as Hadoop and Spark, as well as to distributed data stores such as HDFS, Cassandra and MongoDB, to name some of the most popular ones. Getting your data in place properly is usually just the beginning, however. Analysis platforms and languages such as R, Spark MLlib and Python’s scikit-learn (again, just to name a few) have made, and are still making, the data science field increasingly popular.
The aforementioned technologies all have one thing in common: they require (technical) experts to use and operate them. While one might argue that R can be operated by less technically educated people than those writing Spark jobs in Scala, R is still a programming language and not usable by the majority of business users. And this is wrong!
What About Other Fields?
Big data science is often regarded as a natural sequel to BI (business intelligence). With the volumes of data we have and the speed at which we want to query them, BI is hitting its limits, and big data science comes to the rescue to help us overcome them. There is a great deal we can learn from BI, though, that big data science tool vendors and developers seem to overlook.
Big data science tools and software are largely open source and free to use, though without any warranties. In contrast, most BI tools are commercially backed by big vendors that license their software for fees that many regard as expensive. This, however, is not a bad thing as such. In the current landscape of BI tooling, actual analysis (be it dashboarding, modeling or other forms of gaining insight from data) is accessible even to business users. I have even worked on projects where a Chief Marketing Officer was using Tableau himself to base his strategy on, and he was able to do this correctly too. This is largely because the commercial vendors focus on getting businesses to adopt their technology at large.
Big Data Science’s Flaw
BI got adopted by so many businesses partly because it has matured into a field that can be understood by a range of people far broader than the technical experts who once controlled the domain. Business users who understand the application of their own data analyses well, but lack the skills required to dive into the technical aspects of producing the result, are still capable of going from input to output thanks to the ease of use of BI tooling. This exact aspect is sorely lacking in big data science.
As mentioned, big data science tools and technology are first and foremost driven by open source communities, recently backed more and more by commercial third parties. Tooling developed in open source settings focuses mainly on functionality and on getting things working correctly in the first place, which is a sensible thing to do. The downside is that user experience and ease of use are not the first priority. The result is that while big data science technology possesses huge potential, the secrets to unlocking it remain with the few who are capable of using it.
Separating Implementation from Logic
Dealing with implementation details, a technical affair more than anything, means that as a big data scientist you are not able to focus on the thing that matters most to the job you are performing: getting the logic right. Why would we want to be bothered with writing a map and a reduce function? All that matters is that things get done the way we want them to, preferably reasonably fast.
The key to no longer having to deal with implementation details all the time is to separate them from the actual logic. This is exactly what BI tools like Tableau and QlikView are doing, and what should be done in a big data science setting too; the field is mature enough for it by now. Allowing business users and decision makers to truly understand what is happening in a data analysis or, even better, giving them the tools to define the logic themselves is what will make big data science available to the masses rather than to a small audience of technical experts.
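To make this separation a bit more tangible, here is a minimal sketch in Python. It is purely illustrative and does not describe how Tableau, QlikView or any other product works internally. The sample data, the `analysis` specification and the `run` function are all hypothetical names invented for this example. The first half computes revenue per region with hand-written map and reduce functions; the second half states only the logic and leaves the execution to a small engine.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical sample data, invented for this illustration.
orders = [
    {"region": "north", "amount": 120.0},
    {"region": "south", "amount": 80.0},
    {"region": "north", "amount": 30.0},
]

# Implementation and logic mixed: the analyst writes map and reduce by hand.
def map_order(order):
    # Emit a (key, value) pair for each order.
    return (order["region"], order["amount"])

def reduce_totals(totals, pair):
    # Fold each (region, amount) pair into a running total per region.
    region, amount = pair
    totals[region] = totals.get(region, 0.0) + amount
    return totals

totals_by_hand = reduce(reduce_totals, map(map_order, orders), {})

# Logic only: a declarative description of what we want to know.
# The keys "group_by" and "sum" are an assumed, made-up format.
analysis = {"group_by": "region", "sum": "amount"}

def run(spec, rows):
    # The implementation lives here, hidden from whoever wrote the spec.
    # It could be swapped for a distributed engine without touching the spec.
    result = defaultdict(float)
    for row in rows:
        result[row[spec["group_by"]]] += row[spec["sum"]]
    return dict(result)

totals_declared = run(analysis, orders)
```

Both approaches yield the same totals, but only the second lets someone change the question (for example, grouping by another field) without touching any of the execution code.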
Providing the necessary functionality to business users in a platform that hides the technical details in such a way that big data science aspects like machine learning and concurrent processing become readily available is not easy though.
At UnderstandLing, we focus on making decision makers, who are often non-technical, aware of and able to understand analyses of their own data. This creates livelier discussions and makes business cases succeed more easily because they are understood. Our technology focuses on ease of use, time to action and understandability, allowing for the rapid prototyping that decision makers are demanding from big data science. Not a single line of actual code has to be written to apply complex concepts such as predictive modeling, all the way to deep learning, or to integrate your own internal data sources with new, opportunity-rich external data sources in a matter of seconds.