Today’s tech-connected world is generating massive volumes of data, streaming in from a variety of channels at lightening speed. While traditional approaches worked well in the past for analyzing large sets of structured data that could fit neatly into rows and columns, much of today’s data is raw, unstructured and just plain messy. This disparate data is also extremely valuable data, as it holds rich insights that today’s businesses can use to boost profitability and gain competitive advantage.
Capturing, storing, analyzing, and extracting insights from data of massively large scale and variety calls for a solution beyond the limitations of the traditional relational database. That’s where Hadoop enters the picture. What follows is a look at a number of key areas where big data analytics with Hadoop is a major improvement upon traditional approaches.
Traditional analytics solutions call for traditional data. However, in order to put large volumes of non-traditional unstructured data into traditional form, it must be moved and copied. This is problematic, because the process of moving data---taking it out of its original form and putting it into another form so it can be queried---creates data complexities and redundancies.
For starters, moving data through extraction, transformation and load is an extremely complex process, requiring large numbers of highly skilled people. In addition, making multiple copies of data so that each department within an organization has its own vertical data stack---and its own data reality---puts the integrity of the data into question by fragmenting it instead of consolidating it. Other negative consequences of moving and copying data are stale data, rigid schema and an overall data infrastructure that is difficult to change.
In contrast, the Hadoop analytics platform is designed to handle both the scale and variety of big data. This means that Hadoop eliminates the need to move and copy data, thus allowing data to be queried in place, in its original raw and unstructured form.
Instead of fragmenting data into data marts, Hadoop consolidates the data into a centralized data warehouse or data lake. This allows analysts to take the data as it is generated by the application and store it in Hadoop without worrying about transforming it or making use of it later. With Hadoop, analysts are able to get away from having to pre-model or pre-aggregate the data before knowing if it is worthy to do so. Most importantly, the ability to materialize data at query time means true schema on read.
The dream of data warehousing was to be able to scale out infrastructure by buying hardware and not having to change any of the processes or software stacks that interact with that data. However, that dream was never realized by traditional relational databases. Limited by one server, these platforms can only scale vertically by adding additional proprietary hardware. This approach is often expensive and time consuming, making it prohibitive to all but large and well funded businesses.
Hadoop, on the other hand, allows the data infrastructure to scale horizontally or “out”. Hadoop data is clustered, which means that if you double data collection you simply double the number of Hadoop clusters and performance remains constant. Hadoop scales linearly as the data grows and more nodes are added, and that makes performance predictable. And being that Hadoop is non-proprietary, businesses can do the same data distribution for parallel access in an open source environment, and at a much lower cost. With cloud based Hadoop, businesses can spin up literally thousands of virtual servers in moments if needed, while paying only for the storage and compute power that they actually use.
As previously discussed, traditional databases are limited, in that they can only handle data that has been made to fit into a structured format. This constraint can prevent many of the rich business insights hidden within the original raw data from ever seeing the light of day. However, Hadoop is built to handle multiple data formats while preserving the data in its natural and messy form. With Hadoop, different data formats can be made to look the same to the various applications that are plugged into Hadoop. This is a critical benefit, as it means that a business does not need to have big data to take advantage of Hadoop. For businesses that have a variety of data formats, Hadoop can provide significant value regardless of the scale out.
Simplicity for business users
The true measure of a big data analytics platform for business lies in its ability to provide real value. In the case of traditional databases, that value is often not fully realized due to complexities that lie in the way of business users. After all, valuable business information lies trapped in multiple silos and is further obscured from business leaders because they lack advanced degrees in data science and have a difficult time deriving meaning through direct access of the data structure.
In answer to the rapid growth and proliferation of data, Hadoop allows business users to interact with data in a business way. Business users want to be able to consume data the way they want to, in its natural form, without having to worry about how it got there or how it was made. Hadoop provides a single semantic layer that allows business users to utilize powerful business intelligence tools such as Excel or Tableau to see data in ways they can understand intuitively. Armed with valuable insights, business users can make data driven decisions that will help their organizations become more profitable and competitive.