AtScale Blog

BI Tools: The Do’s and Don’t’s of Integrating Hadoop

Posted by Bruno Aziza on Jul 21, 2015


Big data can translate to big wins for your company, but making it work means working smarter. Hadoop makes it simple to store very large data sets across a cluster and to process them in a distributed way. Make Hadoop work even harder for you by pairing it with BI tools like Tableau, Excel and Qlik. Read on for best practices for BI on Hadoop: each of the five numbered items below names a "don’t," followed by the "do" that replaces it.


1. Move & copy data

Moving and copying data has been necessary since the beginning of data warehousing because data had to be put into a form that was queryable. With the advent of big data tools like Hadoop, and with companies like AtScale, data movement is no longer necessary, so side effects like data redundancy can be avoided.

Instead of moving and copying data, query the data in one place with Hadoop. Rather than writing data three times, you should be able to write it once.

2. Have multiple definitions of reality

The original goal was a single data warehouse to house big data. What many companies have ended up with is a series of data marts holding unconsolidated data. Each department or function builds its own vertical stack, and management of the data is spread across the different lines of business, which is difficult and costly. Business users write their own definitions of reality, and that calls the integrity of the data into question.

Instead of having multiple definitions of reality, use Hadoop to create a single semantic layer. Hadoop can act as a centralized enterprise data warehouse, or a data lake. You don’t have to worry about pre-formatting data; you just store it. Worry about transforming it or making use of it later.
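To make the idea of a single semantic layer concrete, here is a minimal sketch in Python. It is an illustration only, not AtScale's product or any Hadoop API: the metric names, the `orders` table, the columns and the `build_query` helper are all hypothetical. The point it shows is that each business metric is defined once, and every department's query is generated from that one definition.

```python
# Hypothetical metric definitions, kept in ONE place and shared by every team.
METRICS = {
    "revenue": "SUM(order_total)",
    "order_count": "COUNT(DISTINCT order_id)",
}


def build_query(metric, table, group_by):
    """Render a SQL string for a named metric; unknown metrics are rejected."""
    if metric not in METRICS:
        raise KeyError(f"unknown metric: {metric}")
    return (
        f"SELECT {group_by}, {METRICS[metric]} AS {metric} "
        f"FROM {table} GROUP BY {group_by}"
    )


# Two departments slice the data differently but share one definition of revenue.
finance_sql = build_query("revenue", "orders", "region")
marketing_sql = build_query("revenue", "orders", "channel")
```

Because both queries pull the expression for `revenue` from the same dictionary, finance and marketing cannot drift into separate definitions of the same number.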

3. Scale up with proprietary hardware

Scaling up will limit you to a single machine or server.

Instead of scaling up, scale out with your Hadoop cluster. With Hadoop, all data, regardless of form or function, works in a single data architecture. This allows you to scale the data infrastructure horizontally by adding hardware, without changing any processes, so performance can hold steady as data grows.
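As a rough illustration of why scaling out helps, the toy sketch below spreads records across a number of nodes and compares the load each node carries. It is a simplification under stated assumptions: it assigns records by hashing a key, which is not how HDFS actually places blocks, and the function names and record keys are made up for the example.

```python
import hashlib
from collections import Counter


def node_for(key, node_count):
    """Pick a node for a record using a stable hash of its key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def load_per_node(keys, node_count):
    """Count how many records land on each node."""
    counts = Counter(node_for(k, node_count) for k in keys)
    return [counts.get(n, 0) for n in range(node_count)]


# Hypothetical workload: the same 10,000 records on a 4-node and an 8-node cluster.
keys = [f"record-{i}" for i in range(10_000)]
small_cluster = load_per_node(keys, 4)
large_cluster = load_per_node(keys, 8)
```

Doubling the node count roughly halves the number of records each node has to handle, which is the basic reason adding machines lets a cluster absorb more data without any single server becoming the limit.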

4. Do relational schemas

Relational schemas are limiting. Fitting business semantics into rows and columns is getting harder as data proliferates and as more of it is collected.

Instead of doing relational schemas, leverage Hadoop’s schema-on-demand.
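Schema-on-demand (often called schema-on-read) means the raw data is stored as-is and a structure is applied only when someone reads it. The sketch below shows the idea in plain Python rather than with any Hadoop tool. The sample events, the two schemas and the `read_with_schema` helper are hypothetical and exist only for illustration.

```python
import json

# Raw events stored untouched, one JSON document per line (made-up sample data).
RAW_EVENTS = [
    '{"user": "ana", "amount": "19.99", "country": "US"}',
    '{"user": "bo", "amount": "5", "referrer": "email"}',
    '{"user": "cy"}',
]

# A schema here is a mapping of field name -> (converter, default value).
SALES_SCHEMA = {"user": (str, None), "amount": (float, 0.0)}
MARKETING_SCHEMA = {"user": (str, None), "referrer": (str, "unknown")}


def read_with_schema(raw_lines, schema):
    """Parse raw records and project them onto a schema at read time."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, (convert, default) in schema.items():
            value = record.get(field)
            # Missing fields fall back to the default instead of failing the load.
            row[field] = convert(value) if value is not None else default
        rows.append(row)
    return rows


# The same stored data serves two different views; nothing was reshaped on write.
sales_rows = read_with_schema(RAW_EVENTS, SALES_SCHEMA)
marketing_rows = read_with_schema(RAW_EVENTS, MARKETING_SCHEMA)
```

The trade-off is that type conversion and missing-field handling move from load time to query time, so each reader has to decide how to treat records that do not fit its schema.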

5. Lock yourself into proprietary stacks

Be open in terms of the engines you’re using to access data and the methods being used to query that data. Hadoop breaks the proprietary lock and allows you to do the same data distribution but in an open-source environment and at open-source costs. Use open-source engines and any BI tools.

As a reminder:


1. Query your data in one place.

2. Create a single semantic layer.

3. Scale out, not up, with your Hadoop cluster.

4. Leverage Hadoop’s schema-on-demand.

5. Use open-source engines and any BI tool.

Find better insights into your big data analytics with a solid strategy. Hadoop delivers speed-of-thought performance so that you can understand where your big data is today and where it needs to be going tomorrow.

