Every once in a while, the ultimate question comes up: "What is the best analysis tool for BI on Hadoop?" AtScale is not in the business of favoring one tool over another. We are in the business of making all of them work. There are indeed many reasons why business users and I.T. departments choose particular analysis tools. Here are a few things to consider.
Yes, there are actually ways to 'Do Big Data Analytics Right'.
Leaders and innovators in the Big Data space have learned the hard way, and now those of you looking to dip a toe, or jump head first, into the BI-on-Big-Data waters can capitalize on their early experiences. Let go of the fear, ego, or whatever else may be holding you back, and take the chance to learn from those who took the early-adopter risk.
Is your Big Data ‘mature’? You may be puzzled by this question, since many in the industry have been saying ‘Big Data is Dead’ for years. But Big Data is far from dead; instead, the technologies and solutions that make up the Big Data space are maturing at an ever-increasing rate. From traditional players like Teradata, to open-source Hadoop, to new cloud players like Google BigQuery, the Big Data space is doing more to help companies manage and gain insights from their exploding and morphing data than at any other point in history. So what?
Congratulations! Your Hadoop cluster is up and running. Your data feeds work, your team knows how to manage the cluster, and expert users mine the data with Hive, Pig, and Spark. But your executives aren’t satisfied. “Where is the business value?” they ask. “Why don’t we see more people using Hadoop?”
Just this week, AtScale published the Q4 Edition of our BI-on-Hadoop Benchmark, and we found 1.5X to 4X performance improvements across the Hive, Spark, Impala, and Presto SQL engines for Business Intelligence and analytic workloads on Hadoop.
Bottom line, the benchmark results are great news for any company looking to analyze their big data in Hadoop because you can now do so faster, on more data, for more users than ever before.
While this blog provides a high-level summary of our findings, you can access the full Q4 2016 Edition of the BI-on-Hadoop Benchmarks here, and also listen to our webinar replay discussing this in more detail here.
The growing popularity of big data analytics, coupled with the adoption of technologies like Spark and Hadoop, has allowed enterprises to collect an ever-increasing amount of data, in terms of both breadth and volume. At the same time, the need for traditional business analysis of these data sets using widely adopted tools like Microsoft Excel, Tableau, and Qlik remains. Historically, data has been provided to these visualization front ends using OLAP interfaces and data structures. OLAP makes the data easy for business users to consume, and offers interactive performance for the types of queries that business intelligence (BI) tools generate.
However, as data volumes explode, reaching hundreds of terabytes or even petabytes of data, traditional OLAP servers have a hard time scaling. To surmount this modern data challenge, many leading enterprises are now in search of the next generation of business intelligence capabilities, falling into the category of scale-out BI. In this blog I'll share how you can leverage the familiar interface and performance of an OLAP server while scaling out to the largest of data sets.
And if you don't have time to read the whole thing, don't miss the 10-minute 'cliff-note' video of scale-out BI on Hadoop near the end.
As more and more enterprises adopt Hadoop as their next-generation data platform, the demands of traditional enterprise workloads, including support for Business Intelligence use cases, are creating new challenges. While Hadoop excels at low-cost distributed storage and parallel data processing, serving BI-style queries interactively remains difficult. Additionally, multi-dimensional queries often demand complex OLAP-style calculations and functions. In this post we will share how AtScale helps to bridge the gap between Business Intelligence users and data that resides in Hadoop.
In many typical business analyses or applications it is important to be able to directly query the first or last value of a particular metric across a hierarchy. For example:
- What was the starting or ending price of a security during a particular day?
- What were inventory levels for a SKU at the beginning and end of the month?
- What were the first and last payment amounts for a loan agreement?
Not Always as Easy as it Sounds
Answering such a question in SQL may require complex queries involving unions, sub-queries, and/or temporary tables. In MDX (Multidimensional Expressions), such a query is easier to express, given MDX’s rich support for analytical queries and hierarchical representations. AtScale has implemented support for First Child and Last Child measures in a way that supports BOTH SQL and MDX clients, which means that virtually any data visualization client can take advantage of this advanced functionality.
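To make the SQL side concrete, here is a minimal sketch of one common workaround: window functions. The table name (`trades`), columns, and sample data are all hypothetical, and the snippet runs the query against SQLite (3.25+) via Python purely for illustration; the same `FIRST_VALUE`/`LAST_VALUE` pattern is available in Hive, Impala, Spark SQL, and Presto.

```python
import sqlite3

# Hypothetical data: intraday trades, from which we want each security's
# opening (first) and closing (last) price for the day.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trades (symbol TEXT, ts TEXT, price REAL);
    INSERT INTO trades VALUES
        ('ACME', '2016-11-01 09:30', 10.0),
        ('ACME', '2016-11-01 12:00', 10.8),
        ('ACME', '2016-11-01 16:00', 10.5),
        ('INIT', '2016-11-01 09:30', 42.0),
        ('INIT', '2016-11-01 16:00', 43.5);
""")

# The frame must span the whole partition; with the default frame,
# LAST_VALUE only sees rows up to the current row.
rows = conn.execute("""
    SELECT DISTINCT
        symbol,
        FIRST_VALUE(price) OVER (PARTITION BY symbol ORDER BY ts
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS opening_price,
        LAST_VALUE(price) OVER (PARTITION BY symbol ORDER BY ts
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS closing_price
    FROM trades
    ORDER BY symbol
""").fetchall()

print(rows)  # [('ACME', 10.0, 10.5), ('INIT', 42.0, 43.5)]
```

Even this "simple" version needs a non-default window frame and a `DISTINCT` to collapse the per-row results, which illustrates why expressing first/last measures is more natural in MDX or as pre-defined semantic-layer measures.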
Trystan here, Software Engineer and doer of all things technical at AtScale. Which SQL-on-Hadoop engine performs best? We get this question all the time!
We looked around and found that no one had done a complete and impartial benchmark test of real-life workloads across multiple SQL-on-Hadoop engines (Impala, Spark, Hive, etc.).
So, we decided to put our enterprise experience to work and deliver the world's first BI-on-Hadoop performance benchmark.
What did we find out? Well, it turns out that the right question to ask is: "Which engine performs best for which query type?" We looked across three of the most common types of BI queries and found that each engine had a particular niche. Bottom line: one engine does NOT fit all.
Read on to find out the details of our environment and configuration, the types of queries we tested... (or download the full whitepaper here)