AtScale Blog

AtScale Delivers the Industry’s First Modern Business Intelligence Platform, Enables BI on Hadoop and Big Data, On Premise and Cloud

Posted by AtScale on Nov 17, 2016

Big Data analytics leader previews industry’s first platform to enable unified business intelligence for Teradata, Hadoop, Google Dataproc and BigQuery

San Mateo, CA, November 17, 2016 – AtScale, the first company to provide enterprises with a fast and secure self-service BI platform for Big Data, today announced a significant expansion of its services, from BI on Hadoop to BI on Big Data.


Topics: Hadoop, About AtScale, Tableau, Business Intelligence, spark, Big Data

TECH TALK:  BI-on-Hadoop Engine Wars Continue...Everybody Wins

Posted by Joshua Klahr on Oct 18, 2016

Just this week, AtScale published the Q4 Edition of our BI-on-Hadoop Benchmark, and we found 1.5X to 4X performance improvements across SQL engines Hive, Spark, Impala and Presto for Business Intelligence and Analytic workloads on Hadoop.

Bottom line, the benchmark results are great news for any company looking to analyze their big data in Hadoop because you can now do so faster, on more data, for more users than ever before.

While this post provides a high-level summary of our findings, you can access the full Q4 2016 Edition of the BI-on-Hadoop Benchmarks here, and also listen to our webinar replay discussing the results in more detail here.


Topics: Hadoop, Business Intelligence, spark, hive, bi-on-hadoop, Big Data, impala, presto

How Much Dough is in Your Data?!

Posted by Bruno Aziza on Sep 19, 2016
 

This morning, O'Reilly Media published the results of its 2016 Data Science Salary Survey.  The report covers a wide set of topics, such as salary differences by gender and country, as well as details on the types of skills that can give employees an edge when it comes to earnings.  We took a closer look at the Business Intelligence answers, and what we found might surprise you...


Topics: Hadoop, Tableau, Business Intelligence, spark, Big Data, excel, powerbi

What is Spark and Why You Should Care

Posted by Ashley Huang on Jun 24, 2016
 

Rumor has it that with the rise of Apache Spark, Spark will replace Hadoop.

Wait, what?

Well, let’s take a look. Apache Spark is an open-source processing engine that supports interactive queries, while Hadoop is an easy-to-scale, cost-effective data storage platform. The truth is, Spark does not replace Hadoop; in fact, Hadoop and Spark complement one another.

Now you may wonder: how will Spark and Hadoop affect your big data strategy?


Topics: Hadoop, Business Intelligence, spark, Big Data

TECH TALK: Scale-Out Business Intelligence with Hadoop

Posted by Joshua Klahr on Apr 21, 2016

The growing popularity of big data analytics, coupled with the adoption of technologies like Spark and Hadoop, has allowed enterprises to collect an ever-increasing amount of data, in terms of both breadth and volume.  At the same time, the need for traditional business analysis of these data sets using widely adopted tools like Microsoft Excel, Tableau, and Qlik remains.  Historically, data has been provided to these visualization front ends through OLAP interfaces and data structures. OLAP makes the data easy for business users to consume, and offers interactive performance for the types of queries that business intelligence (BI) tools generate.

However, as data volumes explode, reaching hundreds of terabytes or even petabytes, traditional OLAP servers have a hard time scaling.  To surmount this modern data challenge, many leading enterprises are now in search of the next generation of business intelligence capabilities, which fall into the category of scale-out BI.  In this post I'll share how you can leverage the familiar interface and performance of an OLAP server while scaling out to the largest of data sets.

And if you don't have time to read the whole thing, don't miss the 10-minute 'cliff-note' video of scale-out BI on Hadoop near the end.

 


Topics: Hadoop, Business Intelligence, spark, hive, bi-on-hadoop, Big Data

TECH TALK:  First-Child & Last-Child Measures in Hadoop

Posted by Joshua Klahr on Mar 24, 2016

As more and more enterprises adopt Hadoop as their next generation data platform, the demands of traditional enterprise workloads, including support for Business Intelligence use cases, are creating challenges.  While Hadoop excels at low-cost distributed storage and parallel data processing, interactive support for BI-style queries remains a challenge.  Additionally, multi-dimensional queries often demand complex OLAP-style calculations and functions.  In this post we will share how AtScale helps to bridge the gap between Business Intelligence users and data that resides in Hadoop.

In many typical business analyses or applications it is important to be able to directly query the first or last value of a particular metric across a hierarchy.  For example:

  • What was the starting or ending price of a security on a particular day?
  • What were the inventory levels for a SKU at the beginning and end of the month?
  • What were the first and last payment amounts for a loan agreement?

Not Always as Easy as it Sounds

Executing such a query in SQL can require complex statements involving unions, sub-queries, and/or temporary tables.  In MDX (Multidimensional Expressions), such a query is easier to express, given MDX’s rich support for analytical queries and hierarchical representation.  AtScale has implemented support for First Child and Last Child measures in a way that supports BOTH SQL and MDX clients, which means that virtually any data visualization client can take advantage of this advanced functionality.
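To make the SQL complexity concrete, here is a minimal sketch of the "starting and ending price of a security" question written as plain SQL and run through Python's built-in sqlite3 module. The table name, column names, sample prices, and the first_and_last_prices helper are all hypothetical and chosen purely for illustration; this is not AtScale's implementation, only one common way to answer the question with a sub-query and two self-joins.

```python
import sqlite3

# Hypothetical intraday prices for a single security (illustrative data only).
ROWS = [
    ("2016-03-23", "09:30", 101.0),
    ("2016-03-23", "12:00", 103.5),
    ("2016-03-23", "16:00", 102.25),
    ("2016-03-24", "09:30", 102.5),
    ("2016-03-24", "16:00", 104.0),
]

# The sub-query finds the first and last timestamp per day; the two
# self-joins then look up the price recorded at each of those timestamps.
QUERY = """
SELECT b.trade_date,
       o.price AS open_price,
       c.price AS close_price
FROM (SELECT trade_date,
             MIN(trade_time) AS first_time,
             MAX(trade_time) AS last_time
      FROM prices
      GROUP BY trade_date) AS b
JOIN prices AS o
  ON o.trade_date = b.trade_date AND o.trade_time = b.first_time
JOIN prices AS c
  ON c.trade_date = b.trade_date AND c.trade_time = b.last_time
ORDER BY b.trade_date
"""


def first_and_last_prices(rows):
    """Return (trade_date, first price, last price) for each day in rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE prices (trade_date TEXT, trade_time TEXT, price REAL)"
        )
        conn.executemany("INSERT INTO prices VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for trade_date, open_price, close_price in first_and_last_prices(ROWS):
        print(trade_date, open_price, close_price)
```

Even in this small case, a single "first and last value" question takes one aggregation sub-query plus two joins back to the fact table, and the pattern has to be repeated for every level of a hierarchy (day, month, quarter) that an analyst wants to explore.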


Topics: Hadoop, Business Intelligence, spark, hive, bi-on-hadoop, Big Data

TECH TALK:  SQL-on-Hadoop Benchmark: A Bit of a Tortoise and Hare Story

Posted by Trystan Leftwich on Feb 24, 2016

Trystan here, Software Engineer and doer of all things technical at AtScale.  Which SQL-on-Hadoop engine performs best?  We get this question all the time!

We looked around and found that no one had done a complete and impartial benchmark test of real-life workloads across multiple SQL-on-Hadoop engines (Impala, Spark, Hive, etc.).

So, we decided to put our enterprise experience to work and deliver the world's first BI-on-Hadoop performance benchmark.  

What did we find out?  Well, it turns out that the right question to ask is: "Which engine performs best for which query type?"  We looked across three of the most common types of BI queries and found that each engine had a particular niche.  Bottom line: one engine does NOT fit all.

Read on to find out the details of our environment and configuration, the types of queries we tested... (or download the full whitepaper here)


Topics: Hadoop, Business Intelligence, spark, hive, bi-on-hadoop, Big Data, impala

Spark Summit: 5 Things You Should Know

Posted by Bruno Aziza on Feb 13, 2016

Spark Summit 2016 kicks off next week in NYC and thousands are expected to attend the event, whose theme this year is "Data Science and Engineering At Scale". Great companies will be presenting - from Comcast to Thales and Viacom...


Topics: Hadoop, spark

Has Spark Killed Hadoop?!

Posted by Bruno Aziza on Dec 11, 2015

One of the "buzziest" subjects of conversation in the Big Data scene this past year has been Spark.  The powerful open source processing engine developed in 2009 has gotten some great traction and many have covered the stories of community adoption and growth.  

As one would expect, though, some of the buzz spun out of control, so much so that some went on to write that "Hadoop was dead" or that "Spark would eventually replace Hadoop".  

Such reports are misinformed, overly sensational and inaccurate.  If you have a few minutes, I suggest you watch the two-minute video below on the key role that Spark plays and its relationship with Hadoop.  In this interview, we talk about security, speed and other key items that make Hadoop relevant to business users.  

For a deeper understanding of the topic, check out the great piece that our VP of Product Management, Josh Klahr, authored for ReadWrite last week.  My favorite passage is below (the full piece is here):

"We need to stop playing Spark and Hadoop off each other and understand how they will coexist.  Hadoop will continue to be used as a platform for scale-out data storage, parallel processing, and clustered workload management.  Spark will continue to be used for both batch-oriented and interactive scale-out data-processing needs."


Topics: Hadoop, Tableau, Business Intelligence, spark

Learn about BI & Hadoop

The AtScale Blog is the one-stop shop for cutting edge news and insights about BI on Hadoop and all things AtScale.
