AtScale Blog

Schema On Demand: Using Hadoop In Real-Time


An important question companies need to answer about their business software is this: “Can our software easily adapt to any type of data it might encounter?” You cannot always predict what kinds of data your company will come across, and it is unrealistic to assume that your software will only ever see the same kinds of data. Companies without adaptable software often run into problems when customers and clients submit information through forms on their websites. If the software cannot process an unexpected kind of data, the result can be errors and time-consuming fixes, and that is for just one new data type. The whole process has to be repeated each time another unrecognized data type makes its way into the system.

Unstructured and Unlimited

Fortunately, this frustration can be avoided with tools such as Hadoop and AtScale. Hadoop does not require a schema to be defined before data is stored; structure is applied when the data is read (schema-on-read), so it can handle both structured and unstructured data. Combining Hadoop with AtScale makes your customers’ and clients’ data much simpler to manage, process, and analyze.

Consider one company that uses Hadoop and AtScale. Each time a user logs in to or buys something from its website, an “event” is generated. Each event contains data such as a timestamp, a user ID, and a key-value pair that uniquely identifies that particular set of data. This setup is highly flexible: when more events are created, the schema does not change. New rows are simply appended, with no changes to existing data. The company can also edit event properties or add new ones; these become additional key-value pairs, again without changing the schema.
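The event model described above can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the company’s actual format: it assumes each event is stored as one JSON line with a few fixed fields plus a free-form `properties` map, and the helper names `make_event` and `append_event` are hypothetical. The point it shows is that adding a new property never changes the set of top-level fields.

```python
import json
from datetime import datetime, timezone

# Assumed (hypothetical) event layout: a few fixed fields plus a free-form
# "properties" map. Only the fixed fields make up the stored structure.
def make_event(event_type, user_id, **properties):
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "event_type": event_type,
        "properties": properties,  # arbitrary key-value pairs
    }

def append_event(log_lines, event):
    """Append one event as a JSON line; earlier lines are never rewritten."""
    log_lines.append(json.dumps(event))

log = []
append_event(log, make_event("login", "u1"))
append_event(log, make_event("purchase", "u1", sku="A-100", amount=19.99))
# A property nobody planned for is just another key-value pair.
append_event(log, make_event("purchase", "u2", sku="B-200", amount=5.0, coupon="SPRING"))

# Collect the top-level field names of every stored record.
top_level = {tuple(sorted(json.loads(line))) for line in log}
print(top_level)
```

Because new attributes live inside `properties`, the set printed at the end contains a single tuple of field names no matter how many new property keys appear.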

Real-Time Queries

Hadoop can quickly take all of this data and map out any specific attribute you want from an event at query time. New data can be queried as soon as it arrives, without first exporting it into a separate table or breaking it down any further.
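Schema-on-read can be illustrated without any Hadoop machinery. The sketch below is plain Python and does not use Hadoop or AtScale APIs; the JSON-lines storage format and the `read_attribute` helper are hypothetical. It shows an attribute being projected out of raw events only when a query asks for it, so an event carrying a brand-new property is queryable the moment it is stored.

```python
import json

# Hypothetical raw events, stored as JSON lines with no declared table schema.
raw_events = [
    '{"user_id": "u1", "event_type": "login", "properties": {}}',
    '{"user_id": "u1", "event_type": "purchase", "properties": {"sku": "A-100"}}',
]

def read_attribute(lines, key):
    """Schema-on-read: pull one property out of each raw event at query time.
    Events that lack the property yield None instead of raising an error."""
    for line in lines:
        event = json.loads(line)
        yield event["user_id"], event.get("properties", {}).get(key)

# A new event arrives carrying a property that did not exist before...
raw_events.append(
    '{"user_id": "u2", "event_type": "purchase", '
    '"properties": {"sku": "B-200", "coupon": "SPRING"}}'
)

# ...and it can be queried right away: no table change, no re-export.
coupons = [(user, c) for user, c in read_attribute(raw_events, "coupon") if c is not None]
print(coupons)  # [('u2', 'SPRING')]
```

Missing properties come back as `None` rather than errors, which is what lets old and new events be read side by side.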

This matters because using Hadoop with AtScale makes complex data types much simpler to work with. In many other query tools, exported data is not very user-friendly. With Hadoop and AtScale, you select which data and which attributes you want to make available; those attributes become queryable, so you can list and order the data by whichever attribute you like, view it in simple graphs and charts, and analyze it according to your company’s needs.
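Ordering data by an attribute chosen at query time can be sketched the same way. This is a hypothetical Python illustration, not how AtScale implements it: the sample events and the `count_by` helper are made up. It counts events per value of whichever attribute is requested and returns them from most to least frequent.

```python
import json
from collections import Counter

# Hypothetical purchase events as raw JSON lines.
events = [
    '{"user_id": "u1", "properties": {"sku": "A-100", "channel": "web"}}',
    '{"user_id": "u2", "properties": {"sku": "B-200", "channel": "mobile"}}',
    '{"user_id": "u3", "properties": {"sku": "A-100", "channel": "mobile"}}',
    '{"user_id": "u4", "properties": {"channel": "mobile"}}',
]

def count_by(lines, attribute):
    """Count events per value of an attribute chosen at query time,
    ordered from most to least frequent. Events without it are skipped."""
    counts = Counter()
    for line in lines:
        props = json.loads(line).get("properties", {})
        if attribute in props:
            counts[props[attribute]] += 1
    return counts.most_common()

print(count_by(events, "sku"))      # [('A-100', 2), ('B-200', 1)]
print(count_by(events, "channel"))  # [('mobile', 3), ('web', 1)]
```

The same raw data answers both questions; switching from `sku` to `channel` requires no change to how the events are stored.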

OLAP for Quick Processing

Used together, Hadoop and AtScale are more effective than alternatives such as multidimensional online analytical processing (OLAP) or in-memory tools. A multidimensional OLAP system, for example, precalculates every possible combination of dimension values and writes the results to disk. If you run such an OLAP tool on Hadoop and want to analyze data such as cookie IDs or card IDs, spread across several dimensions that each have many distinct values, the number of possible combinations grows exponentially and becomes far too large to precompute and manage. In-memory tools have a different limit: you cannot process nearly as much data as with Hadoop and AtScale, because you are bounded by the memory capacity of your hardware. To fit everything in memory, you would have to significantly trim and compress your data. Using Hadoop and AtScale together, on the other hand, lets you keep all of your original data and still process it easily.
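A little arithmetic shows why full precomputation breaks down. In a fully precomputed cube, every subset of the n dimensions is its own group-by (2^n “cuboids”), and each cell picks either one value or the “all” rollup for every dimension, so the cell count is bounded by the product of (cardinality + 1). The cardinalities below are hypothetical, chosen only to illustrate the effect of one high-cardinality dimension such as a cookie ID.

```python
from math import prod

# Hypothetical distinct-value counts per dimension (not figures from the post).
base_dims = {"country": 200, "product": 5_000, "day": 365}
with_cookie = {**base_dims, "cookie_id": 10_000_000}

def full_cube_size(cardinalities):
    """Return (cuboids, max_cells) for a fully precomputed cube.
    cuboids: every subset of dimensions is a separate group-by, so 2**n.
    max_cells: each dimension takes one of its values or the 'all' rollup,
    so the product of (cardinality + 1). Real data is sparse, so far fewer
    cells are non-empty, but the bound shows how fast the space grows."""
    return 2 ** len(cardinalities), prod(c + 1 for c in cardinalities.values())

for name, dims in [("without cookie_id", base_dims), ("with cookie_id", with_cookie)]:
    cuboids, cells = full_cube_size(dims)
    print(f"{name}: {cuboids} cuboids, up to {cells:.2e} cells")
```

Under these assumed numbers, adding the single `cookie_id` dimension doubles the number of cuboids and multiplies the upper bound on cells by 10,000,001, from roughly 3.7 × 10^8 to roughly 3.7 × 10^18.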
