AtScale Blog

5 Things You May Not Know About Hadoop

Posted by Bruno Aziza on Jul 15, 2015

Since its unveiling in 2005, Hadoop has steadily become more prominent in the world of big data storage and retrieval. Hadoop got its name from an elephant toy owned by the inventor's son, and truthfully, it is not a stretch to see Hadoop as the "elephant" of all data processing and storage solutions, in that it is capable of storing and retrieving huge amounts of data in what seems like the blink of an eye.

However, this open source software platform has had its share of detractors as well, and the criticism has produced a number of myths and misconceptions that overshadow the key facts about Hadoop, its benefits and how it works. In this post, learn 5 key facts you may not know about Hadoop that can influence how you install and implement Hadoop's capabilities within your organization.

Thing #1: Hadoop is the future of big raw data.

Hadoop's infrastructure is designed to handle petabyte-level raw data storage and retrieval through a cluster storage system that makes storage both more scalable and much more affordable. For this task, there is none better than Hadoop.
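To make the idea of cluster storage concrete, here is a rough back-of-the-envelope sketch of how many storage nodes a given volume of raw data might require. It assumes HDFS's default replication factor of 3; the per-node capacity and the headroom fraction are hypothetical placeholders, not recommendations, and the function name is purely illustrative.

```python
import math


def nodes_needed(raw_data_tb, replication_factor=3,
                 usable_tb_per_node=48.0, headroom=0.25):
    """Estimate how many data nodes are needed to hold raw_data_tb of data.

    replication_factor: copies kept of each block (3 is the HDFS default).
    usable_tb_per_node: hypothetical disk capacity per node, in TB.
    headroom: hypothetical fraction of each node kept free for temporary
              files and growth.
    """
    # Every block is stored replication_factor times across the cluster.
    stored_tb = raw_data_tb * replication_factor
    # Only part of each node's disk is available for block storage.
    effective_tb_per_node = usable_tb_per_node * (1 - headroom)
    return math.ceil(stored_tb / effective_tb_per_node)


if __name__ == "__main__":
    one_petabyte_tb = 1024
    print(nodes_needed(one_petabyte_tb))
    print(nodes_needed(2 * one_petabyte_tb))
```

Under these assumptions, doubling the data roughly doubles the node count, which is the sense in which the cluster grows by adding machines rather than by replacing them with larger ones.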

However, analyzing that raw data and organizing it into actionable analytics is still a manual job that often requires additional tools or professional consulting services. Here, it is important for organizations to think of Hadoop as the foundation of better business intelligence analytics, not the be-all and end-all solution in and of itself.

Thing #2: Hadoop is designed to be an affordable solution for enterprise-level organizations with massive data.

At the enterprise level, organizations may have access to a tremendous amount of raw and semi-structured data that is chock-full of useful insights. While organizations of any size may benefit from BI on Hadoop, its most enthusiastic proponents will likely be global companies with masses of data to sift through. Here, early adopters may get the jump on their competition, but only when there is enough knowledge to translate the raw data into usable, accurate BI analytics.

What is most important to know about Hadoop is that it makes storing vast amounts of data affordable on a scale that has never before been available. In addition, for startups that anticipate rapid or massive growth, Hadoop offers a scalable architecture that can grow as quickly as the organization needs to grow, without increasing overhead costs commensurately.

Thing #3: Hadoop presents certain security challenges.

Because of how Hadoop stores and distributes raw data, it raises security challenges that traditional firewalls, prevention practices and intrusion alerts may not adequately address. In addition, since Hadoop does not store all the data in one centralized archive, employees handling Hadoop data sets may need to be granted administrative access to high-level data to avoid inadvertently violating company access protocols.

Thing #4: More vendors are developing turnkey solutions to support Hadoop.

Each day more vendors begin developing or launch solutions to help business intelligence professionals with Hadoop-based data retrieval, organization and analysis. For companies that lack an in-house Hadoop specialist, these tools can shorten the learning curve and help the investment in Hadoop show competitive results much more quickly. This is especially important since, at the moment, Hadoop data analysis is most successful within the framework of defined scenarios that factor in the organization's own underlying structure and operations.

As such, turnkey solutions can play their part in providing this underlying structure to speed up data analysis and application. These types of third-party solutions are also making it easier for companies to integrate Hadoop into an existing BI setup, although as yet no one tool offers seamless, complete integration.

Thing #5: Hadoop itself is still undergoing a tremendous evolution.

The underlying Hadoop architecture is well tested for the purposes of handling massive data sets efficiently and affordably. However, there are still many Hadoop tools that are either in the prototype stage or still undergoing application testing. These tools are slowly but surely making Hadoop its own kind of turnkey system for capturing, organizing and analyzing data, but it is not there yet.

In summary:

By understanding the facts about Hadoop and dispelling any myths or misconceptions, it becomes much easier to see the possibilities that Hadoop can offer and chart a successful course to integrating Hadoop with other existing systems. To read more, check out our Hadoop Data Sheet. 