Welcome!

Big Data Journal Authors: Pat Romanski, David Smith, Greg Schulz, Shelly Palmer, Elizabeth White

Blog Feed Post

Hadoop Will Not Mow Your Lawn


"The best minds of my generation are thinking about how to make people click ads." Jeff Hammerbacher ex- Facebook Architect

It turns out that when you have a lot of "best minds" working on the same problem, you come up with some pretty interesting technology - no matter how inane that problem may be.

The technology that those "best minds" at Yahoo came up with to target ads to users is called Hadoop. 

Hadoop is a powerful technology and like most new IT solutions is being touted at being able to solve a vast number of technical ills. When companies discover that Hadoop will not in fact cure male pattern balding, they will fall into the inevitable trough of disillusionment

Here are some thoughts about what Hadoop can and cannot do:

1. RDBS are for business data, Hadoop is for web data

Almost all traditional business data fits well into the relational model, including data about customers (CRM), products (ERP) and employees (HR). This data should continue to live in relational databases, where it is much easier to manage and access than in Hadoop.

Almost all web data fits well into the Hadoop model, including log files, email and social media. This data would be almost impossible to store in a relational database, not just because of the volume, but because of the inherently nested quality of the data (threaded email conversations, web site directory structures, social media graphs).

2. Hadoop is really good at analyzing web data

Hadoop is incredibly good at looking at huge amounts of web data and figuring out why people clicked on the blue button instead of the red one. This can be generated to a few other computer log formats, but the list is relatively small, including:
How many other data types look like click streams? Not very many. How many other real world problems lend themselves to analysis using web data analytic techniques? Also not as many as you might think.

This is not to take anything from the Hadoop market opportunity - as more of the world interacts with each other via web applications and devices, more of the world's data will be reducible to click-stream-like formats. 

The big data craze has taken over the tech media world much like the cloud craze. Most people know it is important but they don't know why. Many vendors get caught up in the hype cycle and start to believe that their technology has some sort of manifest destiny that will allow it to do much more than it can reasonably be expected to do.

3. Hadoop is a Pay Me Later Technology

Traditional data warehouses work on a "pay me now" basis. To get data into the data warehouse - even data that may not end up being useful in any way - you have to massage the data into a formal relational model. This is expensive and the data normalization process itself may make it impossible to get at the data in exactly the way you want to.

In contrast, Hadoop works on a "pay me later" basis. Data can be shoved into the Hadoop file system any old way. It is not until someone wants to analyze the data that you have to worry about how to connect all the pieces. The gotcha is that the price you pay in this "pay me later" model is much higher, requiring extensive programming in order to ask each question. 

In addition, because the normalization process wasn't done up front, it won't be until later that you may discover that you were missing crucial pieces of information all along. Thus it does bear some thinking up front on what sort of data to store in your Hadoop database and what kinds of questions you might want to be able to answer about that data in the future.  

Realistically, it will take most businesses who implement several years to figure out whether all the data they are dumping into Hadoop produces real value out the back end, just as it was several years before companies started to get a payout from their investments in relational data warehouses.

4. Use the right tool for the right job

Back in my - very brief - high school shop days, we learned that the trick to making a really nice looking ash tray is picking the right tool for the right job.
  • Hadoop is web data query engine that requires a high level of effort for each new query. 
  • Relational is a business data query engine that requires a high level of effort to format and load data into the datastore.
The fastest way for companies to get into trouble with Hadoop is to try to use it as a one-size-fits-all data warehouse. Much of the news in the Hadoop world today has to do with SQL parsers that run on top of Hadoop data. This is a powerful and valuable technology, but does not mean that you can throw out your data warehouse and replace it with Hadoop just yet.



Read the original blog entry...

More Stories By Christopher Keene

Christopher Keene is Chairman and CEO of WaveMaker (formerly ActiveGrid). Chris was the founder, in 1991, of Persistence Software, a San Mateo, CA-based company that created a new approach for managing data in high-transaction banking and communications systems. Persistence Software investors included Cisco, Intel, Reuters and Sun Microsystems. The company went public in 1999 on the NASDAQ exchange and was sold in 2004 to Progress software.After leaving Persistence Software in 2005, Chris spent a year in France as chairman of Reportive Software, a Paris-based maker of business-intelligence tools, and as an adjunct professor and entrepreneur-in-residence at INSEAD, a leading graduate business school.

Cloud Expo Breaking News
Cloud computing is transforming the way businesses think about and leverage technology. As a result, the general understanding of cloud computing has come a long way in a short time. However, there are still many misconceptions about what cloud computing is and what it can do for businesses that adopt this game-changing computing model. In his General Session at the 12th International Cloud Expo, Gene Eun, Senior Director, Oracle Cloud at Oracle, will discuss and dispel some of the common myth...
SYS-CON Events announced today that Wowrack will exhibit at SYS-CON's 12th International Cloud Expo, which will take place on June 10–13, 2013, at the Javits Center in New York City, New York. Wowrack’s core expertise lies in high-availability Private and Public Cloud IaaS Hosting Solutions. Wowrack provides a true Hybrid service – where business release all IT management and hardware provisioning – taking the data center and server system administrative headaches off our customer’s shoulders. ...
SYS-CON Events announced today that nfina Technologies, a provider of highly reliable cloud server products, will exhibit at SYS-CON's 12th International Cloud Expo, which will take place on June 10–13, 2013, at the Javits Center in New York City, New York. nfina Technologies develops, manufactures, and markets highly reliable cloud server products, designed to solve the most demanding data center requirements in mission-critical cloud applications. Nfina’s staff has decades of experience in co...
SYS-CON Events announced today that OpenStack will exhibit at SYS-CON's 12th International Cloud Expo, which will take place on June 10–13, 2013, at the Javits Center in New York City, New York. OpenStack software controls large pools of compute, storage, and networking resources throughout a datacenter, all managed by a dashboard that gives administrators control while empowering their users to provision resources through a web interface. OpenStack powers some of the most widely-used SaaS app...
“Cloud has everything to do with what has happened with Big Data,” explained Jason Deck, Director of Strategic Alliances at Logicworks, in this exclusive Q&A with Cloud Expo Conference Chair Jeremy Geelan. “Big Data doesn’t exist in its easily accessible way without cloud. From reduced startup costs, to cheap storage, to fast processing, to adequate security, to the easy incorporation of third-party analytics tools, cloud made Big Data accessible to customers of all sizes, with all different bud...
As enterprises deploy private IaaS clouds into production they are reevaluating their future application delivery models. SUSE and WSO2 believe that private PaaS will leverage the automation and scalability of Private IaaS solutions, such as OpenStack-based SUSE Cloud, to deliver the secure, standardized development environments that will make migrating to an agile, serviceoriented delivery model possible. In their session at the 12th International Cloud Expo, Chris Haddad, VP of Technology Ev...
Organizations across the world are increasingly starting to see the benefits of moving more and more services to the cloud. The focus on the cost-saving potential of cloud is rapidly shifting to completely transforming the business with cloud. As organizations are investing enormous sums on technology they are starting to realize that in order to maximize the return on investment and accelerate the business transformation process the first area of focus should be people. By ensuring the organiza...
"Since Cloud Expo is running the week of June 10, we thought it'd be a great idea to schedule our Meetup this week. That way, if you have colleagues, friends, or family in town that week for the Expo, you can invite them to join you!" With those words, the OpenStack New York Meetup Group's organizer's launched a landing page this week where anyone interested can register for the June 12 evening event.
“Open source has always provided a number of benefits, including easing adoption costs, propagating a better understanding of the technology, and allowing for faster evolution and commercialization of products and services based on it,” noted Terry Woloszyn, Founder & CEO, Leeward Security Ltd., in this exclusive Q&A with Cloud Expo Conference Chair Jeremy Geelan. “This is clearly evident with the OpenStack and CloudStack,” Woloszyn continued, “and others that have been quickly commercialized as...
Cloud enables SMBs to access new, scalable resources – previously only available to enterprises – in flexible and cost-effective ways. McKinsey’s SMB Cloud Report projects the public cloud market to reach $40-$50 billion by 2015, with SMBs comprising 65% of public cloud spending in 2015. But selling cloud to SMBs raises the questions of who, what and how. In her session at the 12th International Cloud Expo, Manjula Talreja, VP of Cisco’s Global Cloud Business Development Team, will discuss the...