The 'Big' Fallacy of Big Data | @BigDataExpo #BigData

Why companies are luring you into the Big Data Trap

Unless you've been living under a rock for the past couple of years, you've been hearing about the world of Big Data nonstop. Big Data promises fortune and power to those who can wield its somewhat mystical and often nebulous capabilities. Unfortunately for the rest of us mere mortals, Big Data is built on an outright lie that is both pernicious and unfortunate. It's hiding in plain sight, in the name itself: the word BIG.

The Fallacy of Big Data is that you have to have a lot of data for it to be relevant. The common catchphrase is: "More data = more insights". There is a nugget of truth to this: in some cases, a lot of data is needed to establish valid patterns and create real insight into the activity the data represents. More often than not, however, it creates a significant challenge for those responsible for analytics, namely sifting through a mountain of data to find the parts that actually matter. Recent studies have shown that fully 80% of data analysis time is spent just tinkering with the data to get it into a usable format. More data therefore creates a massive data curation problem and leaves us with more work to do before we can even start experimenting, much less monetizing our data.

The reality is that "Big Data" was invented by those who otherwise had no skin in the game. Analytics software, open source, digital transformation, and the cloud together enable comprehensive data analysis. When data can be stored, analyzed, and, more importantly, turned into value with minimal infrastructure, commodity hardware, and free or nearly free software, the big infrastructure players are left out in the cold with nothing to offer. Enter "Big Data": if you are going to manage petabytes of data, you need good storage, and tens of thousands of servers are awful to manage. So the Fallacy is born:

"In order to get real results from data, you cannot rely on just a little bit of it, or on just the relevant data; you need every set of data imaginable. Therefore (and here's where things get squidgy), you need to bring all that data in house (because the cloud is too expensive to store it), and you need a lot of manageable, flexible, enterprise-grade gear to do it with (because free stuff is not enterprise-ready)."

You can see how this is built around some nuggets of truth. I was asked recently, "How would you move a petabyte of data to Amazon cloud storage?" and I answered as truthfully as I could: "Very slowly." The cloud does get expensive when used for a lot of infrastructure, but as part of an overall solution it is an important tool. Likewise, the thought of managing a massive Hadoop cluster of 1,000 "exactly the same" servers sounds like IT hell from the pre-VM days, but it is not an accurate picture of the Hadoop landscape. The vast majority of analytics clusters top out at around 50 servers, and that is far more manageable (and less expensive) than huge enterprise gear. To be fair, there are organizations where a massive-scale, enterprise-platform approach makes sense, but the unfortunate side effect of legacy vendors pushing this approach is that the solution itself has become the barrier to entry.
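To put "very slowly" in numbers, here is a back-of-the-envelope sketch in Python. It assumes decimal units (1 PB = 10^15 bytes), a dedicated link, and a constant `efficiency` factor for the fraction of rated speed actually sustained; the link speeds are illustrative choices, not measurements, and real transfers also lose time to protocol overhead, retries, and contention.

```python
# Back-of-the-envelope transfer-time estimate (illustrative assumptions:
# decimal units, a dedicated link, and a constant utilization factor).
SECONDS_PER_DAY = 86_400
PETABYTE = 10**15  # bytes, decimal definition


def transfer_days(num_bytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Days needed to send num_bytes over a link rated at link_gbps.

    efficiency is the fraction of the rated speed actually sustained (0 < e <= 1).
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be > 0 and efficiency in (0, 1]")
    bits = num_bytes * 8
    bits_per_second = link_gbps * 1e9 * efficiency
    return bits / bits_per_second / SECONDS_PER_DAY


for gbps in (1, 10):
    print(f"1 PB over {gbps} Gbps: {transfer_days(PETABYTE, gbps):.1f} days")
```

At full line rate this works out to about 92.6 days at 1 Gbps and about 9.3 days at 10 Gbps; at a more realistic 70% utilization, the 1 Gbps case stretches to roughly 132 days.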

The problem is that "Big Data" has now made it into the vernacular and, worse yet, has become synonymous with data analytics. Every company, organization, or even individual on earth can benefit from analyzing their relevant data for new insights. Take a very simple example: look at your budget to identify where you overspend (too many meals out, for example). That is personal analytics. It does not require anything complex, and there are numerous ways to do it with free or nearly free tools.

Now scale that up to a bank that wants to offer customers new digital, data-driven products. It already has a lot of that data in house, and it already has a lot of analytical tools. Why would it necessarily need to include every data set under the sun? It may want some additional data sets (social media, for instance, to identify trends that might lead to investment opportunities), but it doesn't HAVE to store them in house to use them: that data is offered free to use via APIs. Even if the bank did decide to store it all in house, we are not talking about tens of petabytes of data; more like a few tens to hundreds of terabytes for the data in question, because, again, you don't download all of Twitter, just the parts that are relevant to you. Analytic data is also largely transient, meaning it is used for the analysis and then discarded (especially in the real-time world), so where is the need for massive infrastructure to support that initiative?
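The personal-budget example above really is this small. As a minimal sketch, assuming a month of transactions already exported as (category, amount) pairs, the following Python sums spending per category and flags categories that exceed a budget. The transactions, category names, and limits are all hypothetical, and only the standard library is used.

```python
from collections import defaultdict

# Hypothetical month of transactions: (category, amount in dollars).
transactions = [
    ("groceries", 82.40),
    ("meals_out", 38.50),
    ("rent", 1200.00),
    ("meals_out", 54.25),
    ("groceries", 67.10),
    ("meals_out", 71.80),
    ("transport", 45.00),
]

# Hypothetical monthly budget per category.
budget = {"groceries": 300.00, "meals_out": 100.00, "rent": 1200.00, "transport": 80.00}


def spending_by_category(rows):
    """Sum the amounts for each category."""
    totals = defaultdict(float)
    for category, amount in rows:
        totals[category] += amount
    return dict(totals)


def overspent(totals, limits):
    """Return {category: amount over budget} for categories above their limit."""
    return {
        cat: round(total - limits.get(cat, 0.0), 2)
        for cat, total in totals.items()
        if total > limits.get(cat, 0.0)
    }


totals = spending_by_category(transactions)
print(overspent(totals, budget))
```

With this sample data, only `meals_out` is flagged, at $64.55 over budget. A spreadsheet pivot table would do the same job; the point is that no special infrastructure is involved.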

I have spoken a lot about "Big Data" and the Fallacy, and the trap, of paying too much attention to the word BIG. Data is important to everyone, and it can have value for anyone. In my most recent speaking sessions, I have shown how you can do a simple social analysis for free in a matter of minutes. You don't need a massive infrastructure to make that production-ready, either. It just takes some willingness to see through the noise to the actual value behind the "Big Data" message. Analytics is important and valuable for everyone. You don't have to be a Fortune 100 company to create value from the data you already have, or to bring in new data for analytics. Everyone can do it.
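The social analysis from those sessions is not spelled out here, so what follows is only a hedged sketch of what a simple one might look like: keep just the posts relevant to you, then count their hashtags. The posts and the `@ExampleBank` handle are hypothetical, and in practice the sample would come from a platform's API rather than a hard-coded list; no particular API is assumed.

```python
import re
from collections import Counter

# Hypothetical sample of public posts; a real analysis would pull these
# from a platform API, ideally already filtered to the topics you care about.
posts = [
    "Loving the new mobile app from @ExampleBank #fintech #mobile",
    "Traffic is terrible today #commute",
    "@ExampleBank fees are too high #fintech #fees",
    "Great game last night #sports",
    "Just opened a savings account with @ExampleBank #savings #fintech",
]

HASHTAG = re.compile(r"#(\w+)")


def relevant(items, keyword):
    """Keep only the posts that mention keyword (case-insensitive)."""
    kw = keyword.lower()
    return [p for p in items if kw in p.lower()]


def top_hashtags(items, n=3):
    """Count hashtags across the posts and return the n most common."""
    counts = Counter(tag.lower() for p in items for tag in HASHTAG.findall(p))
    return counts.most_common(n)


subset = relevant(posts, "@ExampleBank")
print(len(subset), top_hashtags(subset))
```

The filtered subset is a fraction of the full feed, and once the counts are computed it can be thrown away, which mirrors the points about relevance and transient analytic data.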

More Stories By Christopher Harrold

As an Agent of IT Transformation, I have over 20 years of experience in the field. I started off as the IT Ops guy and followed the trends of the DevOps movement wherever I went. I want to shake up accepted ways of thinking and develop new models and designs that push the boundaries of technology and of the accepted status quo. There is no greater reward for me than seeing something that was once dismissed as "impossible" become the new normal, and I have been richly rewarded with this result throughout my career. In my last role as CTO at EMC Corporation, I worked tirelessly with a small group of engineers and product managers to build a market-leading, innovative platform for data analytics. It combined best-of-breed storage, analytics, and visualization solutions to enable the Data as a Service model for enterprise and mid-sized companies globally.