@DXWorldExpo: Article

The 'Big' Fallacy of Big Data | @BigDataExpo #BigData

Why companies are luring you into the Big Data Trap

Unless you've been living under a rock for the past couple of years, you've been hearing about Big Data nonstop. It promises fortune and power to those who can wield its somewhat mystical and often nebulous capabilities. Unfortunately for the rest of us mere mortals, Big Data is built on an outright lie, and a pernicious one. It's hiding in plain sight in the name itself: the word BIG.

The Fallacy of Big Data is that you have to have a lot of data for it to be relevant. The common catchphrase is "More data = more insights". There is a nugget of truth to this: in some cases, a lot of data is needed to establish valid patterns and create real insight into the activity the data represents. More often than not, however, it creates a significant challenge for those responsible for the analytics, who must sift through a mountain of data to find the parts that actually matter. Recent studies have shown that fully 80% of the time devoted to data analysis is spent just tinkering with the data to get it into a usable format. More data therefore creates a massive data curation problem and leaves us with more work to do before we can even start experimenting with our data, much less monetizing it.

The reality of "Big Data" is that it was invented by those with no skin in the game. Analytics, open source, digital transformation, and Cloud are the technologies that enable comprehensive data analysis. When minimal infrastructure, commodity hardware, and free or nearly free software are enough to store and analyze data and, more importantly, to drive value from it, the big infrastructure players are left out in the cold with nothing to offer. Enter "Big Data": if you are going to try to manage petabytes of data, you need good storage, and tens of thousands of servers are awful to manage. So the Fallacy is born:

"In order to get real results from data, you cannot rely on just a little bit of it, or on just the relevant data; you need every set of data imaginable. Therefore (and here's where things get squidgy) you need to bring all that data in house (because the cloud is too expensive to store it), and you need a lot of manageable and flexible enterprise-grade gear to do it with (because free stuff is not enterprise ready)."

You can see how this is built around some nuggets of truth. I was asked recently, "How would you move a petabyte of data to Amazon cloud storage?" and I answered as truthfully as I could: "Very slowly." Cloud does get expensive when used for a lot of infrastructure, but as one part of the overall solution it is an important tool. Likewise, the thought of managing a massive Hadoop cluster of 1,000 "exactly the same" servers sounds like the hell of IT in the pre-VM days, but it is not really an accurate picture of the Hadoop landscape. The vast majority of analytics clusters top out at around 50 servers, which is far more manageable (and less expensive) than huge enterprise gear. To be fair, there are organizations where a massive-scale, enterprise-platform approach makes sense, but the unfortunate side effect of legacy vendors pushing this approach is that they have made the solution itself the barrier to entry.
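
To put rough numbers on "very slowly", here is a back-of-the-envelope sketch in Python. The link speeds are illustrative assumptions, not figures from the conversation above, and the function name `transfer_days` is made up for this example. It assumes a decimal petabyte and a perfectly sustained link, so real transfers would likely take longer.

```python
SECONDS_PER_DAY = 86_400
PETABYTE = 10**15  # bytes, decimal definition (an assumption for this sketch)


def transfer_days(data_bytes: float, link_bits_per_second: float) -> float:
    """Days needed to push data_bytes over a link at a fully sustained rate.

    Ignores protocol overhead, retries, and contention, so treat the
    result as an optimistic lower bound.
    """
    seconds = (data_bytes * 8) / link_bits_per_second
    return seconds / SECONDS_PER_DAY


# Hypothetical uplink speeds, chosen only for illustration.
links = [("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9), ("10 Gbit/s", 10e9)]

for label, bits_per_second in links:
    days = transfer_days(PETABYTE, bits_per_second)
    print(f"{label}: about {days:,.1f} days")
```

Under these assumptions a dedicated 1 Gbit/s link needs roughly three months to move a single petabyte, which is why bulk migration is a poor fit for the network, while moving only the relevant slice of the data is not.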

The problem is that "Big Data" has now made it into the vernacular and, worse yet, has become synonymous with Data Analytics. Every company, organization, or even individual on earth can benefit from analyzing their relevant data for new insights. Take a very simple example: look at your budget to identify where you overspend (too many meals out, for example). That is personal analytics. It does not require complex anything, and there are numerous ways to do it with free or nearly free tools. Now scale that up to the bank that wants to offer new digital, data-driven products to customers. They already have a lot of that data in house, and they already have a lot of analytical tools. Why would they need, per se, to include every data set under the sun? They may want some more sets of data (social media to identify trends that might lead to investment opportunities), but they don't HAVE to store it in house to use it - it is all offered free to use via serialized APIs. In the rare case where they did decide to store it all in house, we are not talking about tens of PB of data. It is more like adding a few tens to hundreds of TB for the data in question, because again - you don't download all of Twitter, just the stuff that is relevant to you. Analytic data is also largely transient, meaning that it is used for the analysis and then discarded (especially true in the real-time world), so where is the need for massive infrastructure to support that initiative?
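
The personal budget example can be sketched in a few lines of Python using only the standard library. Everything here is hypothetical and made up for illustration: the transactions, the category limits, and the function name `overspend`. It is one possible way to do this, not a recommended tool.

```python
from collections import defaultdict

# Hypothetical month of spending: (category, amount). Made-up numbers.
transactions = [
    ("groceries", 82.40),
    ("meals out", 45.00),
    ("meals out", 63.25),
    ("rent", 1200.00),
    ("meals out", 38.75),
    ("groceries", 91.10),
]

# Hypothetical monthly limit for each category.
budget = {"groceries": 200.00, "meals out": 100.00, "rent": 1200.00}


def overspend(transactions, budget):
    """Return {category: amount over the limit} for categories above budget.

    Categories missing from the budget are treated as having a limit of 0.
    """
    totals = defaultdict(float)
    for category, amount in transactions:
        totals[category] += amount
    return {
        category: round(total - budget.get(category, 0.0), 2)
        for category, total in totals.items()
        if total > budget.get(category, 0.0)
    }


print(overspend(transactions, budget))
```

With this sample data only the "meals out" category exceeds its limit. The whole analysis fits in memory on any laptop, which is the point: relevance of the data matters more than its volume.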

I have spoken a lot about "Big Data" and the Fallacy and trap of paying too much attention to the word BIG. Data is important to everyone and it can have value for anyone. In my most recent speaking sessions I have shown how you can do a simple social analysis for free in a matter of minutes. You don't need a massive infrastructure to make that production ready either. It just takes some willingness to see through the noise to the actual value of what the "Big Data" message is trying to say. Analytics is important and valuable for everyone. You don't have to be a Fortune 100 company to create value from the data you already have, and to bring in new data for analytics. Everyone can do it.

Connect with me on Twitter or LinkedIn and share your thoughts!

More Stories By Christopher Harrold

As an Agent of IT Transformation, I have over 20 years of experience in the field. I started off as the IT Ops guy and followed the trends of the DevOps movement wherever I went. I want to shake up accepted ways of thinking and develop new models and designs that push the boundaries of technology and of the accepted status quo. There is no greater reward for me than seeing something that was once dismissed as "impossible" become the new normal, and I have been richly rewarded with this result throughout my career. In my last role as CTO at EMC Corporation, I worked tirelessly with a small group of engineers and product managers to build a market-leading, innovative platform for data analytics, combining best-of-breed storage, analytics, and visualization solutions to enable the Data as a Service model for enterprise and mid-sized companies globally.
