Big Data Redefined By @TonyShan | @CloudExpo [#BigData]

Big Data is a loose term for the collection, storage, processing, and sophisticated analysis of massive amounts of data, far larger in volume and drawn from many more kinds of sources than ever before. The definition of Big Data can be traced back to the 3Vs model defined by Doug Laney in 2001: Volume, Velocity, and Variety. A fourth V was later added in various forms, such as “Value” or “Veracity”.

Interestingly, this conceptualization of Big Data from the beginning of the century only seems to be gaining wide use now, nearly 14 years later. That is a little strange, given how dynamic the world is and how much has changed since then. Does the old definition still fit?

A recent report revealed that more than 80% of the executives surveyed thought the term Big Data was overstated, confusing, or misleading. They liked the concept but hated the phrase. As Tom Davenport pointed out, nobody likes the term, and almost everybody wishes for a better, more descriptive name for it.

The big problem with Big Data is that the V-model describes the phenomenon poorly and is outdated for the new paradigm. Even the original author has admitted that he was simply writing about the burgeoning data in the data warehousing and business intelligence world. It is necessary to redefine the term.

Big Data in today’s world is essentially the ability to parse more information, faster and deeper, to provide unprecedented insights into the business world. In the current situation, the concept is more about the 4Rs than the 4Vs: Real-time, Relevance, Revelation, and Refinery.

  • Real-time: With the maturing and commoditization of distributed file systems and parallel processing engines, real-time is realistic. Instant response is a must for most online applications, and fast analysis is compulsory for data of any size. Batch mode is becoming history, kept mainly for cost or due-diligence reasons, and anything less than (near) real-time brings a significant competitive disadvantage (a minimal streaming sketch follows this list).

  • Relevance: Data analysis must be context-aware, semantic, and meaningful. Simple string matching or syntactic equality is no longer enough, and unrelated data is worse than useless: it is a distraction. Data analytics must be knowledge-based, analyzing only the relevant information, and interdisciplinary science and engineering must be leveraged to quantify how relevant the data is to a user’s areas of interest. Simply put, what matters most is not how much data is delivered or how fast, but how applicable and useful the content is to an end user’s needs at the right time and in the right place (see the similarity sketch after this list).

  • Revelation: Previously unknown things are uncovered and disclosed as knowledge not realized before. Hidden patterns are identified to correlate data elements and events at massive scale, and ambiguous, vague, and obscure data sets can be crystallized into clearer views and statistics. Seemingly random data can be mined to signal potential linkages and interlocks, and user behaviors are analyzed via machine learning to understand collaborative influence and sentiment (see the co-occurrence sketch after this list).

  • Refinery: Raw data is extracted and transformed into relevant, actionable information effectively and on demand. The refined data is timely, clean, aggregated, insightful, and well understood. A data refinery takes the uncertainty out of the data and filters and reshapes it for meaningful analysis and operations. The refinement output can be multi-structured to unlock potential value and deepen understanding, and data may be re-refined in a self-improving process based on downstream needs and consumption context (see the refinery sketch after this list).
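
To make the Real-time point concrete, here is a minimal sketch of near-real-time processing: events are aggregated over a sliding time window as they arrive instead of being queued for a nightly batch job. The event names, timestamps, and window length are hypothetical, and the sketch uses only the Python standard library rather than any particular streaming engine.

```python
from collections import deque

WINDOW_SECONDS = 60  # hypothetical sliding-window length

class SlidingWindowCounter:
    """Counts events per key over the last WINDOW_SECONDS, updated as events arrive."""

    def __init__(self, window=WINDOW_SECONDS):
        self.window = window
        self.events = deque()  # (timestamp, key) in arrival order
        self.counts = {}       # key -> count inside the current window

    def add(self, key, timestamp):
        self.events.append((timestamp, key))
        self.counts[key] = self.counts.get(key, 0) + 1
        self._expire(timestamp)

    def _expire(self, now):
        # Drop events that have fallen out of the window.
        while self.events and now - self.events[0][0] > self.window:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

# Usage: feed events as they stream in and read fresh counts at any moment.
counter = SlidingWindowCounter()
for key, ts in [("page_view", 0), ("purchase", 10), ("page_view", 30), ("page_view", 95)]:
    counter.add(key, ts)
print(counter.counts)  # only events from the last 60 "seconds" remain counted
```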
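
For the Relevance point, the contrast is between exact string matching and scoring how related content is to a user's interests. The sketch below is a deliberately simple bag-of-words cosine similarity in plain Python; the documents and query are made up, and real relevance scoring would use far richer semantic models.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real text processing."""
    return text.lower().split()

def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

documents = [
    "quarterly revenue growth in the retail sector",
    "retail revenue and store traffic trends this quarter",
    "employee parking policy update",
]
query = "retail quarterly revenue"

# Exact string matching finds nothing useful...
print("exact match:", [d for d in documents if query in d])  # []

# ...while a similarity score ranks documents by how relevant they are.
query_bag = Counter(tokenize(query))
ranked = sorted(documents,
                key=lambda d: cosine_similarity(query_bag, Counter(tokenize(d))),
                reverse=True)
print("most relevant:", ranked[0])
```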
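
The Revelation point, surfacing hidden patterns and linkages, can be illustrated with a tiny co-occurrence miner that counts which pairs of items appear together more often than others across many records. The transaction log is hypothetical, and production systems would use dedicated frequent-pattern or correlation algorithms at far larger scale.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transaction log: each record is the set of items seen together.
transactions = [
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "bread"},
    {"beer", "chips"},
    {"bread", "butter", "jam"},
]

# Count how often each unordered pair of items co-occurs.
pair_counts = Counter()
for items in transactions:
    for pair in combinations(sorted(items), 2):
        pair_counts[pair] += 1

# The most frequent pairs reveal linkages that are not obvious from single records.
for pair, count in pair_counts.most_common(3):
    print(pair, count)
# ('bread', 'butter') appears most often: a pattern hidden in the raw data.
```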
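
Finally, the Refinery point is essentially an on-demand extract-transform-aggregate step that turns noisy raw records into clean, aggregated, analysis-ready data. The raw records and field names below are hypothetical, and the sketch stays in plain Python to keep the idea visible; an actual refinery would run inside a data pipeline or processing framework.

```python
from collections import defaultdict

# Hypothetical raw feed: inconsistent casing, missing values, string amounts.
raw_records = [
    {"region": " East ", "amount": "120.50"},
    {"region": "east",   "amount": "80"},
    {"region": "West",   "amount": None},       # unusable, filtered out
    {"region": "WEST",   "amount": "200.25"},
]

def refine(record):
    """Clean a single raw record, or return None if it cannot be salvaged."""
    if not record.get("amount"):
        return None
    return {
        "region": record["region"].strip().lower(),
        "amount": float(record["amount"]),
    }

# Filter and reshape the raw data, then aggregate it for downstream analysis.
totals = defaultdict(float)
for raw in raw_records:
    clean = refine(raw)
    if clean is not None:
        totals[clean["region"]] += clean["amount"]

print(dict(totals))  # {'east': 200.5, 'west': 200.25}
```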

It is obvious that Big Data can be better characterized by the 4Rs in this new era. For more information, please contact Tony Shan ([email protected]). ©Tony Shan. All rights reserved.

Slides: Tony Shan ‘Thinking in Big Data’

An effective way of thinking in Big Data rests on a methodical framework for dealing with the predicted 50-60% shortage of qualified Big Data professionals in the U.S.

This holistic model comprises the scientific and engineering steps involved in accelerating Big Data solutions: problem, diagnosis, facts, analysis, hypothesis, solution, prototype, and implementation.

In his session at Big Data Expo®, Tony Shan focused on the concept, importance, and considerations for each of these eight components.

He drilled down into the key techniques and methods commonly used in these steps, such as root cause examination, process mapping, force field analysis, benchmarking, interviews, brainstorming, focus groups, Pareto charts, SWOT analysis, impact evaluation, gap analysis, proof of concept (POC), and cost-benefit study.

Best practices and lessons learned from real-world Big Data projects were also discussed.

More Stories By Tony Shan

Tony Shan works as a senior consultant and advisor at a global applications and infrastructure solutions firm, helping clients realize the greatest value from their IT. Shan is a renowned thought leader and technology visionary with many years of field experience and guru-level expertise in cloud computing, Big Data, Hadoop, NoSQL, social, mobile, SOA, BI, technology strategy, IT roadmapping, systems design, architecture engineering, portfolio rationalization, product development, asset management, strategic planning, process standardization, and Web 2.0. He has directed the lifecycle R&D and buildout of large-scale, award-winning distributed systems on diverse platforms for Fortune 100 companies and public-sector organizations, including IBM, Bank of America, Wells Fargo, Cisco, Honeywell, and Abbott.

Shan is an inventive expert with a proven track record of influential innovations such as Cloud Engineering. He has authored dozens of top-notch technical papers on next-generation technologies and more than ten books that have won multiple awards. He is a frequent keynote speaker; serves as chair, panelist, advisor, judge, and organizing committee member at prominent conferences and workshops; is an editor and editorial advisory board member for IT research journals and books; and is a founder of several user groups, forums, and centers of excellence (CoE).
