Welcome!

@DXWorldExpo Authors: Liz McMillan, Elizabeth White, Automic Blog, John Katrick, William Schmarzo

Related Topics: @DXWorldExpo, Microservices Expo

@DXWorldExpo: Blog Feed Post

Big Data Redefined By @TonyShan | @CloudExpo [#BigData]

Big Data is a loose term for the collection, storage, processing, and sophisticated analysis of massive amounts of data

Big Data is a loose term for the collection, storage, processing, and sophisticated analysis of massive amounts of data, far larger and from many more kinds of sources than ever before. The definition of Big Data can be traced back to the 3Vs model defined by Doug Laney in 2001: Volume, Velocity, and Variety. The fourth V was later added in different fashions, such as “Value” or “Veracity”.

Interestingly the conceptualization of Big Data in the beginning of this century seems to gain wider use now after nearly 14 years. This sounds a little strange as the present dynamic world has evolved so much with so many things changed. Does the old definition still fit?

A recent report revealed that more than 80% of the executives surveyed thought that the term of Big Data was overstated, confusing, or misleading. They liked the concept, but hated the phrase. As Tom Davenport pointed out, nobody likes the term and almost everybody wishes for a better, more descriptive name for it.

The big problem of Big Data is that the V-model ineffectively describes the phenomenon and is outdated for the new paradigm. Even the original author admitted that he was simply writing about the burgeoning data in the data warehousing and business intelligence world. It is necessary to redefine the term.

Big Data in today’s world is essentially the ability to parse more information, faster and deeper, to provide unprecedented insights of the business world. The concept is more about 4Rs than 4Vs in the current situation: Real-time, Relevance, Revelation and Refinery.

  • Real-time: With the maturing and commoditization of distributed file systems and parallel processing functions, real-time is realistic. Instant response is a must for most online applications. Fast analysis is compulsory for any size of data nowadays. Batch mode becomes history now, except for cost constraints and due diligence reasons. Anything less than (near) real-time brings significant competitive disadvantages.

  • Relevance: Data analysis must be context-aware, semantic, and meaningful. Simple string match or syntactic equality is no longer enough. Unrelated data is useless as a distraction. It is mandatory for data analytics to be knowledge-based with relevant information analyzed. Interdisciplinary science and engineering must be leveraged to quantify the level of relevance in the data and user’s interest areas. Simply put, what matters the most is not how much data is delivered in the fastest way, but how applicable and useful the content is to an end user’s needs at the right time and in the right place.

  • Revelation: Previously unknown things are uncovered and disclosed in some form of knowledge not before realized. Hidden patterns are identified to correlate data elements and events at massive scale. Ambiguous, vague and obscure  data sets can be crystalized to provide better views and statistics. Seemingly random data can be mined to signal the potential linkage and interlock. User behaviors are analyzed via machine learning to find and understand the collaborative influence and sentiments.

  • Refinery: Raw data are extracted and transformed into relevant and actionable information effectively on demand. The refined data is timely, clean, aggregated, insightful and well understood. Data refinery takes the uncertainty out of the data and filter/reform the data for meaningful analysis and operations. The refinement output can be multi-structured to unlock the potential value and deepen the understanding. Data may be re-refined in a self-improved process based on the downstream needs and consumption context.

It is obvious that Big Data can be better characterized by 4Rs in the new era. For more information, please contact Tony Shan ([email protected]). ©Tony Shan. All rights reserved.

Slides: Tony Shan ‘Thinking in Big Data’

Download Slide Deck: ▸ Here

An effective way of thinking in Big Data is composed of a methodical framework for dealing with the predicted shortage of 50-60% of the qualified Big Data resources in the U.S.

This holistic model comprises the scientific and engineering steps that are involved in accelerating Big Data solutions: problem, diagnosis, facts, analysis, hypothesis, solution, prototype and implementation.

In his session at Big Data Expo®, Tony Shan focused on the concept, importance, and considerations for each of these eight components.

He will drill down to the key techniques and methods that are commonly used in these steps, such as root cause examination, process mapping, force field investigation, benchmarking, interview, brainstorming, focus group, Pareto chart, SWOT, impact evaluation, gap analysis, POC, and cost-benefit study.

Best practices and lessons learned from the real-world Big Data projects will also be discussed.

Read the original blog entry...

More Stories By Tony Shan

Tony Shan works as a senior consultant, advisor at a global applications and infrastructure solutions firm helping clients realize the greatest value from their IT. Shan is a renowned thought leader and technology visionary with a number of years of field experience and guru-level expertise on cloud computing, Big Data, Hadoop, NoSQL, social, mobile, SOA, BI, technology strategy, IT roadmapping, systems design, architecture engineering, portfolio rationalization, product development, asset management, strategic planning, process standardization, and Web 2.0. He has directed the lifecycle R&D and buildout of large-scale award-winning distributed systems on diverse platforms in Fortune 100 companies and public sector like IBM, Bank of America, Wells Fargo, Cisco, Honeywell, Abbott, etc.

Shan is an inventive expert with a proven track record of influential innovations such as Cloud Engineering. He has authored dozens of top-notch technical papers on next-generation technologies and over ten books that won multiple awards. He is a frequent keynote speaker and Chair/Panel/Advisor/Judge/Organizing Committee in prominent conferences/workshops, an editor/editorial advisory board member of IT research journals/books, and a founder of several user groups, forums, and centers of excellence (CoE).

@BigDataExpo Stories
Coca-Cola’s Google powered digital signage system lays the groundwork for a more valuable connection between Coke and its customers. Digital signs pair software with high-resolution displays so that a message can be changed instantly based on what the operator wants to communicate or sell. In their Day 3 Keynote at 21st Cloud Expo, Greg Chambers, Global Group Director, Digital Innovation, Coca-Cola, and Vidya Nagarajan, a Senior Product Manager at Google, discussed how from store operations and ...
In his session at 21st Cloud Expo, Carl J. Levine, Senior Technical Evangelist for NS1, will objectively discuss how DNS is used to solve Digital Transformation challenges in large SaaS applications, CDNs, AdTech platforms, and other demanding use cases. Carl J. Levine is the Senior Technical Evangelist for NS1. A veteran of the Internet Infrastructure space, he has over a decade of experience with startups, networking protocols and Internet infrastructure, combined with the unique ability to it...
"Codigm is based on the cloud and we are here to explore marketing opportunities in America. Our mission is to make an ecosystem of the SW environment that anyone can understand, learn, teach, and develop the SW on the cloud," explained Sung Tae Ryu, CEO of Codigm, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
SYS-CON Events announced today that Telecom Reseller has been named “Media Sponsor” of SYS-CON's 22nd International Cloud Expo, which will take place on June 5-7, 2018, at the Javits Center in New York, NY. Telecom Reseller reports on Unified Communications, UCaaS, BPaaS for enterprise and SMBs. They report extensively on both customer premises based solutions such as IP-PBX as well as cloud based and hosted platforms.
Gemini is Yahoo’s native and search advertising platform. To ensure the quality of a complex distributed system that spans multiple products and components and across various desktop websites and mobile app and web experiences – both Yahoo owned and operated and third-party syndication (supply), with complex interaction with more than a billion users and numerous advertisers globally (demand) – it becomes imperative to automate a set of end-to-end tests 24x7 to detect bugs and regression. In th...
"There's plenty of bandwidth out there but it's never in the right place. So what Cedexis does is uses data to work out the best pathways to get data from the origin to the person who wants to get it," explained Simon Jones, Evangelist and Head of Marketing at Cedexis, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
High-velocity engineering teams are applying not only continuous delivery processes, but also lessons in experimentation from established leaders like Amazon, Netflix, and Facebook. These companies have made experimentation a foundation for their release processes, allowing them to try out major feature releases and redesigns within smaller groups before making them broadly available. In his session at 21st Cloud Expo, Brian Lucas, Senior Staff Engineer at Optimizely, discussed how by using ne...
SYS-CON Events announced today that Evatronix will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Evatronix SA offers comprehensive solutions in the design and implementation of electronic systems, in CAD / CAM deployment, and also is a designer and manufacturer of advanced 3D scanners for professional applications.
"IBM is really all in on blockchain. We take a look at sort of the history of blockchain ledger technologies. It started out with bitcoin, Ethereum, and IBM evaluated these particular blockchain technologies and found they were anonymous and permissionless and that many companies were looking for permissioned blockchain," stated René Bostic, Technical VP of the IBM Cloud Unit in North America, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Conventi...
Agile has finally jumped the technology shark, expanding outside the software world. Enterprises are now increasingly adopting Agile practices across their organizations in order to successfully navigate the disruptive waters that threaten to drown them. In our quest for establishing change as a core competency in our organizations, this business-centric notion of Agile is an essential component of Agile Digital Transformation. In the years since the publication of the Agile Manifesto, the conn...
SYS-CON Events announced today that CrowdReviews.com has been named “Media Sponsor” of SYS-CON's 22nd International Cloud Expo, which will take place on June 5–7, 2018, at the Javits Center in New York City, NY. CrowdReviews.com is a transparent online platform for determining which products and services are the best based on the opinion of the crowd. The crowd consists of Internet users that have experienced products and services first-hand and have an interest in letting other potential buye...
"We are an integrator of carrier ethernet and bandwidth to get people to connect to the cloud, to the SaaS providers, and the IaaS providers all on ethernet," explained Paul Mako, CEO & CTO of Massive Networks, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
Sanjeev Sharma Joins June 5-7, 2018 @DevOpsSummit at @Cloud Expo New York Faculty. Sanjeev Sharma is an internationally known DevOps and Cloud Transformation thought leader, technology executive, and author. Sanjeev's industry experience includes tenures as CTO, Technical Sales leader, and Cloud Architect leader. As an IBM Distinguished Engineer, Sanjeev is recognized at the highest levels of IBM's core of technical leaders.
Leading companies, from the Global Fortune 500 to the smallest companies, are adopting hybrid cloud as the path to business advantage. Hybrid cloud depends on cloud services and on-premises infrastructure working in unison. Successful implementations require new levels of data mobility, enabled by an automated and seamless flow across on-premises and cloud resources. In his general session at 21st Cloud Expo, Greg Tevis, an IBM Storage Software Technical Strategist and Customer Solution Architec...
A strange thing is happening along the way to the Internet of Things, namely far too many devices to work with and manage. It has become clear that we'll need much higher efficiency user experiences that can allow us to more easily and scalably work with the thousands of devices that will soon be in each of our lives. Enter the conversational interface revolution, combining bots we can literally talk with, gesture to, and even direct with our thoughts, with embedded artificial intelligence, whic...
To get the most out of their data, successful companies are not focusing on queries and data lakes, they are actively integrating analytics into their operations with a data-first application development approach. Real-time adjustments to improve revenues, reduce costs, or mitigate risk rely on applications that minimize latency on a variety of data sources. In his session at @BigDataExpo, Jack Norris, Senior Vice President, Data and Applications at MapR Technologies, reviewed best practices to ...
An increasing number of companies are creating products that combine data with analytical capabilities. Running interactive queries on Big Data requires complex architectures to store and query data effectively, typically involving data streams, an choosing efficient file format/database and multiple independent systems that are tied together through custom-engineered pipelines. In his session at @BigDataExpo at @ThingsExpo, Tomer Levi, a senior software engineer at Intel’s Advanced Analytics gr...
When talking IoT we often focus on the devices, the sensors, the hardware itself. The new smart appliances, the new smart or self-driving cars (which are amalgamations of many ‘things’). When we are looking at the world of IoT, we should take a step back, look at the big picture. What value are these devices providing? IoT is not about the devices, it’s about the data consumed and generated. The devices are tools, mechanisms, conduits. In his session at Internet of Things at Cloud Expo | DXWor...
Everything run by electricity will eventually be connected to the Internet. Get ahead of the Internet of Things revolution. In his session at @ThingsExpo, Akvelon expert and IoT industry leader Sergey Grebnov provided an educational dive into the world of managing your home, workplace and all the devices they contain with the power of machine-based AI and intelligent Bot services for a completely streamlined experience.
DevOps promotes continuous improvement through a culture of collaboration. But in real terms, how do you: Integrate activities across diverse teams and services? Make objective decisions with system-wide visibility? Use feedback loops to enable learning and improvement? With technology insights and real-world examples, in his general session at @DevOpsSummit, at 21st Cloud Expo, Andi Mann, Chief Technology Advocate at Splunk, explored how leading organizations use data-driven DevOps to close th...