Deja VVVu: Others Claiming Gartner’s Construct for Big Data

This article originally appeared on the Gartner Blog Network in January 2012 and is reprinted here with permission from Gartner and its author, Doug Laney.

In the late 1990s, while I was a META Group analyst (note: META is now part of Gartner), it was becoming evident that our clients were increasingly encumbered by their data assets. While many pundits were talking about these fast-growing data stores, many clients were lamenting them, and many vendors were seizing the opportunity they presented, I also realized that something else was going on. Sea changes in the speed at which data was flowing, mainly due to electronic commerce, along with the increasing breadth of data sources, structures and formats due to the post-Y2K ERP application boom, were as challenging to data management teams as the increasing quantity of data, or more so.

In an attempt to help our clients get a handle on how to recognize and, more importantly, deal with these challenges, I first began speaking at industry conferences on this 3-dimensional data challenge of increasing data volume, velocity and variety. Then in late 2000 I drafted a research note, published in February 2001, entitled 3-D Data Management: Controlling Data Volume, Velocity and Variety.

Fast forward to today: The “3Vs” framework for understanding and dealing with Big Data has now become ubiquitous. In fact, other research firms, major vendors and consulting firms have even posited the 3Vs (or an unmistakable variant) as their own concept. Since the original piece is no longer available in Gartner archives but is in increasing demand, I wanted to make it available here for anyone to reference and cite:

Original Research Note PDF: 3-D Data Management: Controlling Data Volume, Velocity and Variety

Date: 6 February 2001     Author: Doug Laney

3-D Data Management: Controlling Data Volume, Velocity and Variety. Current business conditions and mediums are pushing traditional data management principles to their limits, giving rise to novel and more formalized approaches.

META Trend: During 2001/02, leading enterprises will increasingly use a centralized data warehouse to define a common business vocabulary that improves internal and external collaboration. Through 2003/04, data quality and integration woes will be tempered by data profiling technologies (for generating metadata, consolidated schemas, and integration logic) and information logistics agents. By 2005/06, data, document, and knowledge management will coalesce, driven by schema-agnostic indexing strategies and portal maturity.

The effect of the e-commerce surge, a rise in merger & acquisition activity, increased collaboration, and the drive for harnessing information as a competitive catalyst is driving enterprises to higher levels of consciousness about how data is managed at its most basic level. In 2001/02, historical, integrated databases (e.g. data warehouses, operational data stores, data marts) will be leveraged not only for intended analytical purposes, but increasingly for intra-enterprise consistency and coordination. By 2003/04, these structures (including their associated metadata) will be on par with application portfolios, organization charts and procedure manuals for defining a business to its employees and affiliates.

Data records, data structures, and definitions commonly accepted throughout an enterprise reduce fiefdoms pulling against each other due to differences in the way each perceives where the enterprise has been, is presently, and is headed. Readily accessible current and historical records of transactions, affiliates (partners, employees, customers, suppliers), and business processes (or rules), along with definitional and navigational metadata (see ADS Delta 896, 21st Century Metadata: Mapping the Enterprise Genome, 7 Aug 2000), enable employees to paddle in the same direction. Conversely, application-specific data stores (e.g. accounts receivable versus order status) and geographic-specific data stores (e.g. North American sales vs. international sales) offer conflicting or insular views of the enterprise that, while important for feeding transactional systems, provide no “single version of the truth,” giving rise to inconsistency in the way enterprise factions function.

While enterprises struggle to consolidate systems and collapse redundant databases to enable greater operational, analytical, and collaborative consistencies, changing economic conditions have made this job more difficult. E-commerce, in particular, has exploded data management challenges along three dimensions: volume, velocity and variety. In 2001/02, IT organizations must compile a variety of approaches to have at their disposal for dealing with each.

Data Volume

E-commerce channels increase the depth and breadth of data available about a transaction (or any point of interaction). The lower cost of e-channels enables an enterprise to offer its goods or services to more individuals or trading partners, and up to 10x the quantity of data about an individual transaction may be collected, thereby increasing the overall volume of data to be managed. Furthermore, as enterprises come to see information as a tangible asset, they become reluctant to discard it.

Typically, increases in data volume are handled by purchasing additional online storage. However, as data volume increases, the relative value of each data point decreases proportionately, resulting in a poor financial justification for merely incrementing online storage. Viable alternatives and supplements to hanging new disk include:

  • Implementing tiered storage systems (see SIS Delta 860, 19 Apr 2000) that cost-effectively balance levels of data utility with data availability using a variety of media
  • Limiting data collected to that which will be leveraged by current or imminent business processes
  • Limiting certain analytic structures to a percentage of statistically valid sample data
  • Profiling data sources to identify and subsequently eliminate redundancies
  • Monitoring data usage to determine “cold spots” of unused data that can be eliminated or offloaded to tape (e.g. Ambeo, BEZ Systems, Teleran)
  • Outsourcing data management altogether (e.g. EDS, IBM)
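
As an illustration of the usage-monitoring item above, here is a minimal Python sketch of finding “cold spots.” It is not how any of the named products work; the function name find_cold_spots, the 180-day idle threshold, and the table names and dates are all hypothetical, and it assumes a last-read date is already recorded for each table.

```python
from datetime import date, timedelta


def find_cold_spots(last_access, today, max_idle_days=180):
    """Return the names of tables not read within max_idle_days, oldest first.

    last_access maps a table name to the date it was last read
    (an assumed input; real usage monitors gather this differently).
    """
    cutoff = today - timedelta(days=max_idle_days)
    cold = [(accessed, name) for name, accessed in last_access.items()
            if accessed < cutoff]
    return [name for accessed, name in sorted(cold)]


# Hypothetical usage data for three tables.
last_access = {
    "orders_2001": date(2001, 1, 30),
    "orders_1998": date(1999, 3, 2),
    "clickstream_1999": date(2000, 2, 14),
}
cold_tables = find_cold_spots(last_access, today=date(2001, 2, 6))
print(cold_tables)  # ['orders_1998', 'clickstream_1999']
```

Tables returned by such a check would be candidates for elimination or offloading to cheaper media, subject to whatever retention rules the enterprise applies.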

Data Velocity

E-commerce has also increased point-of-interaction (POI) speed, and consequently the pace of data used to support interactions and generated by them. As POI performance is increasingly perceived as a competitive differentiator (e.g. Web site response, inventory availability analysis, transaction execution, order tracking update, product/service delivery), so too is an organization’s ability to manage data velocity. Recognizing that data velocity management is much more than a physical bandwidth and protocol issue, enterprises are implementing architectural solutions such as:

  • Operational data stores (ODSs) that periodically extract, integrate and re-organize production data for operational inquiry or tactical analysis
  • Caches that provide instant access to transaction data while buffering back-end systems from additional load and performance degradation. (Unlike ODSs, caches are updated according to adaptive business rules and have schemas that mimic the back-end source.)
  • Point-to-point (P2P) data routing between databases and applications (e.g. D2K, DataMirror) that circumvents high-latency hub-and-spoke models that are more appropriate for strategic analysis
  • Designing architectures that balance data latency with application data requirements and decision cycles, without assuming the entire information supply chain must be near real-time.
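
The cache item above can be sketched in a few lines of Python. This is a simplified read-through cache with a fixed time-to-live, standing in for the “adaptive business rules” the note describes; the class name ReadThroughCache, its parameters, and the order-status backend are assumptions made for illustration only.

```python
import time


class ReadThroughCache:
    """Serve repeated reads from memory so the back-end system sees less load."""

    def __init__(self, load_from_backend, ttl_seconds=30.0, clock=time.monotonic):
        self._load = load_from_backend   # hypothetical back-end lookup function
        self._ttl = ttl_seconds          # fixed freshness window (an assumption)
        self._clock = clock
        self._entries = {}               # key -> (value, expires_at)
        self.backend_calls = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]              # still fresh: no back-end call
        self.backend_calls += 1
        value = self._load(key)          # missing or expired: go to the back end
        self._entries[key] = (value, now + self._ttl)
        return value


# Hypothetical order-status lookup standing in for a transactional system.
cache = ReadThroughCache(lambda order_id: f"status-of-{order_id}", ttl_seconds=30.0)
cache.get("order-1")
cache.get("order-1")
print(cache.backend_calls)  # 1
```

The time-to-live is the latency trade-off in miniature: a longer window shields the back end more but lets readers see older data, which is the balance the last item in the list asks architects to strike.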

Data Variety

Through 2003/04, no greater barrier to effective data management will exist than the variety of incompatible data formats, non-aligned data structures, and inconsistent data semantics. By this time, interchange and translation mechanisms will be built into most DBMSs. But until then, application portfolio sprawl (particularly when based on a “strategy” of autonomous software implementations due to e-commerce solution immaturity), increased partnerships, and M&A activity intensify data variety challenges. Attempts to resolve data variety issues must be approached as an ongoing endeavor encompassing the following techniques:

  • Data profiling (e.g. Data Mentors, Metagenix) to discover hidden relationships and resolve inconsistencies across multiple data sources (see ADS898)
  • XML-based data format “universal translators” that import data into standard XML documents for export into another data format (e.g. infoShark, XML Solutions)
  • Enterprise application integration (EAI) predefined adapters (e.g. NEON, Tibco, Mercator) for acquiring and delivering data between known applications via message queues, or EAI development kits for building custom adapters.
  • Data access middleware (e.g. Information Builders’ EDA/SQL, SAS Access, OLE DB, ODBC) for direct connectivity between applications and databases
  • Distributed query management (DQM) software (e.g. Enth, InfoRay, Metagon) that adds a data routing and integration intelligence layer above “dumb” data access middleware
  • Metadata management solutions (i.e. repositories and schema standards) to capture and make available definitional metadata that can help provide contextual consistency to enterprise data
  • Advanced indexing techniques for relating (if not physically integrating) data of various incompatible types (e.g. multimedia, documents, structured data, business rules).
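
To make the data profiling item above concrete, the following Python sketch summarizes the fields of two record sets and flags fields whose value types disagree. It is a toy illustration, not the method of any vendor named in the list; the function names profile and type_conflicts and the billing and orders records are hypothetical.

```python
from collections import defaultdict


def profile(records):
    """Summarize each field: count of non-null values and the value types seen."""
    summary = defaultdict(lambda: {"present": 0, "types": set()})
    for record in records:
        for field, value in record.items():
            if value is None:
                continue                     # nulls are not counted as present
            summary[field]["present"] += 1
            summary[field]["types"].add(type(value).__name__)
    return dict(summary)


def type_conflicts(profile_a, profile_b):
    """Return shared fields whose observed value types differ between sources."""
    shared = profile_a.keys() & profile_b.keys()
    return sorted(f for f in shared
                  if profile_a[f]["types"] != profile_b[f]["types"])


# Hypothetical extracts from two application-specific data stores.
billing = [{"customer_id": 1001, "region": "NA"},
           {"customer_id": 1002, "region": None}]
orders = [{"customer_id": "1001", "region": "NA"}]

conflicts = type_conflicts(profile(billing), profile(orders))
print(conflicts)  # ['customer_id']
```

Here the same customer key is numeric in one store and text in the other, the kind of inconsistency that has to be resolved before the two sources can be integrated.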

As with any sufficiently fashionable technology, users should expect the ebb and flow of the data management marketplace to yield solutions that consolidate multiple techniques and solutions that are increasingly application/environment specific. (See Figure 1 – Data Management Solutions.) In selecting a technique or technology, enterprises should first perform an information audit assessing the status of their information supply chain to identify and prioritize particular data management issues.

Business Impact: Attention to data management, particularly in a climate of e-commerce and greater need for collaboration, can enable enterprises to achieve greater returns on their information assets.

Bottom Line: In 2001/02, IT organizations must look beyond traditional, direct, brute-force physical approaches to data management. Through 2003/04, practices for resolving e-commerce-accelerated data volume, velocity and variety issues will become more formalized and diverse. Increasingly, these techniques involve trade-offs and architectural solutions that involve and impact application portfolios and business strategy decisions.

###

Over the past decade, Gartner analysts including Regina Casonato, Anne Lapkin, Mark A. Beyer, Yvonne Genovese and Ted Friedman have continued to expand our research on this topic, identifying and refining other “big data” concepts. In September 2011 they published the tremendous research note Information Management in the 21st Century.  And in 2012, Mark Beyer and I developed and published Gartner’s updated definition of Big Data to reflect its value proposition and requirements for “new innovative forms of processing.” (See The Importance of ‘Big Data’: A Definition)

Doug Laney is a research vice president for Gartner Research, where he covers business analytics solutions and projects, information management, and data-governance-related issues. He is considered a pioneer in the field of data warehousing and created the first commercial project methodology for business intelligence/data warehouse projects. Mr. Laney also originated the discipline of information economics (infonomics).

Follow Doug on Twitter: @Doug_Laney

More Stories By Bob Gourley

Bob Gourley, former CTO of the Defense Intelligence Agency (DIA), is Founder and CTO of Crucial Point LLC, a technology research and advisory firm providing fact based technology reviews in support of venture capital, private equity and emerging technology firms. He has extensive industry experience in intelligence and security and was awarded an intelligence community meritorious achievement award by AFCEA in 2008, and has also been recognized as an Infoworld Top 25 CTO and as one of the most fascinating communicators in Government IT by GovFresh.
