@BigDataExpo: Article

Big Data: So What! That’s Why You Virtualize

How data virtualization enables Big Data volume, variety, velocity and value

Big Data!  Yes it's BIG!

The volume is BIG!  The variety is BIG!  The velocity is BIG!

And hopefully the business value is BIG!

New Opportunities Bring New Ways to Leverage Proven Technology
There is no shortage of media articles, analyst reports, tradeshows, blogs and other sources of Big Data technology insight and advice.

But it strikes me that in our search to be on the leading edge, we may be overlooking some great existing technology.

In fact, some technology, for example data virtualization, is even more useful in a Big Data world.

What Is Data Virtualization?
Data virtualization is an agile data integration approach organizations use to gain more insight from their data.  This includes traditional sources such as transaction systems and data warehouses, as well as new sources such as the cloud and Big Data.

Unlike data consolidation or data replication, data virtualization integrates these diverse data types without costly extra copies and additional data management complexity.  Seriously, if the data is already big, why make it even bigger by copying and storing it again and again?
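To make the contrast with replication concrete, here is a minimal sketch of a virtual view. It assumes two in-memory SQLite databases standing in for a transaction system and a data warehouse; the table names, columns and the `customer_revenue_view` function are hypothetical and are not taken from any data virtualization product. The combined result is computed when it is requested and is never stored as a second copy.

```python
import sqlite3

def make_source(ddl, insert_sql, rows):
    """Create an in-memory SQLite database standing in for one data source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    conn.commit()
    return conn

# Hypothetical source 1: a transaction system holding orders.
orders_db = make_source(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)",
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10, 250.0), (2, 11, 75.5), (3, 10, 120.0)],
)

# Hypothetical source 2: a warehouse holding customer reference data.
customers_db = make_source(
    "CREATE TABLE customers (customer_id INTEGER, name TEXT)",
    "INSERT INTO customers VALUES (?, ?)",
    [(10, "Acme"), (11, "Globex")],
)

def customer_revenue_view():
    """Virtual view: built from both sources at query time, never persisted."""
    # Small lookup table fetched from the warehouse.
    names = dict(customers_db.execute("SELECT customer_id, name FROM customers"))
    # Aggregation runs inside the transaction system, so only totals travel.
    totals = orders_db.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
    )
    return {names[customer_id]: total for customer_id, total in totals}

revenue = customer_revenue_view()
print(revenue)
```

Because the view is recomputed on each call, a new order inserted into `orders_db` shows up in the next result without any reload or synchronization step.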

With data virtualization, you respond faster to ever-changing analytics and BI needs, fast-track your data management evolution and save 50-75% over data replication and consolidation.  In other words, you deliver value, the most important V, even though it is often left off the list of the 3 Vs of Big Data (Volume, Velocity and Variety).

Variety Is Big Data Integration Challenge #1
Often, the biggest Big Data integration challenge is variety, not volume. Consider all the different Big Data types that may require integration:

  • Massively Parallel Processing based Appliances - Examples include EMC Greenplum, HP Vertica, IBM Netezza, SAP Hana, and more
  • Columnar/tabular NoSQL Data Stores - Examples include Hadoop, Hypertable, and more
  • XML/JSON Document Data Stores - Examples include CouchDB, MarkLogic, MongoDB, and more
  • Key/value Data Stores - Examples include Cassandra, Memcached, Voldemort, and more

Fortunately, integrating heterogeneous data sources is the original raison d'être of data virtualization.  Why do you think many still call it data federation?

Volume Is Big Data Integration Challenge #2
As listed above, there are many ways to store and manage Big Data.  Similarly, a plethora of analysis tools exists, such as MapR, Karmasphere, Alpine Data Labs and more.

The biggest volume challenge is how to query large data sets from these high-volume sources at speed in order to feed these analytics.

The answer is data virtualization.

Data virtualization platforms use sophisticated rule- and cost-based query-optimization strategies that automatically create a query plan that optimizes processing and performance, with minimum overhead.
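As an illustration of how rule- and cost-based planning can work together, the sketch below applies one rule (discard plans the source cannot execute) and then picks the cheapest remaining plan by estimated cost. The `Plan` class, the two cost weights and the row estimates are all assumptions made for this example; real optimizers use far richer statistics and cost models.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Plan:
    name: str
    rows_moved: int       # estimated rows shipped over the network
    source_rows: int      # estimated rows scanned inside the source
    needs_filter: bool    # True if the source must evaluate the filter itself

# Hypothetical weights: moving a row is treated as 10x costlier than scanning it.
NETWORK_COST_PER_ROW = 10.0
SOURCE_COST_PER_ROW = 1.0

def estimated_cost(plan):
    """Cost-based step: weighted sum of network transfer and source work."""
    return (plan.rows_moved * NETWORK_COST_PER_ROW
            + plan.source_rows * SOURCE_COST_PER_ROW)

def candidate_plans(table_rows, filter_selectivity):
    """Enumerate two ways of answering a filtered query on one table."""
    matching = round(table_rows * filter_selectivity)
    return [
        Plan("fetch-all-then-filter", table_rows, table_rows, needs_filter=False),
        Plan("push-filter-to-source", matching, table_rows, needs_filter=True),
    ]

def choose_plan(plans, source_supports_filter):
    """Rule-based step prunes infeasible plans; cost-based step picks the cheapest."""
    feasible = [p for p in plans if source_supports_filter or not p.needs_filter]
    return min(feasible, key=estimated_cost)

plans = candidate_plans(table_rows=1_000_000, filter_selectivity=0.01)
best = choose_plan(plans, source_supports_filter=True)
fallback = choose_plan(plans, source_supports_filter=False)
print(best.name, estimated_cost(best))
print(fallback.name, estimated_cost(fallback))
```

With these assumed numbers, the pushdown plan is chosen whenever the source can evaluate the filter, and the fetch-all plan is used only as a fallback.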

Advanced Query Optimization Is the Key to Data Virtualization
Here are but a few of the query optimization strategies and techniques data virtualization provides:

  • Pushdown - Data virtualization offloads as much query processing as possible by pushing selected query operations, such as string searches, comparisons, local joins, sorting, aggregating and grouping, down into the underlying data sources. Thus you can take advantage of native capabilities.
  • Parallel Processing - Data virtualization optimizes query execution by employing parallel and asynchronous request processing. After building an optimized query plan, the data virtualization server executes data service calls asynchronously on separate threads, reducing idle time and data source response latency.
  • Distributed Joins - Data virtualization detects when a query being executed involves data consumed from different data sources and tries to employ distributed query optimization techniques to improve overall performance and minimize the amount of data moved over the network.  A variety of sort-merge, semi, hash and nested-loop joins are leveraged depending on the nature of the query and data sources.
  • Caching - Data virtualization can be configured to cache results for query, procedure and web service calls.  When enabled, the caching engine stores the cached result sets and queries them as appropriate.
  • Advanced Query Optimization - Data virtualization provides a number of additional techniques and algorithms, including data source grouping, join algorithm selection, join ordering, union-join inversion, predicate pooling and propagation, and projection pruning.
  • Integrated Network and Database Optimization - Even in a Big Data world, network bandwidth is generally the scarcest resource in the query processing pipeline. So reducing the amount of data that needs to be transferred has a significant impact on latency and overall performance.  Data virtualization optimizes the network and the query processing capabilities of underlying big data sources intelligently, in combination.
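Two of the techniques above, pushdown and the semi-join form of a distributed join, can be sketched with two in-memory SQLite databases acting as separate sources. Everything here is hypothetical (table names, data, and the `naive_join` and `optimized_join` functions); the point is only to compare how many rows each approach moves out of the sources for the same answer.

```python
import sqlite3

def make_source(ddl, insert_sql, rows):
    """Create an in-memory SQLite database standing in for one data source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    conn.commit()
    return conn

# Hypothetical source A: 1,000 orders spread evenly over 100 customers.
orders_db = make_source(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)",
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, i % 100, float(i)) for i in range(1000)],
)

# Hypothetical source B: 100 customers, 5 of them in region EMEA.
customers_db = make_source(
    "CREATE TABLE customers (customer_id INTEGER, region TEXT)",
    "INSERT INTO customers VALUES (?, ?)",
    [(c, "EMEA" if c < 5 else "AMER") for c in range(100)],
)

def naive_join(region):
    """Fetch both tables whole, then filter and join in the middle tier."""
    customers = customers_db.execute(
        "SELECT customer_id, region FROM customers").fetchall()
    orders = orders_db.execute(
        "SELECT order_id, customer_id, amount FROM orders").fetchall()
    wanted = {cid for cid, r in customers if r == region}
    result = [o for o in orders if o[1] in wanted]
    return result, len(customers) + len(orders)

def optimized_join(region):
    """Push the filter into source B, then semi-join source A on the keys."""
    # Pushdown: source B evaluates the region predicate itself.
    keys = [row[0] for row in customers_db.execute(
        "SELECT customer_id FROM customers WHERE region = ?", (region,))]
    if not keys:
        return [], 0
    # Semi-join: only the matching keys are sent to source A as a filter.
    placeholders = ",".join("?" * len(keys))
    orders = orders_db.execute(
        "SELECT order_id, customer_id, amount FROM orders "
        f"WHERE customer_id IN ({placeholders})", keys).fetchall()
    return orders, len(keys) + len(orders)

naive_rows, naive_moved = naive_join("EMEA")
opt_rows, opt_moved = optimized_join("EMEA")
print("rows moved:", naive_moved, "vs", opt_moved)
```

Both functions return the same orders; the difference is that the optimized version lets each source do the filtering it is good at, so far fewer rows cross the network.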

Value and Velocity are Big Data Integration Challenges #3 and #4
Big Data itself only has value when the data is analyzed.  This analysis provides value by uncovering drivers for growth, finding better ways to attract and retain customers, and identifying opportunities for innovation and cost reduction.

As such, the fastest path to Big Data analysis is also the fastest path to business value.

But everyone knows that providing analytics with the data required has always been difficult, with data integration long considered the biggest bottleneck in any analytics project.

The Data Warehousing Institute confirms this lack of agility.  Their recent study stated the average time needed to add a new data source to an existing BI application was 8.4 weeks in 2009, 7.4 weeks in 2010, and 7.8 weeks in 2011. And 33% of the organizations needed more than 3 months to add a new data source.

Data Virtualization Provides Velocity along with Analytic Value
According to the book Data Virtualization: Going Beyond Traditional Data Integration to Achieve Business Agility, data virtualization significantly accelerates data integration agility. Key to this success are data virtualization's:

  • Streamlined data integration approach
  • Iterative development process
  • Adaptable change management process

Using data virtualization as a complement to existing data integration approaches, the ten organizations profiled in the book cut analytics project times in half or more.

This agility allowed the same teams to double their number of analytics projects, significantly accelerating the business value delivered.  In other words, value with velocity!

Variety, Volume, Velocity and Value
Big Data is all the rage.  And at first glance, the Big Data variety, volume, velocity and value challenges may seem extraordinarily difficult.

Yet proven technologies, such as data virtualization, offer established approaches to addressing these "big" challenges.

So if Big Data is on your agenda, don't forget to make a big commitment to data virtualization.  You'll be glad you did.

More Stories By Robert Eve

Robert Eve is the EVP of Marketing at Composite Software, the data virtualization gold standard, and co-author of Data Virtualization: Going Beyond Traditional Data Integration to Achieve Business Agility. Bob's experience includes executive-level roles at leading enterprise software companies such as Mercury Interactive, PeopleSoft, and Oracle. Bob holds a Master of Science from the Massachusetts Institute of Technology and a Bachelor of Science from the University of California at Berkeley.