Just When You Thought It Was Safe to Go Back into the Data

By @ABridgwater | @BigDataExpo

The data lake is a place where Big Data exists in its first-formed state

So-called 'paradigm shifts' happen across the information technology landscape roughly every five years. The figure is approximate, but it is precise enough to be of some value to an average CIO planning for major infrastructure changes.

First Big Data, then the flood
As we stand today, Big Data has been a mainstream concern for roughly five years, although the term itself is older.

In the year 2000, famed economist Francis X. Diebold is said to have published the first version of a paper titled "Big Data Dynamic Factor Models for Macroeconomic Measurement and Forecasting", and the rest, as we know, is modern history.

Time enough, then, for a new data-centric paradigm shift.

More recently, we have been offered the chance to expand our notion of Big Data to include the idea of a 'data lake'.

The term 'data lake' was coined by James Dixon, a technology evangelist and CTO at data analytics company Pentaho. A data lake is a truly massive (but easily accessible) data repository, built on (relatively) inexpensive computer hardware, for storing Big Data. But the point is that it holds BIG Big Data: all data, of all types, with all attributes, in all shapes and sizes.

A CIO swimming in data
To extend the metaphor, the CIO may find him- or herself swimming in a vast data lake, reaching shore (as it were) only once decisions are made about how specific elements of the lake will be used. At that point, we have reached the datamart.

The datamart is also a home for Big Data, but it is one where we have made certain decisions, assumptions and judgements about the data in the data lake, so that we can classify it with a kind of 'labelling nomenclature', if you will. In other words, in the datamart we have optimized our data by choosing to store only those attributes (fields, parameters, etc.) that we think we need.

The data lake, by contrast, is where all data and all attributes still exist: a place of raw data, where Big Data exists in its first-formed state.

As Dixon wrote when he first defined the term, "If you think of a datamart as a store of bottled water - cleansed and packaged and structured for easy consumption - the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake and various users of the lake can come to examine, dive in, or take samples."
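To make the lake/datamart distinction concrete, here is a minimal Python sketch, assuming records are plain dictionaries. The field names, the two sources and the `build_datamart` helper are hypothetical, invented purely for illustration; a real data lake would typically sit on distributed storage rather than an in-memory list. The lake keeps every record with all of its attributes, while the datamart is only a projection onto the attributes we have decided we need.

```python
from typing import Any, Iterable

# Hypothetical raw records as they might land in a data lake:
# each source keeps all of its own attributes, in its own shape.
data_lake: list[dict[str, Any]] = [
    {"order_id": 1, "customer": "acme", "amount": 120.0, "currency": "USD",
     "raw_payload": "<order><id>1</id></order>", "ingested_from": "erp"},
    {"order_id": 2, "customer": "globex", "amount": 75.5,
     "clickstream": ["home", "cart"], "ingested_from": "web"},
]


def build_datamart(records: Iterable[dict[str, Any]],
                   attributes: list[str]) -> list[dict[str, Any]]:
    """Project raw records onto the attributes we have decided we need.

    An attribute missing from a record is stored as None, so every
    datamart row has the same, predictable shape.
    """
    return [{name: record.get(name) for name in attributes} for record in records]


# The "bottled water": only the fields chosen for (hypothetical) sales reporting.
sales_mart = build_datamart(data_lake, ["order_id", "customer", "amount"])
```

Nothing is discarded from `data_lake`: the `raw_payload` and `clickstream` attributes remain available, so a different team could later build a different datamart from the same raw records. That is the practical difference Dixon's bottled-water analogy points at.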

The CIO's responsibility: data preparation
The CIO's task when facing the data lake is one of planning, and data preparation can now be treated as a defined, specific task in the Big Data toolbox. First, the CIO needs to think about fitting the sheer scale of the data lake into the firm's existing (or planned) IT infrastructure. Next, the CIO must plan how to juggle these repositories of raw data from many sources and in many formats. Finally, the CIO needs to ensure that the controls exist to navigate the data lake, because its data will need to be prepped by IT staff and data scientists for specific queries in a lengthy, complex process.
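As a rough illustration of what preparing raw data from many sources and formats for specific queries can involve, here is a minimal Python sketch using only the standard library. It assumes two hypothetical sources, a CSV export and a JSON-lines feed, that report the same measurement under different field names and in different units; the sources, field names and helper functions are invented for illustration, and production pipelines would normally rely on dedicated tooling.

```python
import csv
import io
import json

# Hypothetical raw inputs from two sources, in two formats and two units.
CSV_SOURCE = "id,temp_f\nsensor-1,68.0\nsensor-2,71.6\n"
JSON_SOURCE = '{"device": "sensor-3", "temp_c": 21.5}\n{"device": "sensor-4", "temp_c": 19.0}\n'


def from_csv(text: str) -> list[dict]:
    """Parse CSV rows, rename 'id' to 'device' and convert Fahrenheit to Celsius."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        celsius = (float(row["temp_f"]) - 32) * 5 / 9
        rows.append({"device": row["id"], "temp_c": round(celsius, 2)})
    return rows


def from_json_lines(text: str) -> list[dict]:
    """Parse one JSON object per non-empty line; these readings are already Celsius."""
    return [
        {"device": obj["device"], "temp_c": float(obj["temp_c"])}
        for obj in (json.loads(line) for line in text.splitlines() if line.strip())
    ]


def prepare() -> list[dict]:
    """Combine both sources into one uniform, query-ready table."""
    return from_csv(CSV_SOURCE) + from_json_lines(JSON_SOURCE)


prepared = prepare()
# A specific query becomes trivial once every row has the same shape.
warm = [r["device"] for r in prepared if r["temp_c"] > 20]
```

The design choice here is to normalize field names and units once, at preparation time, so that every later query can assume a single schema. With only two tiny sources this is easy; the "lengthy, complex process" arises when there are hundreds of sources whose formats drift over time.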

Automation tools for navigating the data lake are the inevitable result of the way this area of IT is developing. But it is still early days on this part of the innovation curve, and CIOs may well spend more time treading water (just keeping their heads above it) than anything else.

The best advice is to prepare to navigate the data lake now; the next Big Data paradigm shift is only half a decade away.

This post is sponsored by KPMG LLP and The CIO Agenda.

KPMG LLP is a Delaware limited liability partnership and is the U.S. member firm of the KPMG network of independent member firms affiliated with KPMG International Cooperative ("KPMG International"), a Swiss entity. The KPMG name, logo and "cutting through complexity" are registered trademarks or trademarks of KPMG International. The views and opinions expressed herein are those of the authors and do not necessarily represent the views and opinions of KPMG LLP.


Adrian Bridgwater is a freelance journalist and corporate content creation specialist focusing on cross-platform software application development, as well as all related aspects of software engineering, project management and technology as a whole.
