@DXWorldExpo: Article

Big Data and Master Data Management

Single Source of Truth in Big Data

Master Data Management (MDM) is an important aspect of data governance in enterprises: it enables the development of a "Single Version of Truth" by providing common descriptions for enterprise-wide entities.
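
To make the idea of a common description concrete, here is a minimal sketch of merging already-matched records for one customer from several systems into a single master ("golden") record. The source systems, field names, and the survivorship rule (the most recently updated non-empty value wins) are illustrative assumptions, not a method prescribed by this article or by any MDM product.

```python
from datetime import date

# Hypothetical records for the same customer, already matched across systems.
records = [
    {"source": "CRM", "updated": date(2013, 1, 5), "name": "Jane Doe", "phone": "", "city": "Boston"},
    {"source": "Billing", "updated": date(2013, 3, 2), "name": "Jane A. Doe", "phone": "555-0100", "city": ""},
    {"source": "Support", "updated": date(2012, 11, 20), "name": "J. Doe", "phone": "555-0199", "city": "Cambridge"},
]

def golden_record(matched, fields):
    """Build one master record. For each field, take the most recently
    updated non-empty value (an assumed survivorship rule)."""
    newest_first = sorted(matched, key=lambda r: r["updated"], reverse=True)
    master = {}
    for field in fields:
        # Fall back to None if no source has a value for this field.
        master[field] = next((r[field] for r in newest_first if r[field]), None)
    return master

print(golden_record(records, ["name", "phone", "city"]))
# {'name': 'Jane A. Doe', 'phone': '555-0100', 'city': 'Boston'}
```

Downstream systems would then read the customer's name, phone, and city from this one record instead of from three conflicting ones.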

Need for MDM in Big Data Processing
Before Big Data, enterprises generally managed their transaction data in traditional relational databases. One of the biggest strengths of relational databases is their ability to enforce constraints such as check constraints, primary keys, and foreign keys, which help ensure that the captured data is of high quality.
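
As an illustration of these constraints, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table and column names are hypothetical. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

def build_schema() -> sqlite3.Connection:
    """Create hypothetical customer/order tables with relational constraints."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.execute(
        "CREATE TABLE customer ("
        " customer_id INTEGER PRIMARY KEY,"
        " email TEXT NOT NULL UNIQUE)"
    )
    conn.execute(
        "CREATE TABLE orders ("
        " order_id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customer(customer_id),"
        " quantity INTEGER NOT NULL CHECK (quantity > 0))"
    )
    return conn

def try_insert(conn, sql, params) -> bool:
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

conn = build_schema()
ok = try_insert(conn, "INSERT INTO customer VALUES (?, ?)", (1, "a@example.com"))
dup_pk = try_insert(conn, "INSERT INTO customer VALUES (?, ?)", (1, "b@example.com"))  # primary key
bad_fk = try_insert(conn, "INSERT INTO orders VALUES (?, ?, ?)", (10, 99, 1))          # foreign key
bad_check = try_insert(conn, "INSERT INTO orders VALUES (?, ?, ?)", (11, 1, 0))        # check
print(ok, dup_pk, bad_fk, bad_check)  # True False False False
```

The database itself rejects the bad rows at capture time, which is exactly the safeguard that is usually missing when raw data lands in a Big Data store.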

In spite of such support for data integrity, enterprises still had duplicates in their master data, which led to inaccurate analytics results. For example, an enterprise may target an expensive advertising campaign for a new product at its existing customers; however, because a particular customer may exist with different IDs across multiple systems, the enterprise may end up sending its campaign materials to the same person multiple times.
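
A minimal sketch of detecting such duplicates follows. It assumes, for illustration only, that a trimmed, lower-cased email address is a reliable match key; the systems, IDs, and field names are hypothetical.

```python
from collections import defaultdict

# Hypothetical extracts from two systems that assign their own customer IDs.
crm = [
    {"system": "CRM", "id": "C-001", "email": "Jane.Doe@Example.com"},
    {"system": "CRM", "id": "C-002", "email": "raj.k@example.com"},
]
billing = [
    {"system": "Billing", "id": "88412", "email": " jane.doe@example.com"},
]

def match_key(record):
    """Naive match key (an assumption): the trimmed, lower-cased email."""
    return record["email"].strip().lower()

def find_duplicates(*sources):
    """Group records from all sources by match key; keep keys seen more than once."""
    groups = defaultdict(list)
    for source in sources:
        for rec in source:
            groups[match_key(rec)].append((rec["system"], rec["id"]))
    return {key: ids for key, ids in groups.items() if len(ids) > 1}

# One campaign mailing per distinct person rather than per source record.
mailing_list = sorted({match_key(r) for r in crm + billing})
print(find_duplicates(crm, billing))
print(len(crm + billing), "records ->", len(mailing_list), "recipients")
```

Real matching usually combines several attributes (name, address, phone) and tolerates typos; a single exact key is only the simplest case.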

Similarly, a manufacturing enterprise may analyze the problem and complaint records from its customers, but a lack of uniformity in product codes across regions, and in problem types, may result in inaccurate quantification of the issues.
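
One common remedy is a cross-reference table that maps regional values to master values before counting. In the sketch below, the codes, problem types, and mappings are hypothetical; unmapped values are flagged rather than silently dropped.

```python
from collections import Counter

# Hypothetical cross-reference tables maintained as master data.
PRODUCT_XREF = {"EU-PX100": "PX100", "US-PX-100": "PX100", "APAC-PX100A": "PX100", "US-PZ-200": "PZ200"}
PROBLEM_XREF = {"no power": "POWER", "won't turn on": "POWER", "too hot": "THERMAL", "overheats": "THERMAL"}

# Complaint records as captured by each region: (product code, problem type).
complaints = [
    ("EU-PX100", "no power"),
    ("US-PX-100", "won't turn on"),
    ("APAC-PX100A", "too hot"),
    ("US-PZ-200", "overheats"),
]

def harmonize(product, problem):
    """Translate regional values to master values; flag anything unmapped."""
    return PRODUCT_XREF.get(product, "UNMAPPED"), PROBLEM_XREF.get(problem, "UNMAPPED")

raw_counts = Counter(complaints)
master_counts = Counter(harmonize(p, t) for p, t in complaints)
print(max(raw_counts.values()))      # 1: each raw combination appears only once
print(master_counts.most_common(1))  # [(('PX100', 'POWER'), 2)]
```

Counted on raw values, no issue stands out; counted on master values, the power fault on PX100 emerges as the most frequent complaint.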

Enterprises have traditionally approached Master Data Management by implementing the following measures:

  • Establishing common descriptions for core business entities across multiple systems, which enables the development of a "single version of the truth"
  • Assessing the current master data maturity across the enterprise, identifying the target maturity, and identifying the gaps
  • Selecting a Master Data Management tool
  • Building master data models and cleansing the data
  • Setting up MDM governance and stewardship
  • Defining an MDM strategy to handle mergers and acquisitions

With the advent of Big Data processing, enterprises started analyzing massive amounts of unstructured data from unconventional sources. As a result, inconsistencies across the data are increasing, while the validation performed at data capture is very limited compared with traditional relational data capture.

For example, if an enterprise wants to target customers on social media, where one customer may be represented in multiple forums under different names, the chances that a campaign either reaches the same person repeatedly or misses them entirely are very high. The same is true when microblogging sites are used to analyze the voice of the customer and categorize complaints by product. Customers may well misspell product names or use local naming conventions for the same products, which prevents effective analysis.
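
A minimal sketch of resolving such mentions against master data is shown below. The product list and alias table are hypothetical, and the similarity measure is simply difflib's ratio from the Python standard library, used here as a stand-in for the richer matching an MDM or text analytics tool would provide.

```python
import difflib

# Hypothetical master list of product names and known local aliases.
MASTER_PRODUCTS = ["photon router", "zenith blender", "galaxy tab"]
LOCAL_ALIASES = {"zen blender": "zenith blender"}

def resolve_product(mention, cutoff=0.8):
    """Map a free-text product mention to a master product name, or None."""
    text = " ".join(mention.lower().split())  # normalize case and whitespace
    if text in LOCAL_ALIASES:  # local naming conventions
        return LOCAL_ALIASES[text]
    # Tolerate misspellings using difflib's similarity ratio (0.0 to 1.0).
    matches = difflib.get_close_matches(text, MASTER_PRODUCTS, n=1, cutoff=cutoff)
    return matches[0] if matches else None

for mention in ["Photon Ruoter", "zen  blender", "unknown gadget"]:
    print(mention, "->", resolve_product(mention))
```

The cutoff trades recall against false matches; mentions that resolve to None would typically go to a data steward for review rather than be guessed.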

Master Data Management in Big Data
The following are some approaches to integrating MDM data quality solutions into Big Data processing, so that the insights generated from massive quantities of data are genuinely accurate for the enterprise.

  • Adopting Hybrid Big Data Solutions: As highlighted in my last article on Hybrid Big Data Solutions, integrating Big Data with the existing relational data, which is likely to include the MDM source databases, is one of the easiest ways to ensure data quality for big data sources.
  • Matching More Than Keywords: The massive quantities of unstructured data bring a greater level of ambiguity about the classification and relevance of documents, so mere keyword matching of entities is not enough to find the subject of interest. Most current Big Data examples rely on standard regular expression functions; however, the true potential of Big Data in conjunction with MDM can be achieved only if Text Analytics, rather than standard regular expressions alone, is applied to Big Data.
  • Adopting a Data Virtualization Layer: A data virtualization platform provides a common hub for data across traditional and big data sources, so business rules can be managed at this layer to ensure data quality across disparate data sources.
  • Utilize the Power of Hadoop Database Extensions: Big Data frameworks like Hadoop can keep data in their own file system, HDFS, without transforming it, and the data can be accessed using SQL-like languages. For example, Hive allows data in HDFS to be read through a SQL interface, and HBase is a column-oriented database implemented on top of HDFS. These implementations offer some support for imposing constraints on the underlying Big Data; for example, Hive supports JOINs across tables, which can go a long way toward checking integrity against MDM data.
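
The JOIN-based integrity check can be sketched as an anti-join: select the big data records whose product code has no match in the MDM master table. To keep the example self-contained, the query runs through Python's sqlite3 rather than Hive; the table names and data are hypothetical. HiveQL supports the same LEFT OUTER JOIN with an IS NULL filter, though details may vary by Hive version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical MDM master table, as it might be sourced from a relational system.
conn.execute("CREATE TABLE product_master (product_code TEXT PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO product_master VALUES (?, ?)",
                 [("PX100", "Photon Router"), ("PZ200", "Zenith Blender")])
# Hypothetical big data extract, e.g. product mentions parsed from complaints.
conn.execute("CREATE TABLE complaint_events (event_id INTEGER, product_code TEXT)")
conn.executemany("INSERT INTO complaint_events VALUES (?, ?)",
                 [(1, "PX100"), (2, "PX-100"), (3, "PZ200"), (4, "PQ999")])

# Anti-join: events whose product code is unknown to the master data.
ORPHAN_SQL = """
SELECT e.event_id, e.product_code
FROM complaint_events e
LEFT OUTER JOIN product_master m ON e.product_code = m.product_code
WHERE m.product_code IS NULL
ORDER BY e.event_id
"""
orphans = conn.execute(ORPHAN_SQL).fetchall()
print(orphans)  # [(2, 'PX-100'), (4, 'PQ999')]
```

Records flagged this way could be routed to data stewards or to a cross-reference mapping before they feed analytics.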

Summary
As enterprises continue to adopt Big Data as part of their data management, their biggest challenge will be data quality. RDBMSs have done a great job of ensuring data integrity, and Big Data platforms should support the same. Implementing traditional Master Data Management on top of Big Data / Unified Data will go a long way toward providing correct insights from Big Data processing.

More Stories By Srinivasan Sundara Rajan

Highly passionate about utilizing Digital Technologies to enable next generation enterprise. Believes in enterprise transformation through the Natives (Cloud Native & Mobile Native).
