Welcome!

@DXWorldExpo Authors: Pat Romanski, Liz McMillan, Roger Strukhoff, Elizabeth White, Mark Herring

Related Topics: @DXWorldExpo, Containers Expo Blog, @CloudExpo

@DXWorldExpo: Article

Testing #BigData Applications | @CloudExpo @YourZephyr #AI #Analytics

Big Data applications are used for everything from targeting retail customers to bringing new pharmaceuticals to market quicker

Big Data applications are used for everything from targeting retail customers to bringing new pharmaceuticals to market more quickly. As their name suggests, they are built to acquire, sort and put to use vast sets of information, whether the data points are historical sales figures for a product, results of clinical trial simulations or details in an online dating profile.

There are plenty of freely available tools out there for building Big Data applications, most notably Apache Hadoop, which supports distributed storage and processing using commodity hardware. Partnerships between companies like Intel - the chipmaker has worked on Big Data frameworks such as its Trusted Analytics Platform, available on Rackspace and Amazon Web Services - and Cloudera, a company that specializes in Hadoop, have also made it easier for organizations to build Hadoop applications in recent years.

"Hadoop helps here with its HDFS (Hadoop Distributed File System), which lets you store a large amount of data on a cloud of machines," explained Vasu Swaminathan, director of quality engineering for Aspire Systems, in a blog post. "On top of HDFS, Hadoop provides an API to process the stored data which is Map Reduce. The idea is since the data is stored in a distributed manner across nodes, it can be processed well in that manner where each node can process the data stored on it instead of getting hit by performance degradation issues by moving it over the network."

So, "how do I build a Big Data application? is becoming an easier question to answer. But what about "how do I test a Big Data application?" That one is harder to deal with. Since these programs go beyond the capabilities and infrastructures that underpin typical pieces of PC or mobile software, they require a unique, scalable approach to testing.

Requirements and Tips for Testing a Big Data Application
Many so-called "Big Data applications" can be categorized as one of the following:

  • A data warehouse (DWH), which can be a federated physical or logical repository that collects data from a variety of sources (just like a literal warehouse may contain items from all over). DWH programs only work with structured data, and use batch processing to handle it. They are typically built on relational databases such as Oracle.
  • A Big Data storage system, which has fewer limitations and can accept more types of data. Such an application could process streaming data, unstructured data, etc. It might have a capacity well into the petabytes, whereas a DWH would only reach into the neighborhood of gigabytes or terabytes. It is built on a filesystem like HDFS.

Both types can be set up to work with items such as web search queries, medical records, video archives and documents of different kinds. The goal of testing these applications is largely to validate all of this information. In a widely read Quora answer on this topic, Parul Gangwar explained that there are four Vs of the data in question to look out for here: variety, volume, velocity and value.

Gauging these metrics and verifying the data as a whole is usually done through performance and functional testing. A few things to note along the way:

  • An early step is to make sure that the right data has been pulled into the system in the first place. Source data may be compared to what's actually in the Hadoop cluster.
  • Business logic may be evaluated on each node to verify that Map Reduce is working for the application.
  • Finally, the output of the application may be checked for corruption as well as application of the correct transformational rules.
  • Hadoop architectures may also be subjected to failover testing to make sure that they can hold up even under the heavy workloads that they are regularly saddled with.
  • The system should also be checked for basic capabilities such as its throughput and how well its subcomponents perform at tasks such as indexing messages.

Properly testing a Big Data application is helped by having knowledge of a variety of tools, frameworks and programming languages, everything from Java to Jenkins. An enterprise test management system is also valuable for its real-time capabilities when vetting Big Data applications.

Given the vast scale of Big Data programs, automation is particularly important during the building and testing phases. It helps improve and streamline software testing processes so that DevOps teams can move more quickly to validate the data throughout the application.

"[T]est automation can be a good approach in testing Big Data implementations," Swaminathan explained at the end of his post. "Identifying the requirements and building a robust automation framework can help in doing comprehensive testing. However, a lot would depend on how the skills of the tester and how the Big Data environment is set up."

More Stories By Sanjay Zalavadia

As the VP of Client Service for Zephyr, Sanjay Zalavadia brings over 15 years of leadership experience in IT and Technical Support Services. Throughout his career, Sanjay has successfully established and grown premier IT and Support Services teams across multiple geographies for both large and small companies.

Most recently, he was Associate Vice President at Patni Computers (NYSE: PTI) responsible for the Telecoms IT Managed Services Practice where he established IT Operations teams supporting Virgin Mobile, ESPN Mobile, Disney Mobile and Carphone Warehouse. Prior to this Sanjay was responsible for Global Technical Support at Bay Networks, a leading routing and switching vendor, which was acquired by Nortel. He has also held management positions in Support Service organizations at start-up Silicon Valley Networks, a vendor of Test Management software, and SynOptics.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


@BigDataExpo Stories
Smart cities have the potential to change our lives at so many levels for citizens: less pollution, reduced parking obstacles, better health, education and more energy savings. Real-time data streaming and the Internet of Things (IoT) possess the power to turn this vision into a reality. However, most organizations today are building their data infrastructure to focus solely on addressing immediate business needs vs. a platform capable of quickly adapting emerging technologies to address future ...
A strange thing is happening along the way to the Internet of Things, namely far too many devices to work with and manage. It has become clear that we'll need much higher efficiency user experiences that can allow us to more easily and scalably work with the thousands of devices that will soon be in each of our lives. Enter the conversational interface revolution, combining bots we can literally talk with, gesture to, and even direct with our thoughts, with embedded artificial intelligence, whic...
Cloud Expo | DXWorld Expo have announced the conference tracks for Cloud Expo 2018. Cloud Expo will be held June 5-7, 2018, at the Javits Center in New York City, and November 6-8, 2018, at the Santa Clara Convention Center, Santa Clara, CA. Digital Transformation (DX) is a major focus with the introduction of DX Expo within the program. Successful transformation requires a laser focus on being data-driven and on using all the tools available that enable transformation if they plan to survive ov...
In his general session at 21st Cloud Expo, Greg Dumas, Calligo’s Vice President and G.M. of US operations, discussed the new Global Data Protection Regulation and how Calligo can help business stay compliant in digitally globalized world. Greg Dumas is Calligo's Vice President and G.M. of US operations. Calligo is an established service provider that provides an innovative platform for trusted cloud solutions. Calligo’s customers are typically most concerned about GDPR compliance, application p...
Digital transformation is about embracing digital technologies into a company's culture to better connect with its customers, automate processes, create better tools, enter new markets, etc. Such a transformation requires continuous orchestration across teams and an environment based on open collaboration and daily experiments. In his session at 21st Cloud Expo, Alex Casalboni, Technical (Cloud) Evangelist at Cloud Academy, explored and discussed the most urgent unsolved challenges to achieve f...
Continuous Delivery makes it possible to exploit findings of cognitive psychology and neuroscience to increase the productivity and happiness of our teams. In his session at 22nd Cloud Expo | DXWorld Expo, Daniel Jones, CTO of EngineerBetter, will answer: How can we improve willpower and decrease technical debt? Is the present bias real? How can we turn it to our advantage? Can you increase a team’s effective IQ? How do DevOps & Product Teams increase empathy, and what impact does empath...
DevOps promotes continuous improvement through a culture of collaboration. But in real terms, how do you: Integrate activities across diverse teams and services? Make objective decisions with system-wide visibility? Use feedback loops to enable learning and improvement? With technology insights and real-world examples, in his general session at @DevOpsSummit, at 21st Cloud Expo, Andi Mann, Chief Technology Advocate at Splunk, explored how leading organizations use data-driven DevOps to close th...
As many know, the first generation of Cloud Management Platform (CMP) solutions were designed for managing virtual infrastructure (IaaS) and traditional applications. But that's no longer enough to satisfy evolving and complex business requirements. In his session at 21st Cloud Expo, Scott Davis, Embotics CTO, explored how next-generation CMPs ensure organizations can manage cloud-native and microservice-based application architectures, while also facilitating agile DevOps methodology. He expla...
To get the most out of their data, successful companies are not focusing on queries and data lakes, they are actively integrating analytics into their operations with a data-first application development approach. Real-time adjustments to improve revenues, reduce costs, or mitigate risk rely on applications that minimize latency on a variety of data sources. In his session at @BigDataExpo, Jack Norris, Senior Vice President, Data and Applications at MapR Technologies, reviewed best practices to ...
"Digital transformation - what we knew about it in the past has been redefined. Automation is going to play such a huge role in that because the culture, the technology, and the business operations are being shifted now," stated Brian Boeggeman, VP of Alliances & Partnerships at Ayehu, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
SYS-CON Events announced today that Synametrics Technologies will exhibit at SYS-CON's 22nd International Cloud Expo®, which will take place on June 5-7, 2018, at the Javits Center in New York, NY. Synametrics Technologies is a privately held company based in Plainsboro, New Jersey that has been providing solutions for the developer community since 1997. Based on the success of its initial product offerings such as WinSQL, Xeams, SynaMan and Syncrify, Synametrics continues to create and hone inn...
"Evatronix provides design services to companies that need to integrate the IoT technology in their products but they don't necessarily have the expertise, knowledge and design team to do so," explained Adam Morawiec, VP of Business Development at Evatronix, in this SYS-CON.tv interview at @ThingsExpo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
The 22nd International Cloud Expo | 1st DXWorld Expo has announced that its Call for Papers is open. Cloud Expo | DXWorld Expo, to be held June 5-7, 2018, at the Javits Center in New York, NY, brings together Cloud Computing, Digital Transformation, Big Data, Internet of Things, DevOps, Machine Learning and WebRTC to one location. With cloud computing driving a higher percentage of enterprise IT budgets every year, it becomes increasingly important to plant your flag in this fast-expanding busin...
In his Opening Keynote at 21st Cloud Expo, John Considine, General Manager of IBM Cloud Infrastructure, led attendees through the exciting evolution of the cloud. He looked at this major disruption from the perspective of technology, business models, and what this means for enterprises of all sizes. John Considine is General Manager of Cloud Infrastructure Services at IBM. In that role he is responsible for leading IBM’s public cloud infrastructure including strategy, development, and offering m...
Nordstrom is transforming the way that they do business and the cloud is the key to enabling speed and hyper personalized customer experiences. In his session at 21st Cloud Expo, Ken Schow, VP of Engineering at Nordstrom, discussed some of the key learnings and common pitfalls of large enterprises moving to the cloud. This includes strategies around choosing a cloud provider(s), architecture, and lessons learned. In addition, he covered some of the best practices for structured team migration an...
No hype cycles or predictions of a gazillion things here. IoT is here. You get it. You know your business and have great ideas for a business transformation strategy. What comes next? Time to make it happen. In his session at @ThingsExpo, Jay Mason, an Associate Partner of Analytics, IoT & Cybersecurity at M&S Consulting, presented a step-by-step plan to develop your technology implementation strategy. He also discussed the evaluation of communication standards and IoT messaging protocols, data...
Recently, REAN Cloud built a digital concierge for a North Carolina hospital that had observed that most patient call button questions were repetitive. In addition, the paper-based process used to measure patient health metrics was laborious, not in real-time and sometimes error-prone. In their session at 21st Cloud Expo, Sean Finnerty, Executive Director, Practice Lead, Health Care & Life Science at REAN Cloud, and Dr. S.P.T. Krishnan, Principal Architect at REAN Cloud, discussed how they built...
In a recent survey, Sumo Logic surveyed 1,500 customers who employ cloud services such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). According to the survey, a quarter of the respondents have already deployed Docker containers and nearly as many (23 percent) are employing the AWS Lambda serverless computing framework. It’s clear: serverless is here to stay. The adoption does come with some needed changes, within both application development and operations. Tha...
SYS-CON Events announced today that Evatronix will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Evatronix SA offers comprehensive solutions in the design and implementation of electronic systems, in CAD / CAM deployment, and also is a designer and manufacturer of advanced 3D scanners for professional applications.
The “Digital Era” is forcing us to engage with new methods to build, operate and maintain applications. This transformation also implies an evolution to more and more intelligent applications to better engage with the customers, while creating significant market differentiators. In both cases, the cloud has become a key enabler to embrace this digital revolution. So, moving to the cloud is no longer the question; the new questions are HOW and WHEN. To make this equation even more complex, most ...