Welcome!

@DXWorldExpo Authors: Pat Romanski, Liz McMillan, Elizabeth White, William Schmarzo, Kevin Benedict

Related Topics: @DXWorldExpo, Java IoT, Apache

@DXWorldExpo: Article

Apache Spark: A Key to Big Data Initiatives | @CloudExpo #Microservices

As with other data processing technologies, Spark is not suitable for all types of workloads

Apache Spark continues to gain a lot of traction as companies launch or expand their big data initiatives. There is no doubt that it’s finding a place in corporate IT strategies.

The open-source cluster computing framework was developed in the AMPLab at the University of California at Berkeley in 2009 and became an incubated project of the Apache Software Foundation in 2013. By early 2014, Spark had become one of the foundation’s top-level projects, and today it is one of the most active projects managed by Apache.

Because Spark was optimized to run in-memory, it is capable of processing data much faster than other approaches such as MapReduce. As a result, Spark can provide much higher performance levels for certain types of applications. By enabling programs to load data into a cluster's memory and query it repeatedly, the framework is ideal for machine learning algorithms.

As with other data processing technologies, Spark is not suitable for all types of workloads. But companies launching big data efforts can leverage the framework for a variety of projects, such as interactive queries across large data sets; the processing of streaming data from sensors, as with Internet of Things (IoT) applications; and machine learning tasks.

In addition, developers can use Spark to support other processing tasks, taking advantage of the open source framework’s huge set of developer libraries and application programming interfaces (APIs) and comprehensive support of popular languages such as Java, Python, R and Scala.

Apache Spark has three key things going for it that IT organizations should keep in mind:

  1. The framework’s relative simplicity. The APIs are designed specifically for interacting easily and rapidly with data at scale, and are structured in such a way that enable application developers to use Spark right away.
  2. The framework is designed for speed, operating both in-memory and on disk. Spark’s performance can be even greater when supporting interactive queries of data stored in memory.
  3. Spark supports multiple programming languages as mentioned above, and it includes native support for tight integration with leading storage solutions in the Hadoop ecosystem and beyond.

Spark is proving to be well suited for a number of business use cases and is helping companies to transform their big data initiatives and deliver analytics much faster and with greater efficiency.

One company, a provider of cloud-based predictive analytics software specifically designed for the telecommunications industry, is using the full Spark stack as part of its Hadoop-based architecture on MapR. This has helped the company achieve horizontal scalability on commodity hardware and reduce storage and computing costs.

The new technology stack allows the software company to continuously innovate and deliver value to its telecommunications customers by offering predictive insights from the cloud. Today’s telecommunications data has higher volumes and frequency and more complex structures, particularly with new types of devices generating data for IoT and the use of mobile phones for a fast-growing number of apps. The company needs to use this data to generate predictive insights using data science and predictive analytics, and Spark helps make this possible.

Another business benefiting from Spark is a global pharmaceuticals manufacturer that relies on big data solutions for drug discovery processes. One of the company’s areas of drug research requires lots of interaction with diverse data from external organizations.

Combined Spark and Hadoop workflow and integration layers enable the company’s researchers to leverage thousands of experiments other organizations have conducted, providing the pharmaceuticals company with a significant competitive advantage. The big data solutions the company uses allows it to integrate and analyze data so that it can speed up drug research.

These technologies are now being used for a variety of projects across the enterprise, including video analysis, proteomics, and meta-genomics. Researchers can access data directly through a Spark API on a number of databases with schemas that are designed for their specific analytics needs.

And a third business use case for Spark comes from a service provider that delivers analytics services to various industries. The company deployed the Spark framework in conjunction with its Hadoop big data initiative, and is able to dramatically cut query times and improve the accuracy of analytics results. That has enabled the company to provide enhanced services to its customers.

Clearly, Apache Spark can provide a number of benefits to organizations looking to get the most value out of their information resources and the biggest returns on their big data investments. The framework provides the speed and efficiency improvements companies need to deliver on the promise of big data and analytics.

To further explore the advantages of Spark, see the free interactive eBook, Getting Started with Apache Spark: From Inception to Productionby James A. Scott.

More Stories By Jim Scott

Jim has held positions running Operations, Engineering, Architecture and QA teams in the Consumer Packaged Goods, Digital Advertising, Digital Mapping, Chemical and Pharmaceutical industries. Jim has built systems that handle more than 50 billion transactions per day and his work with high-throughput computing at Dow Chemical was a precursor to more standardized big data concepts like Hadoop.

@BigDataExpo Stories
To get the most out of their data, successful companies are not focusing on queries and data lakes, they are actively integrating analytics into their operations with a data-first application development approach. Real-time adjustments to improve revenues, reduce costs, or mitigate risk rely on applications that minimize latency on a variety of data sources. In his session at @BigDataExpo, Jack Norris, Senior Vice President, Data and Applications at MapR Technologies, reviewed best practices t...
"Evatronix provides design services to companies that need to integrate the IoT technology in their products but they don't necessarily have the expertise, knowledge and design team to do so," explained Adam Morawiec, VP of Business Development at Evatronix, in this SYS-CON.tv interview at @ThingsExpo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
DevOps promotes continuous improvement through a culture of collaboration. But in real terms, how do you: Integrate activities across diverse teams and services? Make objective decisions with system-wide visibility? Use feedback loops to enable learning and improvement? With technology insights and real-world examples, in his general session at @DevOpsSummit, at 21st Cloud Expo, Andi Mann, Chief Technology Advocate at Splunk, explored how leading organizations use data-driven DevOps to clos...
Recently, REAN Cloud built a digital concierge for a North Carolina hospital that had observed that most patient call button questions were repetitive. In addition, the paper-based process used to measure patient health metrics was laborious, not in real-time and sometimes error-prone. In their session at 21st Cloud Expo, Sean Finnerty, Executive Director, Practice Lead, Health Care & Life Science at REAN Cloud, and Dr. S.P.T. Krishnan, Principal Architect at REAN Cloud, discussed how they built...
No hype cycles or predictions of a gazillion things here. IoT is here. You get it. You know your business and have great ideas for a business transformation strategy. What comes next? Time to make it happen. In his session at @ThingsExpo, Jay Mason, an Associate Partner of Analytics, IoT & Cybersecurity at M&S Consulting, presented a step-by-step plan to develop your technology implementation strategy. He also discussed the evaluation of communication standards and IoT messaging protocols, data...
In a recent survey, Sumo Logic surveyed 1,500 customers who employ cloud services such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). According to the survey, a quarter of the respondents have already deployed Docker containers and nearly as many (23 percent) are employing the AWS Lambda serverless computing framework. It’s clear: serverless is here to stay. The adoption does come with some needed changes, within both application development and operations. Tha...
Digital transformation is about embracing digital technologies into a company's culture to better connect with its customers, automate processes, create better tools, enter new markets, etc. Such a transformation requires continuous orchestration across teams and an environment based on open collaboration and daily experiments. In his session at 21st Cloud Expo, Alex Casalboni, Technical (Cloud) Evangelist at Cloud Academy, explored and discussed the most urgent unsolved challenges to achieve f...
In his general session at 21st Cloud Expo, Greg Dumas, Calligo’s Vice President and G.M. of US operations, discussed the new Global Data Protection Regulation and how Calligo can help business stay compliant in digitally globalized world. Greg Dumas is Calligo's Vice President and G.M. of US operations. Calligo is an established service provider that provides an innovative platform for trusted cloud solutions. Calligo’s customers are typically most concerned about GDPR compliance, application p...
Smart cities have the potential to change our lives at so many levels for citizens: less pollution, reduced parking obstacles, better health, education and more energy savings. Real-time data streaming and the Internet of Things (IoT) possess the power to turn this vision into a reality. However, most organizations today are building their data infrastructure to focus solely on addressing immediate business needs vs. a platform capable of quickly adapting emerging technologies to address future ...
In his Opening Keynote at 21st Cloud Expo, John Considine, General Manager of IBM Cloud Infrastructure, led attendees through the exciting evolution of the cloud. He looked at this major disruption from the perspective of technology, business models, and what this means for enterprises of all sizes. John Considine is General Manager of Cloud Infrastructure Services at IBM. In that role he is responsible for leading IBM’s public cloud infrastructure including strategy, development, and offering m...
The 22nd International Cloud Expo | 1st DXWorld Expo has announced that its Call for Papers is open. Cloud Expo | DXWorld Expo, to be held June 5-7, 2018, at the Javits Center in New York, NY, brings together Cloud Computing, Digital Transformation, Big Data, Internet of Things, DevOps, Machine Learning and WebRTC to one location. With cloud computing driving a higher percentage of enterprise IT budgets every year, it becomes increasingly important to plant your flag in this fast-expanding busin...
Nordstrom is transforming the way that they do business and the cloud is the key to enabling speed and hyper personalized customer experiences. In his session at 21st Cloud Expo, Ken Schow, VP of Engineering at Nordstrom, discussed some of the key learnings and common pitfalls of large enterprises moving to the cloud. This includes strategies around choosing a cloud provider(s), architecture, and lessons learned. In addition, he covered some of the best practices for structured team migration an...
The “Digital Era” is forcing us to engage with new methods to build, operate and maintain applications. This transformation also implies an evolution to more and more intelligent applications to better engage with the customers, while creating significant market differentiators. In both cases, the cloud has become a key enabler to embrace this digital revolution. So, moving to the cloud is no longer the question; the new questions are HOW and WHEN. To make this equation even more complex, most ...
SYS-CON Events announced today that Synametrics Technologies will exhibit at SYS-CON's 22nd International Cloud Expo®, which will take place on June 5-7, 2018, at the Javits Center in New York, NY. Synametrics Technologies is a privately held company based in Plainsboro, New Jersey that has been providing solutions for the developer community since 1997. Based on the success of its initial product offerings such as WinSQL, Xeams, SynaMan and Syncrify, Synametrics continues to create and hone in...
You know you need the cloud, but you’re hesitant to simply dump everything at Amazon since you know that not all workloads are suitable for cloud. You know that you want the kind of ease of use and scalability that you get with public cloud, but your applications are architected in a way that makes the public cloud a non-starter. You’re looking at private cloud solutions based on hyperconverged infrastructure, but you’re concerned with the limits inherent in those technologies.
Blockchain is a shared, secure record of exchange that establishes trust, accountability and transparency across business networks. Supported by the Linux Foundation's open source, open-standards based Hyperledger Project, Blockchain has the potential to improve regulatory compliance, reduce cost as well as advance trade. Are you curious about how Blockchain is built for business? In her session at 21st Cloud Expo, René Bostic, Technical VP of the IBM Cloud Unit in North America, discussed the b...
22nd International Cloud Expo, taking place June 5-7, 2018, at the Javits Center in New York City, NY, and co-located with the 1st DXWorld Expo will feature technical sessions from a rock star conference faculty and the leading industry players in the world. Cloud computing is now being embraced by a majority of enterprises of all sizes. Yesterday's debate about public vs. private has transformed into the reality of hybrid cloud: a recent survey shows that 74% of enterprises have a hybrid cloud ...
22nd International Cloud Expo, taking place June 5-7, 2018, at the Javits Center in New York City, NY, and co-located with the 1st DXWorld Expo will feature technical sessions from a rock star conference faculty and the leading industry players in the world. Cloud computing is now being embraced by a majority of enterprises of all sizes. Yesterday's debate about public vs. private has transformed into the reality of hybrid cloud: a recent survey shows that 74% of enterprises have a hybrid cloud ...
DevOps at Cloud Expo – being held June 5-7, 2018, at the Javits Center in New York, NY – announces that its Call for Papers is open. Born out of proven success in agile development, cloud computing, and process automation, DevOps is a macro trend you cannot afford to miss. From showcase success stories from early adopters and web-scale businesses, DevOps is expanding to organizations of all sizes, including the world's largest enterprises – and delivering real results. Among the proven benefits,...
@DevOpsSummit at Cloud Expo, taking place June 5-7, 2018, at the Javits Center in New York City, NY, is co-located with 22nd Cloud Expo | 1st DXWorld Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The widespread success of cloud computing is driving the DevOps revolution in enterprise IT. Now as never before, development teams must communicate and collaborate in a dynamic, 24/7/365 environment. There is no time to wait...