Apache Spark: A Key to Big Data Initiatives

As with other data processing technologies, Spark is not suitable for all types of workloads

Apache Spark continues to gain a lot of traction as companies launch or expand their big data initiatives. There is no doubt that it’s finding a place in corporate IT strategies.

The open-source cluster computing framework was developed in the AMPLab at the University of California at Berkeley in 2009 and became an incubated project of the Apache Software Foundation in 2013. By early 2014, Spark had become one of the foundation’s top-level projects, and today it is one of the most active projects managed by Apache.

Because Spark is optimized to run in memory, it can process data much faster than disk-based approaches such as MapReduce, which gives it much higher performance for certain types of applications. By letting programs load data into a cluster's memory and query it repeatedly, the framework is well suited to iterative workloads such as machine learning algorithms.
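To make the load-once, query-repeatedly pattern concrete, here is a minimal sketch in plain Python rather than Spark itself. The `load_dataset` function and its call counter are hypothetical stand-ins for an expensive read from disk. The point is only that an iterative algorithm, such as the small gradient descent loop below, touches the same data on every pass, so keeping that data in memory avoids repeating the load each time.

```python
# Plain-Python illustration (not Spark code): reuse data held in memory
# across the iterations of a simple machine learning loop.

load_calls = 0  # counts how often the "expensive" load runs


def load_dataset():
    """Hypothetical stand-in for an expensive read from disk."""
    global load_calls
    load_calls += 1
    # (x, y) points lying exactly on the line y = 2x
    return [(float(x), 2.0 * x) for x in range(1, 6)]


def fit_slope(get_data, iterations=50, learning_rate=0.01):
    """Fit y = w * x by gradient descent, calling get_data() once per pass."""
    w = 0.0
    for _ in range(iterations):
        data = get_data()
        # Gradient of the mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= learning_rate * grad
    return w


# Without caching: every iteration reloads the data.
w_reload = fit_slope(load_dataset)
reload_count = load_calls

# With caching: load once, keep the result in memory, reuse it on every pass.
load_calls = 0
cached = load_dataset()
w_cached = fit_slope(lambda: cached)
cached_count = load_calls

print(f"reloading: {reload_count} loads, slope = {w_reload:.3f}")
print(f"cached:    {cached_count} load, slope = {w_cached:.3f}")
```

Both runs arrive at the same slope, but the first performs one load per iteration while the second performs a single load in total. On a real cluster the cost of each avoided load is far larger than in this toy example.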

As with other data processing technologies, Spark is not suitable for all types of workloads. But companies launching big data efforts can leverage the framework for a variety of projects, such as interactive queries across large data sets; the processing of streaming data from sensors, as with Internet of Things (IoT) applications; and machine learning tasks.
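As an illustration of the streaming use case, the sketch below groups a stream of sensor readings into small fixed-size batches and computes a per-sensor average for each batch. It is plain Python, not Spark's streaming API, and the sensor names, readings, and batch size are invented for the example. Treating a stream as a series of small batches is the model Spark's original streaming library is known for, but the code is a conceptual sketch only.

```python
from collections import defaultdict
from itertools import islice


def micro_batches(stream, batch_size):
    """Yield successive lists of up to batch_size items from an iterable."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def average_by_sensor(batch):
    """Return {sensor_id: mean reading} for one batch of (id, value) pairs."""
    totals = defaultdict(lambda: [0.0, 0])  # sensor_id -> [sum, count]
    for sensor_id, value in batch:
        totals[sensor_id][0] += value
        totals[sensor_id][1] += 1
    return {sid: total / count for sid, (total, count) in totals.items()}


# Hypothetical temperature readings arriving in order from two sensors.
readings = [
    ("sensor-a", 20.0), ("sensor-b", 30.0), ("sensor-a", 22.0),
    ("sensor-b", 34.0), ("sensor-a", 24.0), ("sensor-b", 32.0),
]

results = [average_by_sensor(b) for b in micro_batches(readings, batch_size=3)]
for i, averages in enumerate(results):
    print(f"batch {i}: {averages}")
```

Each batch is summarized independently, which is what lets a cluster framework spread the work for successive batches across many machines.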

In addition, developers can use Spark to support other processing tasks, taking advantage of the open source framework’s huge set of developer libraries and application programming interfaces (APIs) and comprehensive support of popular languages such as Java, Python, R and Scala.

Apache Spark has three key things going for it that IT organizations should keep in mind:

  1. The framework’s relative simplicity. The APIs are designed specifically for interacting easily and rapidly with data at scale, and are structured in a way that enables application developers to use Spark right away.
  2. The framework is designed for speed, operating both in-memory and on disk. Spark’s performance can be even greater when supporting interactive queries of data stored in memory.
  3. Spark supports multiple programming languages as mentioned above, and it includes native support for tight integration with leading storage solutions in the Hadoop ecosystem and beyond.

Spark is proving to be well suited for a number of business use cases and is helping companies to transform their big data initiatives and deliver analytics much faster and with greater efficiency.

One company, a provider of cloud-based predictive analytics software specifically designed for the telecommunications industry, is using the full Spark stack as part of its Hadoop-based architecture on MapR. This has helped the company achieve horizontal scalability on commodity hardware and reduce storage and computing costs.

The new technology stack allows the software company to continuously innovate and deliver value to its telecommunications customers by offering predictive insights from the cloud. Today’s telecommunications data has higher volumes and frequency and more complex structures, particularly with new types of devices generating data for IoT and the use of mobile phones for a fast-growing number of apps. The company needs to use this data to generate predictive insights using data science and predictive analytics, and Spark helps make this possible.

Another business benefiting from Spark is a global pharmaceuticals manufacturer that relies on big data solutions for drug discovery processes. One of the company’s areas of drug research requires lots of interaction with diverse data from external organizations.

Combined Spark and Hadoop workflow and integration layers enable the company’s researchers to leverage thousands of experiments other organizations have conducted, providing the pharmaceuticals company with a significant competitive advantage. The big data solutions the company uses allow it to integrate and analyze data so that it can speed up drug research.

These technologies are now being used for a variety of projects across the enterprise, including video analysis, proteomics, and meta-genomics. Researchers can access data directly through a Spark API on a number of databases with schemas that are designed for their specific analytics needs.

A third business use case for Spark comes from a service provider that delivers analytics services to various industries. The company deployed the Spark framework in conjunction with its Hadoop big data initiative and has been able to dramatically cut query times and improve the accuracy of analytics results. That has enabled the company to provide enhanced services to its customers.

Clearly, Apache Spark can provide a number of benefits to organizations looking to get the most value out of their information resources and the biggest returns on their big data investments. The framework provides the speed and efficiency improvements companies need to deliver on the promise of big data and analytics.

To further explore the advantages of Spark, see the free interactive eBook, Getting Started with Apache Spark: From Inception to Production, by James A. Scott.

About the Author: Jim Scott

Jim has held positions running Operations, Engineering, Architecture and QA teams in the Consumer Packaged Goods, Digital Advertising, Digital Mapping, Chemical and Pharmaceutical industries. Jim has built systems that handle more than 50 billion transactions per day and his work with high-throughput computing at Dow Chemical was a precursor to more standardized big data concepts like Hadoop.
