Welcome!

@DXWorldExpo Authors: Zakia Bouachraoui, Pat Romanski, Elizabeth White, Yeshim Deniz, Liz McMillan

Related Topics: @DXWorldExpo, Microservices Expo, @CloudExpo, @ThingsExpo

@DXWorldExpo: Blog Feed Post

Lambda Architecture For Big Data By @JnanDash | @ThingsExpo [#BigData #IoT]

Lambda Architecture is a useful framework to think about designing big data applications

I attended a Meetup yesterday in Mountain View, hosted by The Hive group on the subject of Lambda Architecture. Since I had never heard about this new phrase, my curiosity took me there. There was a panel discussion and panelists came from Hortonworks, Cloudera, MapR, Teradata, etc.

Lambda Architecture is a useful framework to think about designing big data applications. Nathan Marz designed this generic architecture addressing common requirements for big data based on his experience working on distributed data processing systems at Twitter. Some of the key requirements in building this architecture include:

  • Fault-tolerance against hardware failures and human errors
  • Support for a variety of use cases that include low latency querying as well as updates
  • Linear scale-out capabilities, meaning that throwing more machines at the problem should help with getting the job done
  • Extensibility so that the system is manageable and can accommodate newer features easily

The following pictures summarizes the framework.

Overview of the Lambda Architecture

The Lambda Architecture as seen in the picture has three major components.

  1. Batch layer that provides the following functionality
    1. managing the master dataset, an immutable, append-only set of raw data
    2. pre-computing arbitrary query functions, called batch views.
  2. Serving layer—This layer indexes the batch views so that they can be queried in ad hoc with low latency.
  3. Speed layer—This layer accommodates all requests that are subject to low latency requirements. Using fast and incremental algorithms, the speed layer deals with recent data only.

Criticism of lambda architecture has focused on its inherent complexity and its limiting influence. The batch and streaming sides each require a different code base that must be maintained and kept in sync so that processed data produces the same result from both paths. Yet attempting to abstract the code bases into a single framework puts many of the specialized tools in the batch and real-time ecosystems out of reach.

The panelists rambled on details without addressing real challenges on combining two very different approaches, thus compromising the benefits of stream with added latency of the batch world. However, there is merit to the thought process of unification of the two disparate worlds into a common framework. Real deployment will be the proof point. 

Read the original blog entry...

More Stories By Jnan Dash

Jnan Dash is Senior Advisor at EZShield Inc., Advisor at ScaleDB and Board Member at Compassites Software Solutions. He has lived in Silicon Valley since 1979. Formerly he was the Chief Strategy Officer (Consulting) at Curl Inc., before which he spent ten years at Oracle Corporation and was the Group Vice President, Systems Architecture and Technology till 2002. He was responsible for setting Oracle's core database and application server product directions and interacted with customers worldwide in translating future needs to product plans. Before that he spent 16 years at IBM. He blogs at http://jnandash.ulitzer.com.

DXWorldEXPO Digital Transformation Stories
Using new techniques of information modeling, indexing, and processing, new cloud-based systems can support cloud-based workloads previously not possible for high-throughput insurance, banking, and case-based applications. In his session at 18th Cloud Expo, John Newton, CTO, Founder and Chairman of Alfresco, described how to scale cloud-based content management repositories to store, manage, and retrieve billions of documents and related information with fast and linear scalability. He addresse...
A valuable conference experience generates new contacts, sales leads, potential strategic partners and potential investors; helps gather competitive intelligence and even provides inspiration for new products and services. Conference Guru works with conference organizers to pass great deals to great conferences, helping you discover new conferences and increase your return on investment.
Poor data quality and analytics drive down business value. In fact, Gartner estimated that the average financial impact of poor data quality on organizations is $9.7 million per year. But bad data is much more than a cost center. By eroding trust in information, analytics and the business decisions based on these, it is a serious impediment to digital transformation.
We are seeing a major migration of enterprises applications to the cloud. As cloud and business use of real time applications accelerate, legacy networks are no longer able to architecturally support cloud adoption and deliver the performance and security required by highly distributed enterprises. These outdated solutions have become more costly and complicated to implement, install, manage, and maintain.SD-WAN offers unlimited capabilities for accessing the benefits of the cloud and Internet. ...
Business professionals no longer wonder if they'll migrate to the cloud; it's now a matter of when. The cloud environment has proved to be a major force in transitioning to an agile business model that enables quick decisions and fast implementation that solidify customer relationships. And when the cloud is combined with the power of cognitive computing, it drives innovation and transformation that achieves astounding competitive advantage.
SYS-CON Events announced today that Silicon India has been named “Media Sponsor” of SYS-CON's 21st International Cloud Expo, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Published in Silicon Valley, Silicon India magazine is the premiere platform for CIOs to discuss their innovative enterprise solutions and allows IT vendors to learn about new solutions that can help grow their business.
DXWorldEXPO LLC announced today that "IoT Now" was named media sponsor of CloudEXPO | DXWorldEXPO 2018 New York, which will take place on November 11-13, 2018 in New York City, NY. IoT Now explores the evolving opportunities and challenges facing CSPs, and it passes on some lessons learned from those who have taken the first steps in next-gen IoT services.
SYS-CON Events announced today that CrowdReviews.com has been named “Media Sponsor” of SYS-CON's 22nd International Cloud Expo, which will take place on June 5–7, 2018, at the Javits Center in New York City, NY. CrowdReviews.com is a transparent online platform for determining which products and services are the best based on the opinion of the crowd. The crowd consists of Internet users that have experienced products and services first-hand and have an interest in letting other potential buye...
Everyone wants the rainbow - reduced IT costs, scalability, continuity, flexibility, manageability, and innovation. But in order to get to that collaboration rainbow, you need the cloud! In this presentation, we'll cover three areas: First - the rainbow of benefits from cloud collaboration. There are many different reasons why more and more companies and institutions are moving to the cloud. Benefits include: cost savings (reducing on-prem infrastructure, reducing data center foot print, redu...
SYS-CON Events announced today that DatacenterDynamics has been named “Media Sponsor” of SYS-CON's 18th International Cloud Expo, which will take place on June 7–9, 2016, at the Javits Center in New York City, NY. DatacenterDynamics is a brand of DCD Group, a global B2B media and publishing company that develops products to help senior professionals in the world's most ICT dependent organizations make risk-based infrastructure and capacity decisions.