@DXWorldExpo: Article

Balancing the Load

Why you need to constantly monitor application performance

A question that every online application provider will eventually face is: Does my application scale? Can I add an extra 100 users and still ensure the same user experience? If the application architecture is properly designed, the easiest way to scale is to put an additional server behind the load balancer to handle more traffic.

In this article we recount an incident at one of our clients in which the cause of poor application performance was eventually traced to problems with the load balancing of the application servers.

HTTP Server (500) Errors Go Through the Roof
Around 8 am the Operations team at Rendoosia Inc. (name changed for commercial reasons) got an alert from the APM tool that one of three SharePoint servers was generating many HTTP Server (500) errors. All three servers were behind a load balancer, so the team decided to analyze the overall performance of all three servers with the report presented in Figure 1.

Figure 1: Overview of the three SharePoint servers behind one load balancer with some KPIs: usage, response time and number of errors; two servers show performance problems

The Operations team noticed the following issues:

  1. The x.x.x.155 server (row marked with the blue box) was under significantly lower load than the other two: 7k operations compared to almost 30k for each of the other servers. Both the load and the number of users were shared equally between the remaining two servers, x.x.x.154 and x.x.x.156.
  2. Although server x.x.x.155 had the lowest user count, it was reporting the longest processing time.
  3. Server x.x.x.156 was reporting a high number of HTTP 5xx errors (marked with red box).
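
The first observation can be checked automatically. The following is a minimal sketch, not the APM tool's own logic: it flags any server whose operation count falls well below the median of the group. The function name `find_underloaded`, the 0.5 threshold, and the rounded counts are assumptions made for illustration.

```python
from statistics import median


def find_underloaded(ops_by_server, threshold=0.5):
    """Return servers whose operation count is below threshold * median.

    ops_by_server maps a server name to its operation count.
    The threshold is an assumed value, not taken from the article.
    """
    mid = median(ops_by_server.values())
    return sorted(
        server for server, ops in ops_by_server.items()
        if ops < threshold * mid
    )


# Counts rounded from the figures quoted above (7k vs. almost 30k).
ops = {"x.x.x.154": 30_000, "x.x.x.155": 7_000, "x.x.x.156": 30_000}
print(find_underloaded(ops))
```

Comparing against the median rather than the mean keeps a single badly underloaded server from dragging the baseline down with it.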

The team charted the HTTP server errors and the load, counted as the number of transactions, for all three servers over time (see Figure 2) to get a better understanding of the situation.

Figure 2: Distribution of the number of server errors and transaction counts over time for all three servers; one server shows a lower load

The team's first observation, based on the above-mentioned reports, was that the x.x.x.155 server, with the lowest number of users, was most likely not connected to the load balancer. In order to determine the cause of the high response time on this server the team analyzed two reports:

  • The response time for x.x.x.155, broken down into network, server and redirect times, indicated that almost all the time was spent on the server (see Figure 3).
  • A drill-down to the operations report, used to analyze the load on the server (see Figure 4), showed that one particular transaction took a long time to complete, resulting in low application performance and poor user experience.
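
The response-time breakdown from the first report can be reduced to a single question: which component dominates? The sketch below assumes the three components named above are available as millisecond totals. The helper name `dominant_component` and the sample timings are hypothetical and do not come from the client's data.

```python
def dominant_component(breakdown_ms):
    """Return (component, share) for the largest part of the response time.

    breakdown_ms maps a component name to the time spent in it, in ms.
    """
    total = sum(breakdown_ms.values())
    if total <= 0:
        raise ValueError("breakdown must contain a positive total time")
    name = max(breakdown_ms, key=breakdown_ms.get)
    return name, breakdown_ms[name] / total


# Hypothetical timings for one server, in milliseconds.
sample = {"network": 200, "server": 9_500, "redirect": 300}
component, share = dominant_component(sample)
print(f"{component} accounts for {share:.0%} of the response time")
```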

Figure 3: Response time breakdown for x.x.x.155: most of the time is spent on the server

Figure 4: Drill-down in the context of the x.x.x.155 server shows the main KPIs per transaction executed on this server; one transaction is affected by performance problems

Next, the team analyzed the 5xx errors produced by the x.x.x.156 server. They drilled down to a PurePath of one of the transactions reporting these errors and learned that the problem was caused by a malfunctioning database connection pool (see Figure 5).

Figure 5: Drill-down through PurePaths to the error details reveals that the 5xx errors are caused by the database connection pool usage

The Operations team was also curious how the 5xx errors produced by the x.x.x.156 server were affecting the actual user experience. The team wondered whether user operations were equally distributed between the two servers connected to the load balancer, or whether users who were unlucky enough to be served by the x.x.x.156 server were stuck on that server. This kind of question is hard to answer by looking at a single SharePoint server, so the Operations team used the APM tool to answer it.

Figure 6: Users remain on the server at which they have started their session

The report in Figure 6 shows that users were usually served by the same application server. Therefore, those who started their session on the x.x.x.156 server remained there, and had a consistently poor experience because of that server's bad performance.
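
The stickiness question above can also be answered from request records, provided each record carries a user identifier and the server that handled it. The following is a minimal sketch under that assumption. The function name `sticky_share` and the sample records are hypothetical and do not reflect how the APM tool builds its report.

```python
from collections import defaultdict


def sticky_share(requests):
    """Return the fraction of users who were served by exactly one server.

    requests is an iterable of (user_id, server) pairs.
    """
    servers_by_user = defaultdict(set)
    for user, server in requests:
        servers_by_user[user].add(server)
    if not servers_by_user:
        return 0.0
    pinned = sum(1 for servers in servers_by_user.values() if len(servers) == 1)
    return pinned / len(servers_by_user)


# Hypothetical request records: (user, server that handled the request).
records = [
    ("u1", "x.x.x.154"), ("u1", "x.x.x.154"),
    ("u2", "x.x.x.156"), ("u2", "x.x.x.156"),
    ("u3", "x.x.x.154"), ("u3", "x.x.x.156"),
    ("u4", "x.x.x.156"),
]
print(sticky_share(records))
```

A share close to 1.0 means sessions are effectively pinned, so users who land on a failing server keep hitting it.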

Conclusion
Modern application performance management is not only about making sure that the application and database servers are operating without problems. We also need to set up the load balancer correctly and monitor the network infrastructure for potential problems that affect overall application performance.

Using Compuware dynaTrace Data Center Real User Monitoring (DCRUM), the Operations team at Rendoosia Inc. could get, in just a few clicks, from the alert about HTTP Server (500) errors, through a holistic overview of application server KPIs, to the root cause of the problem.

Based on the unequal load among the three application servers, visible in the requests breakdown in Figure 1 and the transaction counts in Figure 2, the team quickly determined that the x.x.x.155 server was not properly connected to the load balancer. Additional analysis showed that this server was also affected by the poor performance of one of its operations.

This story shows that even when a single server is experiencing performance problems, here in the form of many HTTP Server errors, the load balancer will not take traffic away from that server, because it is not aware of those errors. That is why Operations teams need to monitor constantly, with properly configured alerts, for such outliers in application performance, even in load-balanced setups.
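
One simple form of such an alert is a per-server error-rate check. The sketch below is an illustration only, not the alerting logic of any particular APM product. The function name `servers_to_alert`, the 5% threshold, and the request and error counts are all assumptions.

```python
def servers_to_alert(stats, max_error_rate=0.05):
    """Return servers whose share of HTTP 5xx responses exceeds the limit.

    stats maps a server name to (total_requests, http_5xx_count).
    max_error_rate is an assumed threshold; tune it per application.
    """
    alerts = []
    for server, (total, errors) in sorted(stats.items()):
        if total > 0 and errors / total > max_error_rate:
            alerts.append(server)
    return alerts


# Hypothetical counts: the article does not give exact error numbers.
stats = {
    "x.x.x.154": (30_000, 30),
    "x.x.x.155": (7_000, 10),
    "x.x.x.156": (30_000, 4_500),
}
print(servers_to_alert(stats))
```

Evaluating the rate per server rather than for the whole pool matters here: averaged across all three servers, the errors from one bad node are diluted and can slip under a pool-wide threshold.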

More Stories By Sebastian Kruk

Sebastian Kruk is a Technical Product Strategist, Center of Excellence, at Compuware APM Business Unit.
