Welcome!

@DXWorldExpo Authors: Liz McMillan, John Walsh, Elizabeth White, Pat Romanski, John Katrick

Related Topics: @DXWorldExpo, Java IoT, Microservices Expo, Machine Learning , Agile Computing, @CloudExpo

@DXWorldExpo: Article

Balancing the Load

Why you need to constantly monitor application performance

A question that every online application provider will face eventually is: Does my application scale? Can I add an extra 100 users and still ensure the same user experience? If the application architecture is properly designed the easiest way is to put an additional server behind the load balancer to handle more traffic.

In this article we recount an incident that happened to one of our clients when the cause of poor application performance was eventually attributed to problems with the load balancing of the application servers.

HTTP Server (500) Errors Go Over the Roof
Around 8 am the Operations team at Rendoosia Inc. (name changed for commercial reasons) got an alert from the APM tool that one of three SharePoint servers was generating many HTTP Server (500) errors. All three servers were behind a load balancer; hence why the team decided to analyze the overall performance of all three servers with the report presented in Figure 1.

Figure 1: Overview of the three SharePoint servers behind one load balancer with some KPIs: usage, response time and number of errors; two servers show performance problems

The Operations team noticed the following issues:

  1. The x.x.x.155 server (row marked with the blue box) was under significantly lower load (7k operations compared to almost 30k per each other server) than the other two. Both the load and the number of users were equally shared over two servers: x.x.x.154 and x.x.x.156
  2. Although server x.x.x.155 had the lowest user counts it was reporting the longest processing time.
  3. Server x.x.x.156 was reporting a high number of HTTP 5xx errors (marked with red box).

The team charted the HTTP server errors and the load, counted as number of transactions, for all three server over time (see Figure 2) to get a better understanding of the current situation.

Figure 2: Distribution of the number of server errors and transaction counts over time for all three servers; one server shows a lower load

The team's first observation, based on the above-mentioned reports, was that the x.x.x.155 server, with the lowest number of users, was most likely not connected to the load balancer. In order to determine the cause of the high response time on this server the team analyzed two reports:

  • Response time for x.x.x.155 broken down into network, server and redirect times indicated that almost all the time is spent on the server (see Figure 3).
  • Drill down to the operations report to analyze the load on the server (see Figure 4) shows that one particular transaction took a lot of time to complete, resulting in low application performance and poor user experience.

Figure 3: Response time breakdown for x.x.x.155: most of the time is spent on the server

Figure 4: Drill down in the context of the x.x.x.155 server shows main KPIs per transactions executed on this server; one transaction is affected by performance problems

Next, the team analyzed the 5xx errors produced by the x.x.x.156 server. They drilled down to a PurePath of one of the transactions that were reporting these errors and learned that the problem was caused by a malfunctioning database connection pool (see Figure 5)

Figure 5: Drilldown through PurePaths to the Error details reveals that the reason behind 5xx errors is caused by the database connection pool usage

The Operations team was also curious as to how the 5xx errors produced at the  x.x.x.156 server were affecting the actual user experience. The team wondered if user operations were equally distributed between both servers connected to the load balancer. The question was whether users who were unlucky and got served by the x.x.x.156 server were stuck on that server. This kind of question was hard to answer just by looking at a single SharePoint server. The Operations team used the APM tool to answer it.

Figure 6: Users remain on the server at which they have started their session

The report in Figure 6 shows that users were usually served by the same application server. Therefore those who started their session on the x.x.x.156 server remained there, resulting in a constantly poor experience due to the bad performance of that server.

Conclusion
Modern application performance management is not only about making sure that the application and database servers are operating without problems. We also need to set up the load balancer right and monitor the network infrastructure for potential problems that affect the overall application performance.

The Operations team at Rendoosia Inc., using Compuware dynaTrace Data Center Real User Monitoring (DCRUM), could get in just a few clicks from the alert about HTTP Server (500) errors through a holistic overview of application server KPIs to a root cause of the problem.

Based on the unequal load among three application servers, Requests breakdown in Figure 1 and the number of transactions in Figure 2, the team quickly determined that the x.x.x.155 server was not properly connected to the load balancer. Additional analysis illustrated that this server was also affected by low performance of one of the operations.

This story shows us that even though only one server might be experiencing performance problems, caused by many HTTP Server errors, the load balancer will not offload that server because it is not aware of those errors. That is why Operation teams need to constantly monitor, with properly set up alerts, for such outliers in application performance; even on load balanced setups.

More Stories By Sebastian Kruk

Sebastian Kruk is a Technical Product Strategist, Center of Excellence, at Compuware APM Business Unit.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


@BigDataExpo Stories
"ZeroStack is a startup in Silicon Valley. We're solving a very interesting problem around bringing public cloud convenience with private cloud control for enterprises and mid-size companies," explained Kamesh Pemmaraju, VP of Product Management at ZeroStack, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
In his session at 21st Cloud Expo, Carl J. Levine, Senior Technical Evangelist for NS1, will objectively discuss how DNS is used to solve Digital Transformation challenges in large SaaS applications, CDNs, AdTech platforms, and other demanding use cases. Carl J. Levine is the Senior Technical Evangelist for NS1. A veteran of the Internet Infrastructure space, he has over a decade of experience with startups, networking protocols and Internet infrastructure, combined with the unique ability to it...
"Codigm is based on the cloud and we are here to explore marketing opportunities in America. Our mission is to make an ecosystem of the SW environment that anyone can understand, learn, teach, and develop the SW on the cloud," explained Sung Tae Ryu, CEO of Codigm, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
High-velocity engineering teams are applying not only continuous delivery processes, but also lessons in experimentation from established leaders like Amazon, Netflix, and Facebook. These companies have made experimentation a foundation for their release processes, allowing them to try out major feature releases and redesigns within smaller groups before making them broadly available. In his session at 21st Cloud Expo, Brian Lucas, Senior Staff Engineer at Optimizely, discussed how by using ne...
"There's plenty of bandwidth out there but it's never in the right place. So what Cedexis does is uses data to work out the best pathways to get data from the origin to the person who wants to get it," explained Simon Jones, Evangelist and Head of Marketing at Cedexis, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
Large industrial manufacturing organizations are adopting the agile principles of cloud software companies. The industrial manufacturing development process has not scaled over time. Now that design CAD teams are geographically distributed, centralizing their work is key. With large multi-gigabyte projects, outdated tools have stifled industrial team agility, time-to-market milestones, and impacted P&L stakeholders.
Gemini is Yahoo’s native and search advertising platform. To ensure the quality of a complex distributed system that spans multiple products and components and across various desktop websites and mobile app and web experiences – both Yahoo owned and operated and third-party syndication (supply), with complex interaction with more than a billion users and numerous advertisers globally (demand) – it becomes imperative to automate a set of end-to-end tests 24x7 to detect bugs and regression. In th...
"Infoblox does DNS, DHCP and IP address management for not only enterprise networks but cloud networks as well. Customers are looking for a single platform that can extend not only in their private enterprise environment but private cloud, public cloud, tracking all the IP space and everything that is going on in that environment," explained Steve Salo, Principal Systems Engineer at Infoblox, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Conventio...
"Akvelon is a software development company and we also provide consultancy services to folks who are looking to scale or accelerate their engineering roadmaps," explained Jeremiah Mothersell, Marketing Manager at Akvelon, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
Agile has finally jumped the technology shark, expanding outside the software world. Enterprises are now increasingly adopting Agile practices across their organizations in order to successfully navigate the disruptive waters that threaten to drown them. In our quest for establishing change as a core competency in our organizations, this business-centric notion of Agile is an essential component of Agile Digital Transformation. In the years since the publication of the Agile Manifesto, the conn...
SYS-CON Events announced today that CrowdReviews.com has been named “Media Sponsor” of SYS-CON's 22nd International Cloud Expo, which will take place on June 5–7, 2018, at the Javits Center in New York City, NY. CrowdReviews.com is a transparent online platform for determining which products and services are the best based on the opinion of the crowd. The crowd consists of Internet users that have experienced products and services first-hand and have an interest in letting other potential buye...
"IBM is really all in on blockchain. We take a look at sort of the history of blockchain ledger technologies. It started out with bitcoin, Ethereum, and IBM evaluated these particular blockchain technologies and found they were anonymous and permissionless and that many companies were looking for permissioned blockchain," stated René Bostic, Technical VP of the IBM Cloud Unit in North America, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Conventi...
SYS-CON Events announced today that Telecom Reseller has been named “Media Sponsor” of SYS-CON's 22nd International Cloud Expo, which will take place on June 5-7, 2018, at the Javits Center in New York, NY. Telecom Reseller reports on Unified Communications, UCaaS, BPaaS for enterprise and SMBs. They report extensively on both customer premises based solutions such as IP-PBX as well as cloud based and hosted platforms.
"Space Monkey by Vivent Smart Home is a product that is a distributed cloud-based edge storage network. Vivent Smart Home, our parent company, is a smart home provider that places a lot of hard drives across homes in North America," explained JT Olds, Director of Engineering, and Brandon Crowfeather, Product Manager, at Vivint Smart Home, in this SYS-CON.tv interview at @ThingsExpo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
Coca-Cola’s Google powered digital signage system lays the groundwork for a more valuable connection between Coke and its customers. Digital signs pair software with high-resolution displays so that a message can be changed instantly based on what the operator wants to communicate or sell. In their Day 3 Keynote at 21st Cloud Expo, Greg Chambers, Global Group Director, Digital Innovation, Coca-Cola, and Vidya Nagarajan, a Senior Product Manager at Google, discussed how from store operations and ...
A strange thing is happening along the way to the Internet of Things, namely far too many devices to work with and manage. It has become clear that we'll need much higher efficiency user experiences that can allow us to more easily and scalably work with the thousands of devices that will soon be in each of our lives. Enter the conversational interface revolution, combining bots we can literally talk with, gesture to, and even direct with our thoughts, with embedded artificial intelligence, whic...
DevOps promotes continuous improvement through a culture of collaboration. But in real terms, how do you: Integrate activities across diverse teams and services? Make objective decisions with system-wide visibility? Use feedback loops to enable learning and improvement? With technology insights and real-world examples, in his general session at @DevOpsSummit, at 21st Cloud Expo, Andi Mann, Chief Technology Advocate at Splunk, explored how leading organizations use data-driven DevOps to close th...
SYS-CON Events announced today that Evatronix will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Evatronix SA offers comprehensive solutions in the design and implementation of electronic systems, in CAD / CAM deployment, and also is a designer and manufacturer of advanced 3D scanners for professional applications.
"We are an integrator of carrier ethernet and bandwidth to get people to connect to the cloud, to the SaaS providers, and the IaaS providers all on ethernet," explained Paul Mako, CEO & CTO of Massive Networks, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
Sanjeev Sharma Joins June 5-7, 2018 @DevOpsSummit at @Cloud Expo New York Faculty. Sanjeev Sharma is an internationally known DevOps and Cloud Transformation thought leader, technology executive, and author. Sanjeev's industry experience includes tenures as CTO, Technical Sales leader, and Cloud Architect leader. As an IBM Distinguished Engineer, Sanjeev is recognized at the highest levels of IBM's core of technical leaders.