Welcome!

@DXWorldExpo Authors: Zakia Bouachraoui, Elizabeth White, Liz McMillan, Jason Bloomberg, Pat Romanski

Related Topics: @DXWorldExpo, Java IoT, Microservices Expo, Machine Learning , Agile Computing, @CloudExpo

@DXWorldExpo: Article

Balancing the Load

Why you need to constantly monitor application performance

A question that every online application provider will face eventually is: Does my application scale? Can I add an extra 100 users and still ensure the same user experience? If the application architecture is properly designed the easiest way is to put an additional server behind the load balancer to handle more traffic.

In this article we recount an incident that happened to one of our clients when the cause of poor application performance was eventually attributed to problems with the load balancing of the application servers.

HTTP Server (500) Errors Go Over the Roof
Around 8 am the Operations team at Rendoosia Inc. (name changed for commercial reasons) got an alert from the APM tool that one of three SharePoint servers was generating many HTTP Server (500) errors. All three servers were behind a load balancer; hence why the team decided to analyze the overall performance of all three servers with the report presented in Figure 1.

Figure 1: Overview of the three SharePoint servers behind one load balancer with some KPIs: usage, response time and number of errors; two servers show performance problems

The Operations team noticed the following issues:

  1. The x.x.x.155 server (row marked with the blue box) was under significantly lower load (7k operations compared to almost 30k per each other server) than the other two. Both the load and the number of users were equally shared over two servers: x.x.x.154 and x.x.x.156
  2. Although server x.x.x.155 had the lowest user counts it was reporting the longest processing time.
  3. Server x.x.x.156 was reporting a high number of HTTP 5xx errors (marked with red box).

The team charted the HTTP server errors and the load, counted as number of transactions, for all three server over time (see Figure 2) to get a better understanding of the current situation.

Figure 2: Distribution of the number of server errors and transaction counts over time for all three servers; one server shows a lower load

The team's first observation, based on the above-mentioned reports, was that the x.x.x.155 server, with the lowest number of users, was most likely not connected to the load balancer. In order to determine the cause of the high response time on this server the team analyzed two reports:

  • Response time for x.x.x.155 broken down into network, server and redirect times indicated that almost all the time is spent on the server (see Figure 3).
  • Drill down to the operations report to analyze the load on the server (see Figure 4) shows that one particular transaction took a lot of time to complete, resulting in low application performance and poor user experience.

Figure 3: Response time breakdown for x.x.x.155: most of the time is spent on the server

Figure 4: Drill down in the context of the x.x.x.155 server shows main KPIs per transactions executed on this server; one transaction is affected by performance problems

Next, the team analyzed the 5xx errors produced by the x.x.x.156 server. They drilled down to a PurePath of one of the transactions that were reporting these errors and learned that the problem was caused by a malfunctioning database connection pool (see Figure 5)

Figure 5: Drilldown through PurePaths to the Error details reveals that the reason behind 5xx errors is caused by the database connection pool usage

The Operations team was also curious as to how the 5xx errors produced at the  x.x.x.156 server were affecting the actual user experience. The team wondered if user operations were equally distributed between both servers connected to the load balancer. The question was whether users who were unlucky and got served by the x.x.x.156 server were stuck on that server. This kind of question was hard to answer just by looking at a single SharePoint server. The Operations team used the APM tool to answer it.

Figure 6: Users remain on the server at which they have started their session

The report in Figure 6 shows that users were usually served by the same application server. Therefore those who started their session on the x.x.x.156 server remained there, resulting in a constantly poor experience due to the bad performance of that server.

Conclusion
Modern application performance management is not only about making sure that the application and database servers are operating without problems. We also need to set up the load balancer right and monitor the network infrastructure for potential problems that affect the overall application performance.

The Operations team at Rendoosia Inc., using Compuware dynaTrace Data Center Real User Monitoring (DCRUM), could get in just a few clicks from the alert about HTTP Server (500) errors through a holistic overview of application server KPIs to a root cause of the problem.

Based on the unequal load among three application servers, Requests breakdown in Figure 1 and the number of transactions in Figure 2, the team quickly determined that the x.x.x.155 server was not properly connected to the load balancer. Additional analysis illustrated that this server was also affected by low performance of one of the operations.

This story shows us that even though only one server might be experiencing performance problems, caused by many HTTP Server errors, the load balancer will not offload that server because it is not aware of those errors. That is why Operation teams need to constantly monitor, with properly set up alerts, for such outliers in application performance; even on load balanced setups.

More Stories By Sebastian Kruk

Sebastian Kruk is a Technical Product Strategist, Center of Excellence, at Compuware APM Business Unit.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


DXWorldEXPO Digital Transformation Stories
Digital Transformation and Disruption, Amazon Style - What You Can Learn. Chris Kocher is a co-founder of Grey Heron, a management and strategic marketing consulting firm. He has 25+ years in both strategic and hands-on operating experience helping executives and investors build revenues and shareholder value. He has consulted with over 130 companies on innovating with new business models, product strategies and monetization. Chris has held management positions at HP and Symantec in addition to ...
Dynatrace is an application performance management software company with products for the information technology departments and digital business owners of medium and large businesses. Building the Future of Monitoring with Artificial Intelligence. Today we can collect lots and lots of performance data. We build beautiful dashboards and even have fancy query languages to access and transform the data. Still performance data is a secret language only a couple of people understand. The more busine...
The challenges of aggregating data from consumer-oriented devices, such as wearable technologies and smart thermostats, are fairly well-understood. However, there are a new set of challenges for IoT devices that generate megabytes or gigabytes of data per second. Certainly, the infrastructure will have to change, as those volumes of data will likely overwhelm the available bandwidth for aggregating the data into a central repository. Ochandarena discusses a whole new way to think about your next...
CloudEXPO New York 2018, colocated with DevOpsSUMMIT and DXWorldEXPO New York 2018 will be held November 12-13, 2018, in New York City and will bring together Cloud Computing, FinTech and Blockchain, Digital Transformation, Big Data, Internet of Things, DevOps, AI and Machine Learning to one location.
CloudEXPO | DevOpsSUMMIT | DXWorldEXPO are the world's most influential, independent events where Cloud Computing was coined and where technology buyers and vendors meet to experience and discuss the big picture of Digital Transformation and all of the strategies, tactics, and tools they need to realize their goals. Sponsors of DXWorldEXPO | CloudEXPO benefit from unmatched branding, profile building and lead generation opportunities.
ICC is a computer systems integrator and server manufacturing company focused on developing products and product appliances to meet a wide range of computational needs for many industries. Their solutions provide benefits across many environments, such as datacenter deployment, HPC, workstations, storage networks and standalone server installations. ICC has been in business for over 23 years and their phenomenal range of clients include multinational corporations, universities, and small busines...
Headquartered in Plainsboro, NJ, Synametrics Technologies has provided IT professionals and computer systems developers since 1997. Based on the success of their initial product offerings (WinSQL and DeltaCopy), the company continues to create and hone innovative products that help its customers get more from their computer applications, databases and infrastructure. To date, over one million users around the world have chosen Synametrics solutions to help power their accelerated business or per...
All in Mobile is a place where we continually maximize their impact by fostering understanding, empathy, insights, creativity and joy. They believe that a truly useful and desirable mobile app doesn't need the brightest idea or the most advanced technology. A great product begins with understanding people. It's easy to think that customers will love your app, but can you justify it? They make sure your final app is something that users truly want and need. The only way to do this is by ...
DXWorldEXPO LLC announced today that Nutanix has been named "Platinum Sponsor" of CloudEXPO | DevOpsSUMMIT | DXWorldEXPO New York, which will take place November 12-13, 2018 in New York City. Nutanix makes infrastructure invisible, elevating IT to focus on the applications and services that power their business. The Nutanix Enterprise Cloud Platform blends web-scale engineering and consumer-grade design to natively converge server, storage, virtualization and networking into a resilient, softwar...
DXWorldEXPO LLC announced today that Big Data Federation to Exhibit at the 22nd International CloudEXPO, colocated with DevOpsSUMMIT and DXWorldEXPO, November 12-13, 2018 in New York City. Big Data Federation, Inc. develops and applies artificial intelligence to predict financial and economic events that matter. The company uncovers patterns and precise drivers of performance and outcomes with the aid of machine-learning algorithms, big data, and fundamental analysis. Their products are deployed...