Welcome!

Big Data Journal Authors: Elizabeth White, Scott Bampton, Carmen Gonzalez, Liz McMillan, Roger Strukhoff

Related Topics: Cloud Expo, Java, Linux, Virtualization, Security, SDN Journal

Cloud Expo: Article

Why Intelligent VM Routing Is Critical to Your Private Cloud’s Success

Hosting decisions are far too important to be left to simplistic, best-efforts approaches

Virtualized and private cloud infrastructures are all about sharing resources - compute, storage and network. Optimizing these environments comes down to the ability to properly balance capacity supply and application demand. In practical terms, this means allocating the right amount of resources and putting workloads in the right places. These decisions are critical to ensuring performance, compliance and cost control.

Yet most organizations are using antiquated methods such as home-grown spreadsheets and best guesses to determine which infrastructure to host workloads on and how much capacity to allocate. Not only do these approaches hinder operational agility, but as hosting decisions become more and more complex, they are downright dangerous. The typical strategy employed to stave off risk is to over-provision infrastructure, and the thinking behind this is that having an excess of capacity on hand will ensure that enough resource is available to avoid any performance problems. This is not only expensive, but it actually doesn't prevent key operational issues and many of the performance and compliance issues that are caused by incorrectly combining workloads.

In essence, this management challenge is the same one faced by hotel operators. Hoteliers need to constantly align guest demands with hotel resources and amenities. A hotel could not operate without a reservation system to manage resource availability and match that with guest needs, and yet this is exactly how companies manage their virtual and internal cloud environments. Imagine if a hotel didn't have the operational control provided by their reservation system, and was constantly forced to build more rooms than necessary in order to meet "potential" guest demands, rather than basing their decision on an actual profile of historical and predicted demand. Or if they put clients in rooms without enough beds or required amenities. This should start sounding familiar to anyone who has managed a production virtual environment.

Hotels have had the luxury of a long history to refine their operations, and by using reservations systems to properly place guests and manage current and future bookings, they have gained a complete picture of available resources at any point in time. In doing so, they have optimized their ability to plan for and leverage available capacity, achieving the right balance between supply and demand.

Why Workload Routing and Reservations are Important
By applying the same principles used to manage a hotel's available capacity to their own operations, IT organizations can significantly reduce risk and cost while ensuring service levels in virtual and cloud infrastructures. There are five reasons why the process of workload routing and capacity reservation must become a core, automated component of IT planning and management:

1. Complexity of the Hosting Decision
Hosting decisions are all about optimally aligning supply with demand. However, this is very complex in modern infrastructures, where capabilities can vary widely, and the requirements of the workloads may have a significant impact on what can go where. To make the optimal decision, there are three important questions that must be asked:

  • Do the infrastructure capabilities satisfy the workload requirements? This is commonly referred to as "fit for purpose," and is required to determine whether the hosting environment is suitable for the kind of workload being hosted. This question has not always been top of mind in the past, as the typical process to deploy new applications has been to procure new infrastructure with very detailed specifications. But the increasing use of shared environments is changing this, and understanding the specifications of the currently running hosting environments is critical. Unfortunately, early virtual environments tended to be one-size-fits-all, and early internal clouds tended to focus on dev/test workloads, so fit for purpose decisions rarely extended beyond ensuring the environment has the right CPU architecture.
  • Will the workloads fit? While the fit for purpose analysis is concerned with whether a target environment has the right kind of capacity, this aspect of making hosting decisions is concerned with whether there is sufficient free capacity to host the workloads. This is a more traditional capacity problem, but with a twist, as virtual and cloud environments are by nature shared environments, and the capacity equation is multi-dimensional. Resources such as CPU, memory, disk, I/O, network I/O, storage capacity, etc., must be considered, as well as looking at the levels and patterns of activity to ensure that the new workloads are "dovetailing" with the existing ones. Furthermore, any analysis of capacity must also ensure that the workload will fit at the point in time it will be deployed and it must continue to fit beyond that time.
  • What is the relative cost? While fit and suitability are critical to where to host a workload, in a tiebreaker the main issue becomes relative cost. While many organizations are still not sophisticated enough to have an accurate chargeback model in place, a more precise way to determine cost may be to consider the relative cost of hosting a workload as a function of policy and placement.

2. Capacity Supply and Application Demand are Dynamic
Nothing stands still in virtualized IT environments, and any decisions must be made in the context of ever-changing technologies, hardware specs, service catalogs, application requirements and workloads. This is becoming even more prevalent in the age of the software-defined data center.

Because of this, capacity must be viewed as a pipeline, with inbound demands, inbound supply side capacity, outbound demands and decommissioned capacity all being part of the natural flow of activity. Handling this flow is a key to achieving agility, which is a goal in the current breed of virtual and cloud hosting infrastructure. The ability to efficiently react to changing needs is critical, and the lack of agility in legacy environments is really a reflection of the fact that previous approaches did not operate as a pipeline. If it currently takes two to three months to get capacity, then it is a clear indication that there is no pipeline in place.

3. Meeting Your Customers Expectations
Application owners today have expectations that capacity will be available when required, so it's necessary for IT to have a way to hold capacity for planned workload placements to be available on the date of deployment (like advance booking a hotel room).

Sometimes the concept of a capacity reservation is equated with the draw-down on a pool of resources or a quota that has been assigned to a consumer or internal group. This is dangerous, as it simply ensures that a specific amount of resources will not be exceeded, and does not guarantee that actual resources will be available. This is analogous to getting a coupon from a store that says "limit 10 per customer" - it in no way guarantees that there will be any product left on the shelf. Organizations should beware of these types of reservations, as they can give a false sense of security.

Capacity reservations are extremely useful to those managing the infrastructure capacity. They provide an accurate model of the pipeline of demand, which allows for much more efficient, accurate and timely purchasing decisions. Simply put, less idle capacity needs to be left on the floor. It also allows infrastructure to be managed as a portfolio, and if a certain mix of resources is needed to satisfy the overall supply and demand balance (such as buying servers with more memory), then procurement can factor this in.

4. Even Self-Service Needs Reservations
Self-service can create a highly volatile demand pipeline. But a bigger issue with self-service models is the way organizations perceive them. Many early cloud implementations focus on dev/test users or more grid-type workloads, and the entire approach to delivering capacity takes on a last-minute, unplanned flavor. But these are not the only kinds of workloads - or even the most common - and for a cloud to become a true "next-generation" hosting platform it must also support enterprise applications and proper release planning processes.

The heart of the issue is a tendency for organizations to equate self-service with instant provisioning. Although instant provisioning is useful for dev/test, grid and other horizontal scaling scenarios, it is not the only approach. For example, an online hotel reservation site provides self-service access to hotel rooms, but these rooms are not often being booked for that night. For business trips, conferences and even vacations, you book ahead. The same process must be put into place for hosting workloads.

Rather than narrowly defining self-service as the immediate provisioning of capacity, it is better to focus on the intelligent provisioning of capacity, which may or may not be immediate. For enterprise workloads with proper planning cycles and typical lead times, reservations are far more important than instant provisioning. And deciding where the application should be hosted in the first place is a solution critical decision that is often overlooked. Unless an organization has only one hosting environment, the importance (and difficulty) of this should not be underestimated.

5. Demand Is Global
There is a huge benefit to thinking big when it comes to making hosting decisions. The long-term trend will undoubtedly be to start thinking beyond the four walls of an organization and make broader hosting decisions that include external cloud providers, outsourcing models and other potential avenues of efficiency. But the use of external capacity is still a distant roadmap item in many IT organizations, and the current focus tends to be on making the best use of existing capacity and purchasing dollars.

Operating in scale also allows certain assumptions to be challenged, such as the requirement for an application to be hosted at a specific geographical location. Geographical constraints should be fully understood and properly identified, and not simply assumed based on past activity or server-hugging paranoia. Some workloads do have specific jurisdictional constraints, compliance requirements or latency sensitivities, but many have a significant amount of leeway in this regard, and to constrain them unnecessarily ties up expensive data center resources.

Unfortunately, the manual processes and spreadsheet-based approaches in use in many organizations are simply not capable of operating at the necessary scale, and cannot properly model the true requirements and constraints of a workload. This not only means that decisions are made in an overly narrow context, but that the decisions that are made are likely wrong.

Moving Past Your "Gut"
Hosting decisions are far too important to be left to simplistic, best-efforts approaches. Where a workload is placed and how resources are assigned to it is likely the most important factor in operational efficiency and safety, and is even more critical as organizations consider cloud hosting models. These decisions must be driven by the true requirements of the applications, the capabilities of the infrastructure, the policies in force and the pipeline of activity. They should be made in the context of the global picture, where all supply and demand can be considered and all hosting assumptions challenged. And they should be made in software, not brains, so they are repeatable, accurate and can drive automation.

More Stories By Andrew Hillier

Andrew Hillier is CTO and co-founder of CiRBA, Inc., a data center intelligence analytics software provider that determines optimal workload placements and resource allocations required to safely maximize the efficiency of Cloud, virtual and physical infrastructure. Reach Andrew at [email protected]

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


@BigDataExpo Stories
The only place to be June 9-11 is Cloud Expo & @ThingsExpo 2015 East at the Javits Center in New York City. Join us there as delegates from all over the world come to listen to and engage with speakers & sponsors from the leading Cloud Computing, IoT & Big Data companies. Cloud Expo & @ThingsExpo are the leading events covering the booming market of Cloud Computing, IoT & Big Data for the enterprise. Speakers from all over the world will be hand-picked for their ability to explore the economic...
SYS-CON Events announced today that Cloudian, Inc., the leading provider of hybrid cloud storage solutions, has been named “Bronze Sponsor” of SYS-CON's 15th International Cloud Expo®, which will take place on November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. Cloudian is a Foster City, Calif.-based software company specializing in cloud storage. Cloudian HyperStore® is an S3-compatible cloud object storage platform that enables service providers and enterprises to bui...
SAP is delivering break-through innovation combined with fantastic user experience powered by the market-leading in-memory technology, SAP HANA. In his General Session at 15th Cloud Expo, Thorsten Leiduck, VP ISVs & Digital Commerce, SAP, will discuss how SAP and partners provide cloud and hybrid cloud solutions as well as real-time Big Data offerings that help companies of all sizes and industries run better. SAP launched an application challenge to award the most innovative SAP HANA and SAP ...
SYS-CON Events announced today that O'Reilly Media has been named “Media Sponsor” of SYS-CON's 15th International Cloud Expo®, which will take place on November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. O'Reilly Media spreads the knowledge of innovators through its books, online services, magazines, and conferences. Since 1978, O'Reilly Media has been a chronicler and catalyst of cutting-edge development, homing in on the technology trends that really matter and spurri...
The Internet of Things (IoT) is going to require a new way of thinking and of developing software for speed, security and innovation. This requires IT leaders to balance business as usual while anticipating for the next market and technology trends. Cloud provides the right IT asset portfolio to help today’s IT leaders manage the old and prepare for the new. Today the cloud conversation is evolving from private and public to hybrid. This session will provide use cases and insights to reinforce t...
Samsung promises to be one of the 800-pound gorillas of the IoT, if its success in recent years with Android devices and other consumer electronics is any guide. Showing its willingness to be a big IoT player, the company recently acquired SmartThings, a recent startup that's developed an open smarthome appliation that currently supports 1,000 devices and 8,000 apps. SmartThings will now work under the auspices of Samsung's Open Innovation Center (OIC). SmartThings Founder and CEO Alex Hawkinson...
What process has your provider undertaken to ensure that the cloud tenant will receive predictable performance and service? What was involved in the planning? Who owns and operates the data center? What technology is being used? How is it being supported? In his session at 14th Cloud Expo, Dave Weisbrot, Cloud Business Manager for QTS, will provide the attendees a look into what it takes to stand up and stand behind a highly available certified cloud IaaS.
I'll be hosting an SAP HANA Cloud webinar at 11am eastern time, Wednesday, October 29. You can sign up now. Featured speakers will be Allan Adler, Managing Partner, Channel Cloud Consulting, and Thorsten Leiduck, VP ISVs & Digital Commerce, SAP. Attendees will learn about • Cloud economics, hybrid cloud strategy, market size and opportunity • Introduction to SAP HANA Cloud Platform and how to: - Build new next-generation applications - Extend on-premise solutions non-disruptively throu...
SYS-CON Events announced today that Gigaom Research has been named "Media Sponsor" of SYS-CON's 15th International Cloud Expo®, which will take place on November 4-6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. Ashar Baig, Research Director, Cloud, at Gigaom Research, will also lead a Power Panel on the topic "Choosing the Right Cloud Option." Gigaom Research provides timely, in-depth analysis of emerging technologies for individual and corporate subscribers. Gigaom Research'...
Join both SAP and Channel Cloud Consulting for our webcast and uncover how you can extend your reach to capture a piece of the US$17 billion cloud application services market with SAP. Learn about SAPs market-leading SAP HANA Cloud Platform and an exciting opportunity to join SAPs growing ecosystem of Application Development partners. When: October 29, 11:00am EST Speakers: Allan Adler, Managing Partner, Channel Cloud Consulting Thorsten Leiduck, Vice President ISVs & Digital Commerce, SAP
Application Performance Management (APM) has been bred with all the right elements to give us the insights we need to see the health of our applications. Similar to your most trusted watch dog, it alerts us to anomalies when events occur, providing awareness to the environment that only they can observe. As enterprises embrace the DevOps philosophy, and the coalescence of the Development and Operations continues, I foresee the conditions ripening to foster innovative methods of making applicati...
SYS-CON Events announced today that IBM is holding a Bluemix Developer Playground on November 5, 10:30 am to 5:30 pm at 15th Cloud Expo. 15th Cloud Expo, co-located with @ThingsExpo, Big Data Expo, and DevOps Summit is taking place Nov. 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. The labs, for developers of all levels, will highlight the ease of use of Bluemix, its services and functionality and provide short-term introductory projects that developers can complete betw...
The Industrial Internet revolution is now underway, enabled by connected machines and billions of devices that communicate and collaborate. The massive amounts of Big Data requiring real-time analysis is flooding legacy IT systems and giving way to cloud environments that can handle the unpredictable workloads. Yet many barriers remain until we can fully realize the opportunities and benefits from the convergence of machines and devices with Big Data and the cloud, including interoperability, da...
Software AG helps organizations transform into Digital Enterprises, so they can differentiate from competitors and better engage customers, partners and employees. Using the Software AG Suite, companies can close the gap between business and IT to create digital systems of differentiation that drive front-line agility. We offer four on-ramps to the Digital Enterprise: alignment through collaborative process analysis; transformation through portfolio management; agility through process automation...
How do you know when a technology has become mainstream? A good clue may be when politicians start talking about it on the campaign trail and with mainstream media. David Cameron, the UK prime minister, was the latest, indicating that the world was now on “fast-forward” with the Internet of Things (IoT) ushering in the new industrial revolution. No mention of IoT targeted at the masses would be complete without the clichéd example of the communicating fridge. While it is easy to get caught up in...
In my recent article, “Software Quality Metrics for your Continuous Delivery Pipeline – Part III – Logging,” I wrote about the good parts and the not-so-good parts of logging and concluded that logging usually fails to deliver what it is so often mistakenly used for: as a mechanism for analyzing application failures in production. In response to the heated debates on reddit.com/r/devops and reddit.com/r/programing, I want to demonstrate the wealth of out-of-the-box insights you could obtain from...
The Internet of Things will greatly expand the opportunities for data collection and new business models driven off of that data. In her session at Internet of @ThingsExpo, Esmeralda Swartz, CMO of MetraTech, will discuss how for this to be effective you not only need to have infrastructure and operational models capable of utilizing this new phenomenon, but increasingly service providers will need to convince a skeptical public to participate. Get ready to show them the money! Speaker Bio: ...
This year like last year, XebiaLabs polled Fortune 1000 companies in banking, manufacturing, healthcare, government and IT, interviewing DevOps teams and everyone from QA to C-level suites. More than 1,000 people were asked to share their perspectives on software delivery trends. Last year the survey found that application deployments fail up to 30% of the time and that 75% of managers believe their deployment process deserves a failing grade. This year, the survey revealed little change in at...
Can a postmortem review help foster a curiosity for innovative possibilities to make application performance better? Blue-sky thinkers may not want to deal with the myriad of details on how to manage the events being generated operationally, but could learn something from this exercise. Consider the major system failures in your organization over the last 12 to 18 months. What if you had a system or process in place to capture those failures and mitigate them from a proactive standpoint prevent...
Machine-to-machine (M2M) technology and the resulting Internet of Things are leading us inexorably toward everything-as-a-service (XaaS). As more things get connected, the range of service opportunities expands. And as those services are presented online, they become available for use, re-use and re-purposing. At first thought, the idea of more connected devices suggests simply that there will be more devices around, and as such, more products for manufacturers to make and sell. That’s true, bu...