- Lori MacVittie, senior technical
marketing manager at F5 Networks (www.f5.com), says:
The introduction of virtualization
and cloud computing to data centers has been heralded as “transformational” and
“disruptive” and “game changing.” From an operational IT perspective, that’s
absolutely true.
But as with transformational innovation
in other industries, the disruption often lies not in how the core solution is
leveraged or used, but in how it affects operations and the broader ecosystem
rather than the individual tasked with using the solution. The auto industry’s
shift toward alternative-fuel vehicles, for example, is disruptive and changes
much about the industry. But it doesn’t change the way you drive a car: it
still works on the same principles, and the skills you learned driving
gas-powered cars still apply to alternative-fuel cars.
What changes for the operator, just
as within IT, is that there may be new concerns with which you must contend.
Load balancing virtualized
applications falls into this category. While the core principles you’ve always
applied to load balancing applications still apply, there are a few additional
concerns arising from the use of virtualization that you’re going to have to
take into consideration.
LOAD BALANCING 101 REFRESH
Let’s remember quickly how load
balancing traditional applications works, shall we?
The load balancing service presents
to the end-user a single endpoint, i.e. “the application”. Users communicate
exclusively with that endpoint. The load balancing service communicates with a
pool of resources made up of one or more application instances. It is
by adding instances to the pool that an application is able to scale
horizontally to meet demand.
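To make that model concrete, here is a minimal Python sketch of a pool sitting behind a single endpoint. Round robin is assumed as the selection algorithm only because it is the simplest; the `Pool` class, the endpoint name, and the addresses are all hypothetical.

```python
from itertools import cycle


class Pool:
    """One virtual endpoint ("the application") fronting a pool of instances."""

    def __init__(self, endpoint, instances):
        self.endpoint = endpoint          # the only address users ever talk to
        self.instances = list(instances)  # application instances behind it
        self._next = cycle(self.instances)

    def add_instance(self, instance):
        # Horizontal scaling: grow the pool; users still see the same endpoint.
        self.instances.append(instance)
        self._next = cycle(self.instances)  # restart the rotation over the new pool

    def pick(self):
        # Round robin: hand each new connection to the next instance in turn.
        return next(self._next)


pool = Pool("app.example.com:443", ["10.0.0.11:8080", "10.0.0.12:8080"])
pool.add_instance("10.0.0.13:8080")
print([pool.pick() for _ in range(4)])
# ['10.0.0.11:8080', '10.0.0.12:8080', '10.0.0.13:8080', '10.0.0.11:8080']
```

Scaling out is just `add_instance`; nothing about the endpoint users connect to changes.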
In the most common traditional load
balancing environment, each application instance is hosted on a single,
physical server. The availability of the “application” is maintained by
ensuring there are always enough instances (nodes) available to compensate for
any failures that might occur at the physical server, operating system,
platform, or application layers.
Load balancing services also allow
for the designation of “backup” nodes. Each node in a pool may have a backup
node that is only activated in the event of a failure. This is used primarily
for high-availability purposes to ensure continuous application availability
rather than for scaling purposes.
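The backup behavior can be sketched as a filter over the pool: a healthy primary serves traffic itself, and its backup is substituted only when the primary fails. The `Node` type and `active_members` helper below are hypothetical illustrations, not any product’s configuration model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    address: str
    healthy: bool = True
    backup: Optional["Node"] = None  # standby that only takes traffic on failure


def active_members(primaries):
    """Return the nodes that should receive traffic right now.

    A healthy primary serves traffic itself; an unhealthy primary is replaced
    by its backup, but only if that backup is itself healthy.
    """
    members = []
    for node in primaries:
        if node.healthy:
            members.append(node)
        elif node.backup is not None and node.backup.healthy:
            members.append(node.backup)
    return members


standby = Node("10.0.1.21:8080")
n1 = Node("10.0.0.11:8080", backup=standby)
n2 = Node("10.0.0.12:8080")
print([n.address for n in active_members([n1, n2])])  # ['10.0.0.11:8080', '10.0.0.12:8080']
n1.healthy = False
print([n.address for n in active_members([n1, n2])])  # ['10.0.1.21:8080', '10.0.0.12:8080']
```

Note that the backup adds no capacity while the primary is up, which is why it serves availability rather than scale.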
Now, when we replace the physical
servers with virtual servers, we have pretty much the same system. There still
exists a pool of resources that comprise “the application”, the load balancing
service still mediates for the end-user, and there are still enough application
instances in the pool to compensate for failure, thus ensuring availability of
“the application.”
However, virtualization introduces
some new potential sources of failure, and addressing them affects the topology
(the physical placement) of the application instances in the pool.
TWO RULES FOR LOAD BALANCING VIRTUALIZED APPLICATIONS
One of the most important changes
coming from virtualization that must be addressed is fault isolation. Assume
for a moment that we took a pool of four physical nodes and consolidated them
onto a single physical server running a virtualization platform.
In theory, nothing changes. The load
balancing service views a “node” as a unique combination of IP address and TCP
port, and whether that’s hosted on a virtual platform or a physical server is
irrelevant to the load balancing service. The load balancing algorithms still
work the same way, nodes are selected as directed by configured policies,
backup nodes are still used to ensure continuous availability, and nothing
about the way in which load balancing works changes.
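The “node is just an IP address and TCP port” view can be sketched in a few lines. The `Member` type, the `lb_view` helper, and the host names are hypothetical; the point is only that the service’s view of four physical nodes and of the same four nodes consolidated onto one host is identical.

```python
from typing import NamedTuple


class Member(NamedTuple):
    ip: str
    port: int
    host: str  # physical host running the instance; the load balancer never reads it


def lb_view(members):
    # The load balancing service identifies a node purely by (IP address, TCP port).
    return {(m.ip, m.port) for m in members}


physical = [Member(f"10.0.0.{i}", 8080, f"server{i}") for i in range(1, 5)]
consolidated = [Member(f"10.0.0.{i}", 8080, "vhost1") for i in range(1, 5)]

print(lb_view(physical) == lb_view(consolidated))  # True
```

The `host` field is exactly the information the load balancing service does not use, and it is the field operations now has to care about.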
But it’s very relevant to operations,
because this type of server-consolidated deployment model makes unrecoverable
failure scenarios more likely, and it will directly (and negatively) impact the
performance of “the application.”
There are a couple of operational
axioms at work here:
1. Shared infrastructure (network,
compute, storage) means shared risk.
2. As load increases, performance
decreases.
Let’s say “Node 1” fails. In both
the physical and virtual deployments, the load is simply shifted to the remaining
active nodes. No problem.
But what if the network
connectivity between the load balancing service and “Node 1” fails? In a
physical deployment, no problem: each node has its own physical connection, so
a failure in one is unlikely to affect the others. But what about the virtual deployment?
Each node has its own virtual network connection, certainly, but does it have
its own physical network connection or is it shared? If it’s a shared physical
connection and it fails, then all nodes will fail – leaving “the
application” unavailable.
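The shared-risk point can be modeled by mapping each node to the physical port its traffic ultimately rides on. This is a hypothetical sketch (the `reachable_nodes` helper and the port names are illustrative only), comparing one failed port in each deployment.

```python
def reachable_nodes(uplink_of, failed_uplinks):
    """Return the nodes still reachable after some physical uplinks fail.

    uplink_of maps each node to the physical NIC its traffic ultimately uses.
    """
    return sorted(node for node, nic in uplink_of.items() if nic not in failed_uplinks)


# Physical deployment: every node has its own server and its own port.
physical = {f"node{i}": f"server{i}/eth0" for i in range(1, 5)}
# Virtual deployment: four virtual NICs, one shared physical port underneath.
virtual = {f"node{i}": "vhost1/eth0" for i in range(1, 5)}

print(reachable_nodes(physical, {"server1/eth0"}))  # ['node2', 'node3', 'node4']
print(reachable_nodes(virtual, {"vhost1/eth0"}))    # [] -> "the application" is down
```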
Load Balancing Virtualized Applications Rule #1: Team and Trunk.
Physical network redundancy is a
must. Modern server platforms generally come with at least two, if not four,
GbE connections; use them.
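A rough model of why teaming helps: all VMs on a host share its uplink, but a teamed uplink keeps working as long as any member NIC does. This sketch is an assumption-laden simplification (hypothetical helper names; it models failover only, ignores bandwidth aggregation, and assumes the NICs don’t all hang off the same upstream switch).

```python
def uplink_up(team, failed_nics):
    # A teamed (bonded) uplink keeps passing traffic while any member NIC works.
    return any(nic not in failed_nics for nic in team)


def reachable_vms(vms_on_host, team, failed_nics):
    """VMs on one host share its uplink: reachable together or not at all."""
    return sorted(vms_on_host) if uplink_up(team, failed_nics) else []


vms = ["node1", "node2", "node3", "node4"]
single = ["eth0"]
teamed = ["eth0", "eth1"]

print(reachable_vms(vms, single, {"eth0"}))  # []
print(reachable_vms(vms, teamed, {"eth0"}))  # ['node1', 'node2', 'node3', 'node4']
```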
So now you’ve got your network
topology designed to ensure that a physical failure will not take out every
application instance on the server. Next you need to consider how the
application instances are isolated and deployed to ensure that a failure at the
hypervisor layer does not disrupt all application instances.
Consider that there are two possible
reasons you are implementing load balancing: scalability and availability. In
the former, you’re trying to ensure supply meets demand. In the latter, you’re
trying to mitigate potential failure in a way that ensures “the application” is
always available, regardless of failure. If there is a failure at the
hypervisor layer, all instances relying on that hypervisor will be impacted
(and not in a good way). Regardless of why you’re implementing load balancing,
the result of such a failure is the same: instances are unavailable. Similarly,
if the physical device on which virtualized applications are deployed fails,
every instance on that device will be down.
In both cases, if all your virtual
eggs are in one basket and there’s a failure at the hypervisor layer, you’re in
trouble.
Load Balancing Virtualized Applications Rule #2: Divide and Conquer.
Application instance redundancy is a
must. Never put all your application instances on a single virtualized or
physical platform. Spread them across at least two physical servers to isolate
potential failures in the virtualization layer or at the physical server layer.
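Rule #2 amounts to an anti-affinity placement policy. The sketch below spreads instances round robin across hosts and checks that no single host failure takes out every instance; the function names, instance names, and host names are hypothetical, and real platforms implement this with their own placement rules.

```python
def place_instances(instances, hosts):
    """Spread instances across hosts round robin (a simple anti-affinity policy)."""
    if len(hosts) < 2:
        raise ValueError("Rule #2: use at least two physical hosts")
    placement = {host: [] for host in hosts}
    for i, instance in enumerate(instances):
        placement[hosts[i % len(hosts)]].append(instance)
    return placement


def survives_single_host_failure(placement):
    # For every host, at least one instance must live somewhere else.
    total = sum(len(group) for group in placement.values())
    return all(total - len(group) > 0 for group in placement.values())


spread = place_instances(["app1", "app2", "app3", "app4"], ["hostA", "hostB"])
print(spread)  # {'hostA': ['app1', 'app3'], 'hostB': ['app2', 'app4']}
print(survives_single_host_failure(spread))  # True
print(survives_single_host_failure({"hostA": ["app1", "app2", "app3", "app4"]}))  # False
```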
Node backups should always be
located on physically separate devices. Load balancing services are adept at
discerning failure but they are not necessarily able to determine the source. A
failure to communicate with an application instance could be caused by a bad
cable, a failed port, an unresponsive network stack, or an application error.
The load balancing service knows the application instance is down, but not
necessarily why it’s down. If it’s a crashed instance, then failing over
to a backup instance on the same server is probably going to work out fine.
But if the root cause is a failed port or bad cable, failing over to a backup
instance on the same server isn’t going to help, because it is unreachable too.
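A simple audit along these lines flags any backup that sits on the same physical host as its primary. The data layout and the `colocated_backups` helper are hypothetical; in practice the host mapping would come from your virtualization inventory.

```python
def colocated_backups(nodes):
    """Return (primary, backup) pairs where the backup sits on the primary's host.

    nodes maps instance name -> (physical_host, backup_name or None).
    """
    problems = []
    for name, (host, backup) in nodes.items():
        if backup is not None and nodes[backup][0] == host:
            problems.append((name, backup))
    return problems


nodes = {
    "app1": ("hostA", "app1-bk"),
    "app1-bk": ("hostA", None),  # same server as app1: shares its port and cable
    "app2": ("hostA", "app2-bk"),
    "app2-bk": ("hostB", None),  # physically separate: survives a hostA port failure
}
print(colocated_backups(nodes))  # [('app1', 'app1-bk')]
```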
To ensure availability, it is
imperative that there always be at least two of everything, and that means
physical devices as well. Never put all your eggs in one basket, at any layer.
THE PERFORMANCE IMPACT
Aside from general availability
issues, there is also the very real possibility that where you deploy
virtualized application instances will impact performance of “the application.”
Remember that even though you can allocate CPU and memory on a per-instance
basis, instances ultimately still share I/O, both storage and network. That
means even if you use rate-limiting technologies to manage bandwidth consumption
as a means to reduce congestion or latency, you’re ultimately still impacting
performance. If you don’t use rate limiting or other bandwidth-focused
solutions to manage the shared network resource, you run the risk of congestion
and increasing latency on the wire.
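One common rate-limiting technique is a token bucket. The simplified model below is our own assumption, not any particular product’s implementation, but it shows the trade-off described above: capping an instance’s share of the shared link keeps it from flooding the wire, yet a burst larger than the bucket simply waits, and that wait is added latency.

```python
def token_bucket_delays(arrivals, sizes, rate, capacity):
    """Queueing delay (seconds) each transfer sees under a token-bucket limit.

    arrivals: send times in seconds, non-decreasing; sizes: bytes per transfer,
    each no larger than capacity; rate: refill in bytes/second; capacity: burst size.
    Transfers are served first in, first out.
    """
    tokens = capacity
    clock = 0.0  # time up to which the bucket has been accounted for
    delays = []
    for arrival, size in zip(arrivals, sizes):
        if size > capacity:
            raise ValueError("transfer larger than the bucket can ever hold")
        start = max(float(arrival), clock)
        tokens = min(capacity, tokens + (start - clock) * rate)  # refill while idle
        clock = start
        if tokens < size:
            clock += (size - tokens) / rate  # wait until enough tokens accumulate
            tokens = size
        tokens -= size
        delays.append(clock - arrival)
    return delays


# A burst of three 1 MB transfers against a 1 MB/s limit with a 1 MB bucket.
print(token_bucket_delays([0, 0, 0], [1_000_000] * 3, rate=1_000_000, capacity=1_000_000))
# [0.0, 1.0, 2.0]
```

Spaced-out transfers see no delay at all; only the burst pays, which is exactly the performance cost of protecting the shared link.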
Similarly, shared storage is even
more problematic because when you trace I/O down through the system, you end up
at a single, shared I/O controller that is going to have some serious
limitations. I/O-intensive application instances deployed on the same
physical device are going to cause contention in the underlying system, which
is going to negatively impact performance.
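To put rough numbers on that contention, the sketch below treats the shared I/O controller as an M/M/1 queue, a textbook simplification we are assuming here rather than a claim about any real controller. Mean response time is R = S / (1 - ρ), where S is the per-request service time and ρ is utilization; the 5 ms service time and 60 IOPS per instance are made-up illustration values.

```python
def mm1_response_time(service_time, arrival_rate):
    """Mean response time of an M/M/1 queue: S / (1 - rho), with rho = lambda * S."""
    rho = arrival_rate * service_time
    if rho >= 1:
        return float("inf")  # demand exceeds capacity: the queue grows without bound
    return service_time / (1 - rho)


SERVICE_TIME = 0.005    # assumed: 5 ms per I/O at the shared controller
IOPS_PER_INSTANCE = 60  # assumed: load generated by one I/O-intensive instance

for colocated in (1, 2, 3):
    r = mm1_response_time(SERVICE_TIME, colocated * IOPS_PER_INSTANCE)
    print(f"{colocated} instance(s) on one controller: {r * 1000:.1f} ms per I/O")
# 7.1 ms, 12.5 ms, 50.0 ms
```

Under this model, tripling the I/O-intensive instances behind one controller multiplies per-I/O latency roughly sevenfold, which is the second axiom (as load increases, performance decreases) in numeric form.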
Again, divide
and conquer. Disperse such instances across two (or more) physical
servers. The number of servers will depend on the overall scale of the
application and the resource consumption rate. Load balancing can
help maintain performance across instances if you take advantage of a
response-time-aware algorithm such as fastest response time (the assumption
being that response time correlates directly with load, which in most cases is
true). This keeps any given instance from becoming overwhelmed.
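A fastest-response-time style selector can be sketched as “track a smoothed response time per instance and pick the lowest.” Real load balancing products differ in how they measure and smooth response times; the class name, the exponentially weighted moving average, and the `alpha` value here are assumptions for illustration.

```python
class FastestResponsePicker:
    """Route each request to the instance with the lowest smoothed response time.

    alpha is an assumed smoothing factor for an exponentially weighted moving
    average; instances with no measurements yet are tried first.
    """

    def __init__(self, instances, alpha=0.3):
        self.alpha = alpha
        self.avg = {instance: None for instance in instances}

    def record(self, instance, seconds):
        prev = self.avg[instance]
        if prev is None:
            self.avg[instance] = seconds
        else:
            self.avg[instance] = self.alpha * seconds + (1 - self.alpha) * prev

    def pick(self):
        # Unmeasured instances sort first (-1.0); otherwise the lowest average wins.
        return min(self.avg, key=lambda i: -1.0 if self.avg[i] is None else self.avg[i])


picker = FastestResponsePicker(["app1", "app2", "app3"])
for instance, seconds in [("app1", 0.120), ("app2", 0.045), ("app3", 0.300)]:
    picker.record(instance, seconds)
print(picker.pick())  # app2
picker.record("app2", 0.500)  # app2 slows down as it gets busier
print(picker.pick())  # app1
```

Because a busier instance responds more slowly, it naturally drops out of favor, which is how the algorithm keeps any one instance from being overwhelmed.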
Ultimately, what this means is that
you have to be a little more aware of the physical deployment location of
application instances than you had to be with purely physical deployments.
Consolidation is a great way to reduce operational and capital expenditures,
but it also means consolidating risk.
LOCATION MATTERS
This is a particularly tough nut to
crack, especially when combined with the desire to implement auto-scaling
operations in a more cloud-like environment. The idea that you can leverage
“whatever idle resources” you can find to scale out applications on demand is
powerful, but it’s also potentially fraught with risk if you’re unable to
control placement at all. While the chance that every instance would end up
deployed on a single server, or even on a select handful of servers, is small,
multiple instances could still be placed such that a single server failure
eliminates a sizeable number of application instances, resulting in an
unacceptable degradation of performance or even downtime for some percentage
of users.
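One way to keep auto-scaling from quietly concentrating risk is to measure the worst-case loss from a single server failure and to prefer the least-populated server when placing a new instance. Many platforms express this kind of constraint as anti-affinity rules; the helpers and server names below are hypothetical and simply illustrate the idea.

```python
from collections import Counter


def worst_case_loss(placement):
    """Fraction of all instances lost if the single most-populated server fails.

    placement maps instance name -> physical server.
    """
    per_server = Counter(placement.values())
    return max(per_server.values()) / len(placement)


def choose_server(placement, candidates):
    """Pick the candidate server currently hosting the fewest instances."""
    per_server = Counter(placement.values())
    return min(candidates, key=lambda server: per_server[server])


placement = {"app1": "s1", "app2": "s1", "app3": "s1", "app4": "s2"}
print(worst_case_loss(placement))  # 0.75: losing s1 takes out 3 of 4 instances
new_home = choose_server(placement, ["s1", "s2", "s3"])
placement["app5"] = new_home
print(new_home, worst_case_loss(placement))  # s3 0.6
```

A scaler that uses “whatever idle resources” without a check like this could just as easily have picked `s1`, pushing the worst-case loss to 80%.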
In the end, location really does
matter when it comes to load balancing virtualized applications. Where they
are deployed and in what groupings becomes a critical factor for maintaining
performance and availability. The temptation to increase VM density is strong,
but giving in to it can lead to highly disruptive situations in the event of a failed
component. Be aware that cost savings from mass-consolidation and “high
efficiency” through increasing VM density metrics may look good now, but may
not look so good through the lens of hindsight.