- Lori
MacVittie, senior technical marketing manager at F5 Networks (www.f5.com),
says:
The last time we dove into a "Load Balancing 101" discussion, we looked at the difference between "architected for scale" and "architected for fail." The question that usually pops up after such a discussion is, "Why can't I just provision an extra server and use it? If one fails, the other picks up the load."
We call such a model N+1, where N is the number of servers necessary to handle the load and the +1 is one extra server, just in case. The assumption is that all N+1 servers are active, so no resources are hanging out idle and wasting money. This is also sometimes referred to as "active-active" when such architectures include a redundant pair of X (firewalls, load balancers, servers, etc.), because both the primary and the backup are active at the same time.
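To make the sizing concrete, here is a minimal Python sketch of the N+1 idea; the function name, the 150-connection load, and the 100-connection-per-server capacity are assumptions chosen only for illustration, not figures from the article.

```python
import math

def n_plus_one_pool_size(expected_connections: int, per_server_capacity: int) -> int:
    """Return the total number of servers for an N+1 pool.

    N is the minimum number of servers needed to carry the expected load;
    one extra is added as the "+1" spare. Illustrative sketch only.
    """
    n = math.ceil(expected_connections / per_server_capacity)
    return n + 1

# Example: 150 expected connections, 100 connections per server
# -> N = 2 servers to carry the load, 3 servers total in the N+1 pool.
print(n_plus_one_pool_size(150, 100))  # 3
```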
So it sounds good, this utilization of all resources, and when everything is running smoothly it can even improve performance, because utilization remains lower across all N+1 devices.
The
problem comes when one of those devices fails.
HERE COMES THE MATH
In the
simplest case of two devices – one acting as backup to the other – everything
is just peachy keen until utilization is greater than 50%.
Assume we have two servers, each with a maximum capacity of 100 connections, and that clients are generating 150 connections. A load balancing service distributes those connections evenly, giving each server 75 connections, for a utilization rate of 75%.
Now let's
assume one server fails.
The remaining server must try to handle all 150 connections, which puts its utilization at 150%, a load it cannot handle. Performance degrades, connections time out, and end users become very, very angry.
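The arithmetic above can be written out as a tiny Python sketch; the numbers are the ones from the example, and the helper function itself is purely illustrative.

```python
def per_server_utilization(total_connections: int,
                           per_server_capacity: int,
                           active_servers: int) -> float:
    """Percent utilization of each server when load is spread evenly."""
    return (total_connections / active_servers) / per_server_capacity * 100

# Two healthy servers, 150 client connections, 100-connection capacity each:
print(per_server_utilization(150, 100, 2))  # 75.0  -> each server at 75%

# One server fails; the survivor must absorb everything:
print(per_server_utilization(150, 100, 1))  # 150.0 -> 50% over capacity
```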
Which is why, if you consider the resulting impact of performance degradation and downtime on business revenue and productivity, redundancy is considered a best practice for architecting data center networks. N+1 works in the scenario in which only one device fails (because the spare can take over), but the larger the pool of resources, the more likely it is that more than one device will fail at roughly the same time, making it necessary to take more of an N+"a couple or three spares" approach.
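As a rough illustration of why larger pools tend to need more than one spare, here is a hedged Python sketch that assumes each device fails independently with some small probability. The probabilities and pool sizes below are invented for illustration only; real-world failures are often correlated, which only strengthens the case for extra spares.

```python
from math import comb

def prob_more_than_m_failures(pool_size: int, m: int, p_fail: float) -> float:
    """Probability that more than m of pool_size devices are down at once,
    assuming independent failures, each with probability p_fail.
    (A simplifying assumption; correlated failures make things worse.)
    """
    p_at_most_m = sum(
        comb(pool_size, k) * p_fail**k * (1 - p_fail)**(pool_size - k)
        for k in range(m + 1)
    )
    return 1 - p_at_most_m

# Illustrative only: with a 2% chance that any given device is down,
# a 4-device pool rarely loses 2+ at once; a 40-device pool does so far more often.
print(f"{prob_more_than_m_failures(4, 1, 0.02):.4%}")   # ~0.23%
print(f"{prob_more_than_m_failures(40, 1, 0.02):.4%}")  # ~19%
```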
Yes,
resources stand idle. Wasted. Money down the drain.
Until
they're needed. Desperately.
They're insurance against failure, and they always have been. The cost of downtime and/or performance degradation has long been considered far greater than the operational and capital costs associated with a secondary, idle device.
The ability of a load balancing service to designate a backup server or resource that remains idle is paramount to enabling architectures built to fail. In the cloud, that capability should be considered a basic requirement of any load balancing service. In fact, much like leveraging the cloud as a secondary "backup" data center for disaster recovery and business continuity strategies, keeping a "spare" resource waiting to ensure availability should be a no-brainer from a cost perspective, given the much lower cost of ownership in the cloud.
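The behavior described here, a spare that takes no traffic until it is actually needed, can be sketched in a few lines of Python. The class and method names are hypothetical and not the API of any particular load balancer; real products expose this as configuration rather than code, but the idea of keeping the designated backup out of rotation is the same.

```python
class BackupAwarePool:
    """Round-robins across active servers and sends traffic to the
    designated backup only when no active server is healthy.

    Hypothetical sketch of the behavior described above, not a real
    load balancer API.
    """
    def __init__(self, active, backup):
        self.active = list(active)   # servers that normally take traffic
        self.backup = backup         # idle spare, used only on total failure
        self.healthy = set(self.active)
        self._rr = 0

    def mark_down(self, server):
        self.healthy.discard(server)

    def pick(self):
        candidates = [s for s in self.active if s in self.healthy]
        if not candidates:
            return self.backup       # all actives down: promote the spare
        self._rr = (self._rr + 1) % len(candidates)
        return candidates[self._rr]

pool = BackupAwarePool(active=["app1", "app2"], backup="spare")
print(pool.pick())        # one of the active servers
pool.mark_down("app1")
pool.mark_down("app2")
print(pool.pick())        # "spare", only once both actives are down
```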