Choosing the right size for an instance in the cloud is – by
design – like choosing the right mix of entrĂ©e and sides off the menu. Do I want a double burger and a
small fry? Or a single burger and a super-size fry? All small? All large?
Kiddie menu? It all depends on how hungry you
are, on your capacity to consume.
Choosing the right instance in cloud computing environments likewise depends on how hungry your application is, but it depends just as heavily on what the application is hungry for. In order to choose the right “size” server in the cloud, you’ve got to understand the consumption patterns of your application. That understanding is also vital to knowing when you want a second, third or further instance launched to ensure availability while maintaining acceptable performance.
To that end, it is important to understand the difference between “concurrency” and “connections” in terms of what they measure and, in turn, how they impact resource consumption.
Connections measures the rate of new connections, i.e. requests, that can be handled (generally per second) by a “thing” (device, server, application, etc.). When we’re talking
applications, we’re talking HTTP (which implies TCP). In order to establish a
new TCP connection, a three-way handshake must be completed. That means three
exchanges of data between the client and the “thing” over the network. Each
exchange requires a minimal amount of processing. Thus, the constraining factors that may limit connections are network and CPU speed: network speed impacts the exchange, and CPU speed impacts the processing required. Degradation in either one impacts the time it takes for a handshake to complete, thus limiting the number of connections per second that can be established by the “thing.”
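If you want to get a feel for the connection side of the equation, one rough approach is simply to time a batch of handshakes. The sketch below is a minimal illustration, not a benchmark tool: the target address is a placeholder you would swap for your own service, and every socket.connect() call completes a full three-way handshake, so the loop approximates new connections per second from the client’s point of view.

```python
# Rough sketch: estimate how many new TCP connections per second a target
# can complete. Every socket.connect() performs a full three-way handshake.
# HOST/PORT are placeholders (a TEST-NET address); point this at your own service.
import socket
import time

HOST, PORT = "192.0.2.10", 80
ATTEMPTS = 500

start = time.monotonic()
completed = 0
for _ in range(ATTEMPTS):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(2.0)
    try:
        s.connect((HOST, PORT))   # blocks until the handshake completes
        completed += 1
    except OSError:
        pass                      # refused or timed-out attempts don't count
    finally:
        s.close()                 # close right away; only setup rate matters here

elapsed = time.monotonic() - start
print(f"{completed} connections in {elapsed:.2f}s "
      f"(~{completed / max(elapsed, 1e-9):.0f} new connections/second)")
```

Slow handshakes show up here as a lower rate whether the cause is network latency or a CPU-bound endpoint, which is exactly the point: both resources gate connections per second.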
Once a connection is established, it
gets counted as part of concurrency.
Concurrency is the measure of how many connections can be simultaneously maintained by a “thing” (device, server, application, etc.). To be maintained, a connection must be stored in a state table on the “thing”, which requires memory. Concurrency, therefore, is highly dependent on, and constrained by, the amount of RAM available on the “thing”.
In a nutshell: Concurrency requires
memory. New connections per second requires CPU and network speed.
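To put some purely illustrative numbers on that, here’s a back-of-the-envelope sketch. The per-connection memory figure and the per-connection CPU cost are assumptions chosen for the example, not measurements; swap in values profiled from your own application before trusting the output.

```python
# Back-of-the-envelope sizing sketch. All constants below are illustrative
# assumptions, not benchmarks; replace them with profiled values.
PER_CONNECTION_MEM_KB = 64          # assumed memory held per concurrent connection
TARGET_CONCURRENCY = 100_000        # connections you want to hold open at once
CPU_MS_PER_NEW_CONNECTION = 0.05    # assumed CPU time to accept one new connection
TARGET_NEW_CONNS_PER_SEC = 5_000    # expected new connections per second

ram_needed_gb = TARGET_CONCURRENCY * PER_CONNECTION_MEM_KB / (1024 * 1024)
cores_for_setup = TARGET_NEW_CONNS_PER_SEC * CPU_MS_PER_NEW_CONNECTION / 1000

print(f"Concurrency drives memory: ~{ram_needed_gb:.1f} GB of RAM")
print(f"Connection rate drives CPU: ~{cores_for_setup:.2f} cores just for setup")
```

With these made-up numbers, holding 100,000 connections open costs roughly 6 GB of RAM, while 5,000 new connections per second consumes only a fraction of a core for setup; a different application could easily invert that ratio.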
Capitalizing on Capacity
Now, you may be wondering what good it does you to know the difference. First, you should be (if you aren’t already) aware of the usage patterns of the application you’re deploying in the cloud (or anywhere, really). Choosing the instance based on the application’s usage pattern (connection-heavy versus concurrency-heavy) means the least amount of resources is wasted, and that translates into spending less money over time. In other words, you’re using resources more efficiently by pairing the right instance with the right application.
Choosing a high-memory, low-CPU instance for an application that is connection-oriented can lead to underutilization and wasted investment, as it will need to be scaled out sooner to maintain performance. Conversely, applications dependent on concurrency that are deployed on high-CPU, low-memory instances will see performance degrade quickly unless additional instances are added, which wastes resources (and money). Thus, choosing the right instance type for the application is paramount to achieving the economy of scale promised by cloud computing and virtualization. This is a truism whether you’re choosing from a cloud provider’s menu or your own.
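As a sketch of what that pairing looks like, consider a toy menu of instance types. The names, specs and hourly prices below are entirely hypothetical and not drawn from any provider’s catalog; the point is only that the cheapest instance for a connection-heavy profile is not the cheapest one for a concurrency-heavy profile.

```python
# Sketch: pick the cheapest instance that covers a workload's needs.
# The instance types and prices are hypothetical; the requirements would
# come from profiling the actual application.
CATALOG = [
    # (name, vCPUs, RAM in GB, dollars per hour)
    ("cpu-heavy",    8,  8, 0.40),
    ("memory-heavy", 2, 32, 0.45),
    ("balanced",     4, 16, 0.42),
]

def cheapest_fit(need_vcpus, need_ram_gb):
    fits = [i for i in CATALOG if i[1] >= need_vcpus and i[2] >= need_ram_gb]
    return min(fits, key=lambda i: i[3]) if fits else None

# A connection-heavy workload: lots of handshakes, little held state.
print(cheapest_fit(need_vcpus=6, need_ram_gb=4))     # ('cpu-heavy', 8, 8, 0.4)
# A concurrency-heavy workload: many connections held open at once.
print(cheapest_fit(need_vcpus=2, need_ram_gb=24))    # ('memory-heavy', 2, 32, 0.45)
```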
It’s inevitable that if you’re going to scale the application (and you probably are, if for no other reason than to provide for reliability) you’re going to use a load balancing service. There are two ways to leverage the capabilities of such a service when it is delivered by an application delivery controller, and which one fits depends, again, upon the application.
STATEFUL APPLICATION ARCHITECTURE
If the application you are deploying
is stateful (and it probably is) then you’ll not really be able to take
advantage of page routing and scalability domain design patterns. What you can
take advantage of, however, is the combined memory, network, and processing
speed capabilities of the application delivery controller.
By their nature, application delivery controllers generally aggregate more network bandwidth, use memory more efficiently, and are imbued with purpose-built protocol handling functions. This makes them ideal for managing connections at very high rates
per second. An application delivery controller based on a
full-proxy architecture,
furthermore, shields the application services themselves from the demands
associated with high connection rates, i.e. network speed and CPU. By
offloading the connection-oriented demands to the application delivery service,
the application instances can be chosen with the appropriate resources so as to
maximize concurrency and/or performance.
STATELESS or SHARED STATE APPLICATION ARCHITECTURE
If the application is stateless or shares state (usually via sessions stored in a database), you can pare off those functions that are connection-oriented from those that are dependent upon concurrency. RESTful or SOA-based application architectures will also be able to benefit from the implementation of scalability domains, as they allow each “service” to be deployed to an appropriately sized instance based on its usage type – connection or concurrency.
An application delivery service capable of performing layer 7 (page) routing can efficiently sub-divide an application, sending all connection-heavy requests to one domain (pool of resources/servers) and all concurrency-heavy requests to another. Each pool of resources can then be composed of instances sized appropriately – more CPU for the connection-oriented, more memory for the concurrency-oriented.
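A minimal sketch of that kind of layer 7 routing is below. The URL prefixes, pool names and back-end addresses are assumptions for illustration only; in practice this logic lives in the application delivery controller’s traffic-management configuration rather than in application code.

```python
# Minimal sketch of layer 7 (path-based) routing into differently sized pools.
# Prefixes, pool names and addresses are made up for illustration.
from itertools import cycle

POOLS = {
    # connection-heavy endpoints go to CPU-sized instances
    "connections": cycle(["10.0.1.10", "10.0.1.11"]),
    # concurrency-heavy endpoints go to memory-sized instances
    "concurrency": cycle(["10.0.2.10", "10.0.2.11", "10.0.2.12"]),
}

ROUTES = {
    "/api/login":  "connections",   # short, bursty requests
    "/api/search": "connections",
    "/app/":       "concurrency",   # long-lived, session-bound traffic
}

def pick_backend(path):
    for prefix, pool_name in ROUTES.items():
        if path.startswith(prefix):
            return next(POOLS[pool_name])   # simple round-robin within the pool
    return next(POOLS["connections"])       # default pool for unmatched paths

print(pick_backend("/api/login"))   # 10.0.1.10
print(pick_backend("/app/cart"))    # 10.0.2.10
```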
In either scenario, the use of TCP multiplexing on the application delivery controller can further mitigate the impact of concurrency on the consumption of instance resources, making the resources provisioned for the application more efficient and able to serve more users and more requests without increasing memory or CPU.
What is TCP Multiplexing?
TCP multiplexing is a technique used
primarily by load balancers and application delivery controllers (but also by some stand-alone web application acceleration
solutions) that enables the device to "reuse" existing TCP
connections. This is similar to the way in which persistent
HTTP 1.1 connections work in that a single HTTP connection can be used to
retrieve multiple objects, thus reducing the impact of TCP overhead on
application performance.
TCP multiplexing allows the same thing to happen for TCP-based applications (usually HTTP/web), except that instead of the reuse being limited to a single client, the connections can be reused across many clients, resulting in much greater efficiency for web servers and faster-performing applications.
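You can see the client-side half of that idea with nothing but the standard library: one TCP connection, and therefore one handshake, carrying several HTTP requests in sequence. This only demonstrates the HTTP/1.1 persistent-connection analogy; the multiplexing an application delivery controller performs happens on the server side, where pooled connections to the web servers are shared across many different clients.

```python
# Several HTTP/1.1 requests sent over one connection object; the underlying
# socket is reused as long as the server keeps the connection alive.
# example.com is used only as a convenient public endpoint.
import http.client

conn = http.client.HTTPConnection("example.com", 80)
for path in ("/", "/", "/"):
    conn.request("GET", path)
    resp = conn.getresponse()
    resp.read()                     # drain the body so the connection can be reused
    print(resp.status, path)
conn.close()
```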
Regardless of your strategy,
understanding the difference between concurrency and connections is a good
place to start determining how best to provision resources to meet specific
application needs. Reducing the long-term costs associated with scalability is
still important to ensure an efficient IT organization, and utilization is
always a good measure of efficiency. The cost differences between CPU-heavy and memory-heavy instances vary from one cloud environment to another, with the most expensive being those that are high in both CPU and memory. Being able to provision the least amount of resources while achieving optimal capacity and performance is a boon – both for IT and the business it serves.
