Most folks probably
remember the play on the "to err is human…" proverb from when computers first
began to take over, well, everything.
The saying was only partially tongue-in-cheek,
because, as we've long since learned, computers allow us to
make mistakes faster, more often, and with greater reach.
One of the statistics used to justify a devops
initiative is the rate at which human error contributes to a variety of
operational badness: downtime, poor performance, and long deployment life-cycle
times.
Human error is a
non-trivial cause of downtime and other operational interruptions. A recent Paragon
Software survey found
that human error was cited as a cause of downtime by 13.2% of respondents.
Other surveys have indicated rates much higher. Gartner analysts Ronni J.
Colville and George Spafford in "Configuration
Management for Virtual and Cloud Infrastructures" predict as much as 80% of outages
through 2015 impacting mission-critical services will be caused by "people
and process" issues.
Regardless of the actual rates at which human
error causes downtime or other operational disruptions, the reality is that it is a
factor. One of the ways in which we hope to remediate the problem is through
automation and devops.
While certainly an appropriate course of
action, adopters need to exercise caution when embarking on such an initiative,
lest they codify incomplete or inefficient processes that simply promulgate
errors faster and more often.
Something that all too often seems to be
falling by the wayside is the relationship between agile development and agile
operations. Agile isn't just about fast(er) development cycles; it's about
applying a rapid, iterative process to the development cycle. Similarly,
operations must remember that it is unlikely they will "get it right"
the first time and, following agile methodology, are not expected to. Process
iteration assists in discovering errors, missing steps, and other potential sources
of misconfiguration that are ultimately the source of outages or operational
disruption.
An organization that has experienced outages
due to human error is practically assured of codifying those errors
into automation frameworks if it does not take the time to iteratively execute
those processes to find out where errors or missing steps may lie.
It is process that drives continuous delivery
in development and process that must drive continuous delivery in devops.
Process that must be perfected first through practice, through the application
of iterative models of development on devops automation and orchestration.
What may appear as a tedious repetition is
also an opportunity to refine the process: to discover and eliminate
inefficiencies, which streamlines the deployment process and enables faster time to
market. These inefficiencies are generally only discovered when someone takes
the time to clearly document all steps in the process – from beginning (build)
to end (production). Cross-functional responsibilities are often the source of
such inefficiencies, because of the overlap between development, operations,
and administration.
The outage of Microsoft’s cloud service for
some customers in Western Europe on 26 July happened because the company’s
engineers had expanded capacity of one compute cluster but forgot to make all
the necessary configuration adjustments in the network infrastructure.
Applying an agile methodology to the process
of defining and refining devops processes around continuous delivery automation
enables discovery of the errors and missing steps and
duplicated tasks that bog down or disrupt the entire chain of deployment tasks.
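As a minimal sketch of what one such review iteration might look like, the
Python below compares a documented runbook against a checklist of required
steps and reports anything missing or duplicated. The step names, the
REQUIRED_STEPS list, and the review_process function are all hypothetical,
invented here for illustration; a real process would have its own steps and
would likely check far more than names.

```python
from collections import Counter

# Hypothetical checklist for a capacity expansion; the step names are
# assumptions made for this example, not taken from any real runbook.
REQUIRED_STEPS = [
    "provision_nodes",
    "update_network_config",
    "update_load_balancer",
    "run_smoke_tests",
]


def review_process(documented_steps, required_steps=REQUIRED_STEPS):
    """Compare a documented runbook against the required checklist.

    Returns a (missing, duplicated) pair: required steps that never
    appear in the runbook, and steps that appear more than once.
    """
    counts = Counter(documented_steps)
    missing = [step for step in required_steps if counts[step] == 0]
    duplicated = sorted(step for step, n in counts.items() if n > 1)
    return missing, duplicated


# A runbook as someone might first write it down: one step was
# forgotten and another was listed twice.
runbook = [
    "provision_nodes",
    "update_load_balancer",
    "run_smoke_tests",
    "run_smoke_tests",
]

missing, duplicated = review_process(runbook)
print("missing:", missing)
print("duplicated:", duplicated)
```

Running a check like this on every iteration, before the process is encoded
in an automation framework, is one way to catch the forgotten step while it
is still cheap to fix.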
We all know that
automation is a boon for operations, particularly in organizations employing
virtualization and cloud computing to enable elasticity and improved
provisioning. But what we need to remember is that if that automation simply
encodes poor processes or errors, then automation just enables us to make mistakes
a whole lot faster.
Take care to pay attention to process and to
test early, test often.