Demystifying Cloud Computing
With the growth of cloud computing (and all of the associated hype), it’s difficult for a business to determine how cloud computing benefits its value chain. Much of the confusion stems from IT leaders grappling with what cloud computing is and how to define it. I participated in a conference last year where several CIOs and technology leaders discussed what cloud computing was and how it is (or would be) leveraged in our respective organizations. What was interesting is that there was no clear or single definition in their minds of what constitutes cloud computing or how it would be leveraged. I think they all had a general idea of the capabilities provided by cloud computing, but it was clear coming out of this conference that there was only a vague definition.
It was also interesting to note that the bombardment of marketing collateral, blogs, and new releases centered on cloud computing did nothing to clarify its scope, nature, and use. If anything, it contributed to the confusion in the minds of the technology leaders who attended the conference.
Well, a lot has changed in a year. As practical applications and case studies of cloud computing have emerged, I’ve found that it’s much easier to determine how to leverage cloud computing, but its management is still somewhat ill-defined.
At its core, cloud computing is just a service whose specific delivery mechanisms may or may not be known to the consumer of that service.
So, being a service, it should be subject to service level agreements (SLAs) like any other service that an IT organization consumes. Placing its metrics under the operational framework of an SLA provides continuity with the vendor management and metric reporting processes that already exist for any other third-party service provider (in this case, for public clouds). For internal clouds, SLAs provide the same benefit of identifying the expectations set for the delivery of those services. Either way, you are establishing the framework for the quality of service (QoS) that is expected from the service you are consuming.
A service level agreement is a binding agreement that describes the services and how those services will be provided; essentially, it establishes a set of delivery expectations. It also defines any third parties that need to be involved in delivering the service to you. For each service provided, you should set expectations through quantifiable metrics. In the case of cloud computing, however, visibility into cloud service delivery may not be available, so it is critical that any delivery metrics are clearly spelled out and, more importantly, that those metrics are aligned to business delivery.
So for each metric, define:
- Description – an overview of the metric you will use to measure whether the agreement is successful
- Standard – the criteria for success, or the expectations for the service, under this metric
- Calculation method – how the metric will be calculated
- Reporting intervals – how often you expect the metric to be delivered and reviewed
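As a rough illustration, the four parts of a metric definition can be captured in a small record so that no part is left blank before an agreement is signed. This is only a sketch: the class name `SlaMetric`, the helper `missing_fields`, and every value below (including the 99.9% target) are hypothetical placeholders, not terms from any real contract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaMetric:
    """One SLA metric, described by the four parts listed above (hypothetical structure)."""
    name: str
    description: str          # what is measured, including any caveats
    standard: str             # the success criteria or expectation
    calculation_method: str   # how the metric is computed
    reporting_interval: str   # how often it is delivered and reviewed


def missing_fields(metric):
    """Return the names of any parts of the definition that were left empty."""
    return [field for field, value in vars(metric).items() if not value.strip()]


# Illustrative values only; a real agreement would negotiate each of these.
availability = SlaMetric(
    name="Availability",
    description="Availability of the service to the consumer, excluding agreed maintenance windows",
    standard="At least 99.9% of scheduled hours per month",
    calculation_method="(scheduled hours - outage hours) / scheduled hours",
    reporting_interval="monthly",
)

print(missing_fields(availability))
```

A check like `missing_fields` is a cheap way to catch a metric that has a name and a target but no agreed calculation method, which is where disputes tend to start.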
Let’s look at an example of a public cloud offering for software as a service (SaaS) and a metric – say availability. Your users are going to use this service to perform some task (could be anything such as sales management or budget management).
Description
The description of the metric would then be the availability of the service to you as the consumer. If there are any caveats that need to be explained, such as maintenance windows, they need to be mentioned in the description (although the maintenance windows themselves would need to be spelled out in the calculation below).
Standard
The standard is the expectation that you are going to set for the availability. In this case it can be expressed either in hours of acceptable outage or as a percentage of the total time the service is expected to be available. Be careful about percentages, though: they are always relative to the measurement period and, as a result, can introduce ambiguity when disagreements occur. Remember that there is a BIG difference between 99% availability and 99.9999% availability.
So if you are going to use a percentage, think through the potential implications of an outage to your business.
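To make the difference concrete, the sketch below converts an availability percentage into the outage time it permits. It assumes a 365-day year of continuous service (8,760 hours); the function name `allowed_downtime_hours` is made up for illustration. Under that assumption, 99% allows roughly 87.6 hours of outage per year, while 99.9999% allows only about 32 seconds.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years (an assumption)


def allowed_downtime_hours(availability_percent, period_hours=HOURS_PER_YEAR):
    """Return the outage hours permitted in a period at the given availability percentage."""
    return period_hours * (1 - availability_percent / 100)


# Compare a few commonly quoted availability levels.
for pct in (99.0, 99.9, 99.99, 99.9999):
    hours = allowed_downtime_hours(pct)
    print(f"{pct}% -> {hours:.4f} hours/year ({hours * 3600:.1f} seconds)")
```

Passing a different `period_hours` (for example, 744 for a 31-day month) shows how the same percentage translates into a very different number of hours depending on the reporting period.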
Calculation
In this case, the calculation would be based on the total hours that the service is determined to be available to you within the time interval. If the provider says its availability is 24x7x365 and the time interval is monthly, then the hours for each reporting period would be the number of hours in that month (for example, January would have 744 hours of availability).
If there are any maintenance expectations during the period, they need to be spelled out. If the provider says it needs 2 hours per week to perform system maintenance, then those hours would be removed from the available hours.
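The calculation described above can be sketched as follows. The function names, the year used, the four 2-hour maintenance windows, and the 3 hours of outage are all hypothetical values chosen for illustration; how many maintenance windows fall in a given month is exactly the kind of detail the agreement itself has to state.

```python
import calendar


def scheduled_hours(year, month, maintenance_hours=0.0):
    """Hours the service is expected to be available in a month, net of agreed maintenance."""
    days_in_month = calendar.monthrange(year, month)[1]
    return days_in_month * 24 - maintenance_hours


def availability_percent(year, month, outage_hours, maintenance_hours=0.0):
    """Availability for the month as a percentage of scheduled hours."""
    available = scheduled_hours(year, month, maintenance_hours)
    return 100 * (available - outage_hours) / available


# January has 31 days, so 744 hours before maintenance is removed.
print(scheduled_hours(2023, 1))

# Assumption: four 2-hour maintenance windows fall in the month (8 hours total),
# and the provider had 3 hours of unplanned outage.
print(scheduled_hours(2023, 1, maintenance_hours=8))
print(round(availability_percent(2023, 1, outage_hours=3, maintenance_hours=8), 3))
```

Note that removing maintenance hours from the denominator makes the reported percentage slightly less forgiving of unplanned outages, which is one reason to agree on the calculation method up front.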
Reporting intervals
The last part of defining a metric is how often it will be reported. This establishes how often the service provider will tell you how well their service performed. For most services, monthly intervals are used. However, for mission-critical or high-volume transactional services, this may be weekly, daily, or in some cases hourly.
Typical Data Tracked in an SLA
You would want to include the following in your SLA:
Service Delivery Metrics – these are the metrics that determine the effectiveness of the vendor or internal counterparty in delivering the services they agreed to:
- Availability – the amount of time that the consumed service is made available to you.
- Performance – describes the performance expectations of the service being provided. This metric is particularly important for high-volume transactions.
- Transaction volume – description of the expected volume required to be fulfilled by the service provider
- Security – metrics that describe the level of security maintained by the service provider
Maintenance and Service Requests – the level of service that is provided to you for enhancements and break-fix repairs:
- Work order turnaround – the expected turnaround for work orders, usually expressed in terms of the complexity of the work order
- Mean time to repair – for break-fix items, how quickly the provider is expected to resolve the issue
- Project estimates – how long the provider will take to respond to requests for project estimates
- Priority definitions – how the provider defines its priority levels, to help you determine alignment with your internally maintained priority definitions (for both incidents and service requests)
Cloud computing presents a new operating framework for IT managers to work with. The underlying challenge is that not all of the delivery mechanisms will be visible to the organization. Consequently, service expectations need to be clearly defined before entering into any contract or agreement to consume those services.


