As an information technology services provider, I am occasionally asked by customers (and potential customers) about response time or about uptime guarantees. In part, they have been conditioned to ask these questions by sales and marketing; in part, they’re simply trying to get some sort of hard number they can use to compare different services or to give them a comfort level with a vendor.
I’m going to let you in on a little secret: Guaranteed response times and uptime agreements are generally useless and often misleading. No guarantee is worth anything without some form of compensation if the guarantee is not met. Companies that do offer money-back guarantees have either set the level of their guarantee so low that they can’t help but meet it, or they’re charging you extra to cover the times that they don’t. Don’t tell anyone I told you that.
Let me first address the myths and reality of “guaranteed uptime.” Ask ten small business owners how long their computers can be down (which I have done) and nine will say, “They can’t” (which I have heard). What they’re really saying, but trying hard not to say, is, “I don’t know.” The truth is that very few small businesses can measure their tolerance for downtime.
Some can. A printing company that guarantees a four-hour turnaround, for instance, can’t afford for its primary systems to be down for more than a couple of hours, if that. Without that sort of uptime, it can’t meet its service level agreements with its own customers. Most companies don’t have this requirement. That’s not to say that other companies can be down forever, but determining an absolute uptime goal becomes much more difficult.
There are two factors to consider. First is the productivity cost of downtime. This increases linearly from zero as downtime increases. The longer your systems are down, the less productive your employees can be. Some companies’ employees can do nothing without their computers; others hardly rely on them at all. A brokerage firm, for instance, loses a lot more money per minute of downtime than a light manufacturing company. Almost everyone else falls somewhere between the two.
The second factor is the infrastructure cost of eliminating downtime. By “infrastructure” I mean the cost of systems, backups, failover systems, management, monitoring, and all the things that go into mitigating downtime risk. (Notice I didn’t say, “eliminating downtime risk.” If anyone offers 100% uptime, read the fine print.) Infrastructure cost is very low if you’re willing to put up with a lot of downtime. As you approach 100% uptime, the cost to maintain that uptime increases exponentially.
The goal is to find that “sweet spot” where the two lines cross. This is where the infrastructure cost is justified by the productivity cost savings. As you can see, this is becoming less about “uptime” as a number of hours and more about achieving maximum reliability within certain cost boundaries.
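To make that crossing point concrete, here is a minimal sketch in Python. It takes the two cost curves as described above: a productivity cost that rises linearly from zero with hours of downtime, and an infrastructure cost that is low when you tolerate a lot of downtime and climbs steeply as tolerated downtime approaches zero. The function names, the exponential-decay shape of the infrastructure curve, and every dollar figure are assumptions made purely for illustration; real numbers would have to come from your own business.

```python
import math


def productivity_cost(hours_down, cost_per_hour):
    """Lost productivity: grows linearly from zero with downtime."""
    return cost_per_hour * hours_down


def infrastructure_cost(hours_down, max_cost, decay):
    """Assumed shape: cost is highest at zero tolerated downtime
    and falls off exponentially as more downtime is tolerated."""
    return max_cost * math.exp(-decay * hours_down)


def sweet_spot(cost_per_hour, max_cost, decay, upper=8760.0, tol=1e-6):
    """Find the downtime (in hours) where the two curves cross, by bisection.

    `upper` is the search limit (8760 hours is one year).
    """
    def gap(hours):
        return (productivity_cost(hours, cost_per_hour)
                - infrastructure_cost(hours, max_cost, decay))

    lo, hi = 0.0, upper
    if gap(lo) >= 0 or gap(hi) < 0:
        raise ValueError("curves do not cross inside the search range")
    # gap() only increases with downtime, so there is exactly one crossing.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    # Hypothetical figures: $500 lost per hour of downtime, $100,000 per
    # year to approach zero downtime, and an arbitrary decay rate.
    hours = sweet_spot(cost_per_hour=500.0, max_cost=100_000.0, decay=0.05)
    print(f"Curves cross at about {hours:.1f} hours of downtime per year")
```

Changing the assumed figures moves the crossing point, which is the real lesson: a brokerage firm with a very high cost per hour lands much closer to zero downtime than a light manufacturer does.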
Reliability is an Investment
How is maximum reliability achieved? To illustrate, I’ll use a simple analogy: an automobile. Maximum reliability with the family car means it is drivable as often as possible. If it’s on the side of the road with a flat tire, or unable to start, or in the shop, it’s not drivable. Maintaining a reliable car starts with choosing the right vehicle. (A 1985 Yugo is not the place to start.) If you begin with something reliable, it’s going to cost a lot less to keep it up. Next, you have regular monitoring and maintenance: keep gas in the tank, change the oil, check the tire pressure and wear, don’t just cover over the “Check Engine” light with a piece of duct tape, and so on.
Notice that nowhere in that list did I mention a really fast mechanic. Of course you need a competent mechanic to fix the problems that will arise, but, assuming he knows what he’s doing, it doesn’t matter if he can do it an hour faster than the next guy. The way to ensure your car is drivable as much as possible is to keep it out of the shop in the first place.
It’s the same with a computer network. You don’t achieve uptime with response time. You achieve uptime by putting in place the infrastructure necessary to avoid problems in the first place. Even if your support company were next door and could start working on every problem within minutes, that response time would be useless if you continued to have major problems every day.
You can avoid most problems with monitoring, maintenance, and management of your systems. Obviously, not every problem is avoidable, and you still need a competent support team, but if you choose the right company, problems should be few and far between.