Friday, February 04, 2005

The Holy Grail of Five Nines Reliability

Five nines reliability. It's the standard often quoted for traditional telephone service, a goal for new competitors, and a bragging point for equipment vendors. But what does it really mean? Just how close to perfect is five nines anyway?

The spoken term five nines refers to the number 99.999%. Count the nines. This number is generally referred to as a reliability figure, but what people call reliability is more correctly called "availability." What really matters is not just how often a piece of equipment has a software crash or a power supply that bursts into flames. It's how much of the time you actually get to use it. In other words, how much of the time is this particular device available? Availability accounts for how often it breaks and how fast it gets put back into service. You also have to count the time it is out of service for routine maintenance.

Here's an example. Say your softswitch has a software glitch that only shows up under obscure combinations of events. The glitch occurs once every six months. Each time, the software crashes and automatically reboots, which takes a minute. Does this switch have five nines uptime? Yes. That's two minutes of downtime a year, well under the roughly five minutes that five nines allows. Now if a power supply smokes once a year and it takes 20 minutes to fix it, that's not good enough for five nines, even though the power supply failure occurs less frequently.
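The arithmetic behind that example can be sketched in a few lines of Python. This is only an illustration: the function name `availability` is made up for this sketch, and the 365.25-day year is an assumption chosen because it matches the downtime figures quoted in this article.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # assumed average year, leap days included

def availability(outages_per_year, minutes_per_outage):
    """Fraction of the year the device is usable, given its outage pattern."""
    downtime = outages_per_year * minutes_per_outage
    return 1 - downtime / MINUTES_PER_YEAR

FIVE_NINES = 0.99999

# Software glitch: two crashes a year, one minute to reboot each time.
glitch = availability(outages_per_year=2, minutes_per_outage=1)

# Power supply: one failure a year, twenty minutes to repair.
power_supply = availability(outages_per_year=1, minutes_per_outage=20)

print(f"glitch:       {glitch:.6%}  five nines: {glitch >= FIVE_NINES}")
print(f"power supply: {power_supply:.6%}  five nines: {power_supply >= FIVE_NINES}")
```

The glitch clears the five nines bar and the power supply does not, because what counts is total minutes lost per year, not the number of failures.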

Here are some handy numbers to give you perspective on the whole nines issue:

Five nines or 99.999% availability means 5 minutes, 15 seconds or less of downtime in a year.

Or, if you are really ambitious, shoot for six nines or 99.9999% availability, which allows 32 seconds or less downtime per year.

Otherwise, four nines or 99.99% availability allows 52 minutes, 36 seconds downtime per year.

Three nines or 99.9% availability allows 8 hours, 46 minutes downtime per year.

Two nines or 99% availability allows 3 days, 15 hours and 40 minutes downtime per year.

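One nine, read literally as 9% availability, allows over 332 days of downtime per year. That's right. You're only up and running about a month out of the year on average. Good grief. (The more common convention counts one nine as 90%, which still allows about 36 and a half days of downtime per year.)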

Zero nines is totally, absolutely useless. It's 100% downtime per year. Perhaps you can get a little something for it from the recycler.
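If you want to check the serious entries in that list yourself, here is a minimal Python sketch. The function names are hypothetical, and it assumes an average year of 365.25 days, which is what the figures above appear to be based on. It starts at two nines and works up to six.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # assumed average year

def allowed_downtime_seconds(nines):
    """Yearly downtime budget for an availability written with `nines` nines."""
    unavailability = 10 ** -nines  # 5 nines -> 0.00001
    return SECONDS_PER_YEAR * unavailability

def describe(seconds):
    """Format seconds as days, hours, minutes, seconds, rounded to the second."""
    total = round(seconds)
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{days}d {hours}h {minutes}m {secs}s"

for n in range(2, 7):
    print(f"{n} nines: {describe(allowed_downtime_seconds(n))}")
```

The results agree with the numbers above to within a second or so of rounding. Each extra nine cuts the downtime budget by a factor of ten.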

How do you get more nines? Buy the best equipment that's also the easiest to repair. Then add redundancy. Highly reliable systems often include multiple power supplies and processors, battery backup, diesel or natural gas generators for power outages that outlast the batteries, multiple diverse communication lines, and spares of whatever else is likely to fail. Buggy software that crashes all the time is going to hurt your reliability. If it goes down a lot and takes a long time to get back online, your availability will suffer too.
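To see why redundancy buys nines, here is a rough Python sketch. It leans on a big assumption: that the redundant units fail independently of each other. Real equipment only approximates that, since a shared power feed or a common software bug can take out both units at once. The function name and the three-nines figure are made up for illustration.

```python
def redundant_availability(unit_availability, units):
    """Availability of `units` identical parts when any single one can carry the load.

    Assumes failures are independent, which real systems only approximate.
    """
    return 1 - (1 - unit_availability) ** units

single = 0.999                                  # hypothetical three-nines power supply
pair = redundant_availability(single, units=2)  # two supplies, either one is enough
print(f"one supply:   {single:.6%}")
print(f"two supplies: {pair:.6%}")
```

Under that assumption, two three-nines supplies in parallel come out at roughly six nines, because both have to be down at the same moment for the system to fail. Treat that as a best case rather than a promise.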

One thing to be aware of is that the five nines criterion tends to apply to whatever the person quoting it says it applies to. PBX systems that meet five nines availability may only do so for the core system and might not include individual line cards, and certainly not the phones themselves and their wiring. If it's REALLY important that you minimize downtime, you need to consider EVERYTHING that can fail and make sure it is backed up and/or very easy to fix.
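A quick Python sketch shows how much the rest of the chain can matter. When a call needs every piece in the path to be working, and if we assume the pieces fail independently, the availabilities multiply. The component list and every number in it are invented for illustration, not taken from any real PBX.

```python
from math import prod

def chain_availability(component_availabilities):
    """Availability of a service that needs every listed component working.

    Assumes independent failures; the numbers below are made up for illustration.
    """
    return prod(component_availabilities)

# Hypothetical path from the PBX core to a desk phone.
phone_path = {
    "PBX core": 0.99999,
    "line card": 0.9999,
    "building wiring": 0.9999,
    "desk phone": 0.999,
}
end_to_end = chain_availability(phone_path.values())
print(f"end to end: {end_to_end:.4%}")
```

With these made-up figures, the five nines core ends up inside a path that doesn't quite reach three nines. The chain can never be better than its weakest link.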

Let T1 Rex help you find the best prices on bandwidth with the service level agreements you require.
