What is high availability?
High availability refers to a system or component that is operational without interruption for long periods of time.
High availability is measured as a percentage, with a 100% available system indicating a service that experiences zero downtime. This would be a system that never fails, which is rare among complex systems. Most services fall somewhere between 99% and 100% uptime. Most cloud vendors offer some type of Service Level Agreement (SLA) around availability. Amazon, Google, and Microsoft set many of their cloud SLAs at 99.9%, which the industry generally recognizes as very reliable uptime. A step above, 99.99%, or “four nines,” is considered excellent uptime.
But even four nines of uptime still allows roughly 52 minutes of downtime per year. Consider how many people rely on web tools to run their lives and businesses. A lot can go wrong in 52 minutes.
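The 52-minute figure comes from simple arithmetic: the unavailable fraction (100% minus the availability target) multiplied by the minutes in a year. Here is a minimal sketch of that calculation in Python, assuming a 365-day year (leap years ignored) and a yearly measurement window. Real SLAs are often measured monthly and define "downtime" in their own terms. The function name `downtime_minutes_per_year` is hypothetical, chosen for illustration.

```python
# Assumption: a 365-day year (leap years ignored), measured over a whole year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the maximum yearly downtime (minutes) allowed by an availability target.

    Hypothetical helper for illustration; availability is given as a percentage.
    """
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    # Fraction of the year the service is allowed to be unavailable.
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Print the downtime budget for common "nines" targets.
    for target in (99.0, 99.9, 99.99, 99.999, 100.0):
        minutes = downtime_minutes_per_year(target)
        print(f"{target}% -> {minutes:.1f} min/year ({minutes / 60:.2f} h)")
```

Under these assumptions, four nines works out to about 52.6 minutes per year, while 99.9% allows roughly ten times as much (about 525.6 minutes, or nearly 9 hours). Each additional nine cuts the downtime budget by a factor of ten.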
So what is it that makes four nines so hard? What are the best practices for high availability engineering? And why is 100% uptime so difficult?
Availability and downtime
As shown in the table below, the number of nines (the availability percentage) determines how much downtime a system can have.