Azure availability set

This is Part 8 of the Azure tutorial. In this article, we will understand Azure availability sets with a simple example. Before that, let's understand fault domains and update domains.

Azure Fault Domain

What does an Azure datacenter contain? In simple terms, it contains several racks of servers. Each rack, in turn, may contain several servers with its own power supply and network switch. In reality, server racks are much more complicated than this. They may be equipped with redundant power supplies and network switches. However, to keep this example simple, let's just say a rack contains 15 to 20 physical servers with its own power supply and network switch.

azure fault domains explained

So you can think of each rack of servers as a separate fault domain. For example, if the power supply or network switch fails in a given rack, only the servers in that rack fail. The rest of the server racks are isolated and unaffected. In other words, a fault domain is a group of resources that may fail at the same time due to the same root cause, because it has a single point of failure. It is very important that we understand the concept of fault domains because, if we deploy our Azure resources, such as virtual machines, in two or more fault domains, they remain available should a failure occur in one of the fault domains.
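The idea of a rack as a fault domain can be sketched in a few lines of Python. This is only a toy model: the `Rack` class, the rack names, and the server names are made up for illustration and do not correspond to any Azure API.

```python
from dataclasses import dataclass, field


@dataclass
class Rack:
    """One rack, treated as one fault domain (shared power supply and switch)."""
    name: str
    servers: list = field(default_factory=list)


def servers_down(racks, failed_rack):
    """Servers lost when the named rack's power supply or switch fails."""
    return [s for r in racks if r.name == failed_rack for s in r.servers]


def servers_up(racks, failed_rack):
    """Servers in all other racks, which are isolated from the failure."""
    return [s for r in racks if r.name != failed_rack for s in r.servers]


# Hypothetical datacenter with two racks of two servers each.
racks = [
    Rack("rack-1", ["srv-1a", "srv-1b"]),
    Rack("rack-2", ["srv-2a", "srv-2b"]),
]

print(servers_down(racks, "rack-1"))  # ['srv-1a', 'srv-1b']
print(servers_up(racks, "rack-1"))    # ['srv-2a', 'srv-2b']
```

Only the servers that share the failed rack go down, which is exactly why spreading VMs over more than one fault domain keeps some of them running.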

Azure Update Domain

The server hardware and supporting infrastructure in a datacenter is divided into multiple fault domains and update domains. An update domain is a group of resources that can be updated and, if required, rebooted at the same time. From time to time, patches and software updates need to be applied. Some updates require servers to be rebooted. Only one update domain is rebooted at a time. A rebooted update domain is then given 30 minutes to recover before maintenance is initiated on a different update domain. This greatly reduces downtime. So, if you want your Azure resources, such as virtual machines, to be available even during the update process, deploy them across multiple update domains.
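To make the one-at-a-time rollout concrete, here is a rough sketch of a maintenance timeline. The 30-minute recovery window comes from the text above; the function name and the `reboot_minutes` value are assumptions chosen purely for illustration, since the actual time Azure spends on one update domain varies.

```python
RECOVERY_MINUTES = 30  # recovery window described in the text above


def maintenance_schedule(update_domain_count, reboot_minutes):
    """Return (update_domain, start_minute) pairs for a sequential rollout.

    reboot_minutes is a hypothetical time to patch and reboot one update domain.
    The next domain starts only after the previous one has finished and has
    had its recovery window.
    """
    schedule = []
    start = 0
    for ud in range(update_domain_count):
        schedule.append((ud, start))
        start += reboot_minutes + RECOVERY_MINUTES
    return schedule


for ud, start in maintenance_schedule(3, reboot_minutes=15):
    print(f"update domain {ud}: maintenance starts at t+{start} min")
# update domain 0: maintenance starts at t+0 min
# update domain 1: maintenance starts at t+45 min
# update domain 2: maintenance starts at t+90 min
```

At any point in this timeline at most one update domain is under maintenance, so VMs in the other update domains keep serving requests.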

Azure Availability Set

azure availability set explained

An Availability Set is a logical grouping for isolating virtual machine resources from each other. Azure makes sure that the VMs we place in an Availability Set run across multiple physical servers, compute racks, storage units, and network switches. If a hardware or software failure happens, only a subset of our VMs is impacted and our overall solution stays operational. Availability Sets are essential for building reliable cloud solutions.

Let's say we have a simple two-tier web application. On one virtual machine we have a web server, and on another virtual machine we have our database server. Now, to be able to handle more requests, we have 2 web servers and 2 database servers. In the real world, web applications with a lot of demand, for example Google.com, Gmail.com, and Amazon.com, may have many web servers and database servers. However, to keep our example simple, let's just stick to two web servers and two database servers. The load balancer distributes the incoming traffic between the two web servers.

web servers with load balancer

What may happen if availability sets are not used

Well, all 4 VMs (i.e., the two web servers and two database servers) may end up in the same fault domain or update domain. As a result, if there is a software failure or a hardware failure, such as a power supply or network switch failure, all your web and database servers go down. The end result: your web application is no longer available. If it's an e-commerce application like amazon.com, for example, just imagine the loss to the business for every second the system is down.

azure fault domain example

Use availability sets for high availability

Since we have two tiers - a web tier and a database tier, we create 2 availability sets - one for the web tier and the other for database tier.

azure availability set example
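The difference between the two layouts can be checked with a small simulation. This is a sketch under simplifying assumptions: the VM names and fault domain numbers are hypothetical, the application counts as available when at least one web server and one database server are running, and the same fault domain number in both tiers is pessimistically treated as the same rack.

```python
def app_available(placement, failed_fd):
    """placement maps VM name -> (tier, fault_domain).

    The two-tier app is up if at least one 'web' VM and at least one 'db' VM
    sit outside the failed fault domain.
    """
    tiers_up = {tier for tier, fd in placement.values() if fd != failed_fd}
    return {"web", "db"} <= tiers_up


# Without availability sets: all four VMs may land in the same fault domain.
no_sets = {
    "web1": ("web", 0), "web2": ("web", 0),
    "db1": ("db", 0), "db2": ("db", 0),
}

# With one availability set per tier: each tier is spread over two fault domains.
with_sets = {
    "web1": ("web", 0), "web2": ("web", 1),
    "db1": ("db", 0), "db2": ("db", 1),
}

print(app_available(no_sets, failed_fd=0))                    # False
print([app_available(with_sets, fd) for fd in (0, 1)])        # [True, True]
```

With the tiers spread out, no single fault domain failure takes down both web servers or both database servers.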

In Azure, when creating an availability set, we specify the following

create availability set

Name

The name of the availability set. It's a common convention to use the prefix "avail" for availability sets.

Region

Azure region where we want the resources to be deployed

Fault domains

The number of fault domains you want in the availability set. For example, if you set the fault domains to 3 and you create 3 virtual machines, each VM is placed in a separate fault domain. If there is a fault, such as a power failure, only one of the server racks is affected. This means only one of your VMs is down, while the other 2 VMs in the other 2 fault domains are still available. This in turn means your workload, in this case your web application, is still available to end users.

What happens if we create a fourth VM with 3 fault domains? Well, it will be placed in one of the 3 fault domains. This means one of the 3 fault domains will have 2 VMs and the other 2 will have 1 each.

azure fault domain availability set
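The placement described above can be approximated with a simple round-robin rule. Note that this is an assumption made for illustration: Azure decides the actual placement, and the function and VM names below are hypothetical.

```python
from collections import Counter


def assign_fault_domains(vm_names, fault_domain_count):
    """Assign VMs to fault domains in round-robin order (illustrative assumption)."""
    return {name: i % fault_domain_count for i, name in enumerate(vm_names)}


placement = assign_fault_domains(["vm1", "vm2", "vm3", "vm4"], fault_domain_count=3)
print(placement)
# {'vm1': 0, 'vm2': 1, 'vm3': 2, 'vm4': 0}

# Number of VMs per fault domain, largest first.
print(sorted(Counter(placement.values()).values(), reverse=True))
# [2, 1, 1]
```

The fourth VM wraps around to a fault domain that already holds a VM, giving the 2, 1, 1 split.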

Update Domains

The number of update domains you want. Let's say you have 3 VMs deployed across 3 update domains. If an update is installed and a restart is required, only one update domain is restarted at any given time. This means the other 2 VMs in the remaining 2 update domains stay available.
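The same 3 VMs across 3 update domains scenario can be walked through in code. The VM names and the mapping of VMs to update domains are hypothetical; the sketch only shows which VMs keep running while each update domain is restarted in turn.

```python
def available_during_update(vm_update_domains, rebooting_ud):
    """Return the VMs that are NOT in the update domain currently rebooting."""
    return sorted(vm for vm, ud in vm_update_domains.items() if ud != rebooting_ud)


# Hypothetical layout: one VM in each of three update domains.
vms = {"vm1": 0, "vm2": 1, "vm3": 2}

for ud in range(3):
    print(f"rebooting update domain {ud}: available {available_during_update(vms, ud)}")
# rebooting update domain 0: available ['vm2', 'vm3']
# rebooting update domain 1: available ['vm1', 'vm3']
# rebooting update domain 2: available ['vm1', 'vm2']
```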

Availability Sets and Virtual Machine SLA

For all virtual machines that have two or more instances deployed in the same availability set, Microsoft guarantees that you will have virtual machine connectivity to at least one instance at least 99.95% of the time.
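To get a feel for what 99.95% means in practice, the percentage can be converted into a downtime budget. This is plain arithmetic, not an Azure API; the 30-day month and 365-day year are assumptions chosen for the calculation.

```python
def allowed_downtime_minutes(sla_percent, period_days):
    """Maximum downtime, in minutes, that still meets the SLA over the period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - sla_percent) / 100


print(round(allowed_downtime_minutes(99.95, 30), 1))   # 21.6  (minutes per 30-day month)
print(round(allowed_downtime_minutes(99.95, 365), 1))  # 262.8 (minutes per year, about 4.4 hours)
```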

You can't add an existing virtual machine to an availability set after the VM is created. So, if you want a virtual machine in an availability set, you have to make that decision at the time of creation, not afterwards.