Design VM deployments by leveraging availability sets, fault domains, and update domains in Azure; use web app for containers; design VM Scale Sets; design for compute-intensive tasks using Azure Batch; define a migration strategy from cloud services; recommend use of Azure Backup and Azure Site Recovery
Design VM deployments by leveraging Availability Sets, Fault Domains, Update Domains, and Availability Zones
This part of the exam objective aims to measure our understanding of and capability in designing around Azure VM, the core Azure IaaS offering, even though the service itself is not explicitly mentioned in this section. So before jumping right into the other topics, let’s review this Azure service.
Azure VM is Microsoft’s IaaS service offering, where customers can deploy and fully manage virtual machines as if they were running them on-premises, but without needing to put together the necessary infrastructure (i.e., hardware, hypervisor, etc.) to host them. Note, however, that an Azure VM also depends on additional Azure resources, such as a virtual network, a network interface, and disks.
In contrast to creating VMs on-premises, where we manually set each VM’s resources such as CPU, RAM, and storage, in Azure we have to choose a VM Size (see the VM types for Windows VMs and Linux VMs) whenever we create a virtual machine, which then determines the VM’s resources.
There’s a limit to the number of VMs per region, which can be raised by filing a support ticket requesting an increase. On another note, the sections below cover some Azure VM concepts that exam takers should be familiar with, along with some best practices concerning certain Azure resources.
Azure VMs are priced per hour of usage, with price differences depending on whether the VM is reserved for three years, reserved for one year, or run on the pay-as-you-go model. The disk storage pricing model applies to the disks – both Unmanaged Disks and Managed Disks – used by the VM and is separate from the VM fee itself.
To ensure availability of the VMs, we need to leverage Availability Sets, Fault Domains, Update Domains, and Availability Zones. An Availability Set has two constructs under it: Fault Domains and Update Domains. We can consider Fault Domains as logical groups of VMs that could be affected together if a power source, network switch, or rack in a data center goes down, while Update Domains are logical groups of VMs that could be patched and rebooted together when Microsoft conducts its maintenance activities. Refer to the table below to further clarify the concept:
Considering the chart above, if the rack that houses Fault Domain 0 loses power, HR Web 01 and HR DB 01 would become unavailable. However, this issue would be transparent to the HR system users since HR Web 02, HR App 01, and HR DB 02 are still up to host our theoretical three-tiered HR system.
Unfortunately, if faulty network equipment causes Fault Domain 2 to lose its entire network connection, HR App 01 will be unavailable, which will result in an HR system outage. In the same vein, an outage could be expected if Microsoft patches and reboots the machines inside Update Domain 2, again because our lone HR App server will be bounced.
Availability Zones, a relatively new feature, expand the level of control we have in maintaining the availability of our applications and data by providing data center-level failure protection within a supported region.
- We can assign a VM to an Availability Set during VM creation only.
- VMs can belong to only one Availability Set.
- It’s recommended to separate workloads into their own Availability Sets, and have at least two VMs of the same workload per Availability Set.
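The placement behavior above can be sketched in Python. This is an illustrative model, not an Azure API: the `assign_domains` helper and the 3-fault-domain/5-update-domain defaults are assumptions that mirror the classic availability set defaults.

```python
# Illustrative sketch (not an Azure API): VMs in an availability set are
# spread round-robin across fault domains (FDs) and update domains (UDs).
# Assumed defaults here: 3 FDs and 5 UDs.

def assign_domains(vm_names, fault_domains=3, update_domains=5):
    """Assign each VM a (fault_domain, update_domain) pair round-robin."""
    return {name: (i % fault_domains, i % update_domains)
            for i, name in enumerate(vm_names)}

placement = assign_domains(["HRWeb01", "HRWeb02", "HRApp01", "HRDB01", "HRDB02"])

def survivors(placement, failed_fd):
    """VMs still running if the given fault domain loses power."""
    return [vm for vm, (fd, ud) in placement.items() if fd != failed_fd]

# If FD 0 loses power, the VMs landed on FDs 1 and 2 keep serving traffic,
# matching the HR system walkthrough above.
print(survivors(placement, failed_fd=0))  # → ['HRWeb02', 'HRApp01', 'HRDB02']
```

Note how the lone HR App server ends up alone on Fault Domain 2 — losing that domain takes the whole app tier down, which is why at least two VMs per workload are recommended.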
Use Web App for Containers
Before we go into Web App for Containers, let’s first understand what Azure Web App Service is, for the benefit of those who are not much into Dev/Ops conversations, like myself.
Azure Web App Service is a PaaS solution from Microsoft that enables developers to build web and mobile applications. It supports several programming languages such as .NET, Node.js, Java, Ruby, Python, and PHP, with support for continuous deployment via Visual Studio Team Services (VSTS), GitHub, Docker Hub, and Azure Container Registry. Here’s a sample scenario:
- A team can code and develop a web or a mobile app using their preferred coding language such as .NET, Node.js, Java, Ruby, Python, and PHP.
- They can use code-collaboration platforms such as VSTS, GitHub, Docker Hub, and Azure Container Registry, where they can upload their app code and continuously modify/update it as necessary.
- Once the app is completed, the team can upload the code to Azure Web App Service to host, run, and make the app available to customers.
- Most apps get updated or corrected for bugs regularly. The team can go back to the code-collaboration platform, update the app, and commit the changes. The team can have Azure Web App Service pull the changes, and either park them under a staging instance (recommended) or immediately update the app.
When provisioning an Azure Web App Service, the team simply needs to select a few options, such as the App Service Plan and the image (.NET, Node.js, PHP, Ruby, etc.) that best suit their app, and Azure will automatically take care of the rest, like putting the infrastructure resources and components together. The team can also choose to enable auto scale-up/down or scale-out under certain conditions. Here’s a diagram that could help further our understanding of this service:
- Leveraging Microsoft Azure starts with a Subscription and a Resource Group.
- We can then choose the appropriate Web App Service Plan, which is made up of infrastructure resources, a virtual machine, and other components packaged in an image.
- Our Web App code is deployed within the VM. Microsoft provides some tools to tweak app settings and run background jobs.
- Routing traffic will be automatically handled by a load balancer (using a routing mechanism of our choice) upon scaling the application to multiple instances.
As we can see from the previous diagram, an App in Azure App Service runs in an App Service Plan, which we could consider as a pool of compute resources that are generally charged per hour of use (except for Azure Functions). The pricing tier of the App Service Plan determines the available features that we can use. If we need more features such as custom domain, hybrid connectivity, and network isolation, we can change the tier of the App Service Plan.
Some organizations have already started developing their applications on-premises and have even adopted container technologies to host them, also referred to as “containerized applications.” Now, instead of bringing our app code to Azure Web App Services, we can lift and shift the whole container to Azure Web App for Containers.
Design VM Scale Sets
This Azure service allows us to easily and automatically deploy, scale, and manage a set of identical virtual machines (both Windows and Linux are supported). This set may automatically or manually scale depending on resource (CPU, RAM, network) usage and the level of autonomy that we set. An Azure load balancer, which is automatically created with the Scale Set, distributes traffic to the VMs in the set using round-robin (RR) distribution or user-defined NAT rules. One important thing to keep in mind when using Scale Sets is that the VMs in a set are all identical and intended to perform the same function.
Taken from the official Azure VM Scale Sets planning and design guide, we may choose to use this service for these Scale Sets-specific features:
- Once we specify the scale set configuration, we can update the “capacity” property to deploy more VMs in parallel. This method is much simpler than writing a script to orchestrate deploying many individual VMs in parallel.
- We can use Azure Autoscale to scale a Scale Set automatically but not individual VMs.
- We can reimage scale set VMs but not individual VMs.
- We can specify an upgrade policy to make it easy to roll out upgrades across the VMs in our scale set. With individual VMs, we must orchestrate updates ourselves or do it using tooling such as SCOM.
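The autoscale behavior mentioned above can be sketched as a simple decision rule. The `autoscale` function and its thresholds below are illustrative assumptions for the sake of the sketch, not the actual Azure Autoscale API.

```python
# Illustrative sketch of an autoscale rule like those Azure Autoscale can
# evaluate for a scale set: scale out when average CPU is high, scale in
# when it is low, clamped to a min/max instance count. All thresholds and
# names here are assumptions, not Azure's API.

def autoscale(current_capacity, avg_cpu_percent,
              scale_out_at=75, scale_in_at=25,
              min_capacity=2, max_capacity=10):
    """Return the new 'capacity' value for the scale set."""
    if avg_cpu_percent >= scale_out_at:
        return min(current_capacity + 1, max_capacity)
    if avg_cpu_percent <= scale_in_at:
        return max(current_capacity - 1, min_capacity)
    return current_capacity

print(autoscale(3, 90))  # → 4 (busy: scale out)
print(autoscale(3, 10))  # → 2 (idle: scale in)
```

In the real service, updating the scale set’s “capacity” property is what adds or removes VM instances; the rule above just illustrates how a metric-driven policy decides the target capacity.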
Like several resources under the Azure Compute category, Scale Sets can use either Managed Disks or Unmanaged Disks. Here’s a brief comparison between the two:
- Azure Storage Accounts don’t need to be pre-created when using Managed Disks.
- We can attach data disks for the VM in the Scale Set using Managed Disks.
- Scale Sets on Managed Disks can scale up to 1,000 VMs, as opposed to 20 VMs (recommended to be spread across 5 Storage Accounts) for Unmanaged Disks.
The Azure VM Scale Sets feature itself is free; we are charged only for the IaaS components that we use (see Azure VM Pricing).
Design for compute-intensive tasks using Azure Batch
In brief, Azure Batch is a good solution for pre-defined tasks that require high-compute processing. For example, if we need to extract every frame from numerous video files, a decently specced machine would still likely take a long time to complete this task. However, a pool of servers, say a hundred of them, concurrently processing those videos would give significantly faster results. Instead of worrying about the massive CAPEX and OPEX of owning servers at such a scale, we can leverage Azure Batch to provision this pool of servers, scale them as we see fit, run the tasks at hand, and then automatically shut the servers down.
Azure Batch is also a good fit for other workloads such as Message Passing Interface (MPI) applications, container workloads at scale, rendering multimedia files, extracting data via OCR, and data processing with Batch and Data Factory.
There are two concepts to keep in mind with Azure Batch. The first is Pools, which we already touched on previously; the second is Jobs, which comprise one or more tasks. In designing and provisioning Azure Batch, it’s safe for engineers to assume that one node handles one task at a time. So if we know that one video rendering task, for example, is going to take 3 minutes on an N-series Azure VM, and 3 minutes is also our processing time limit, then our pool needs 100 N-series Azure VMs to render 100 video files.
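The pool-sizing arithmetic in the paragraph above can be written out as a short sketch. The `pool_size` helper and all the numbers are illustrative assumptions built on the one-task-per-node simplification the text suggests.

```python
import math

# Back-of-the-envelope Batch pool sizing, assuming one task per node at a
# time. All numbers below are illustrative, not Azure limits or prices.

def pool_size(task_count, minutes_per_task, deadline_minutes):
    """Nodes needed so all tasks finish within the deadline."""
    tasks_per_node = max(deadline_minutes // minutes_per_task, 1)
    return math.ceil(task_count / tasks_per_node)

# 100 videos at 3 minutes each with a 3-minute deadline: every task needs
# its own node, so the pool needs 100 VMs, as in the example above.
print(pool_size(100, 3, 3))   # → 100
# Relax the deadline to 30 minutes and each node can run 10 tasks in series.
print(pool_size(100, 3, 30))  # → 10
```

The trade-off this exposes is the usual one: a bigger pool finishes sooner but costs more per hour, while a smaller pool runs longer on fewer VMs.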
The Azure Batch service is free, and customers only need to pay for the Azure VMs that make up the pool. Related to this topic, customers can upload and use their own VM image, or use Azure images that come with the OS and applications (Maya, V-Ray, 3ds Max, Arnold, or FFmpeg) on a pay-per-use model.
Take note that all of the servers in the same pool have identical characteristics; create another pool if different compute characteristics are required. We can have massive pools of servers that process an enormous number of tasks, but we have to keep our quotas and limits in mind.
Define a migration strategy from cloud services
Let’s start with the cloud migration overview from Microsoft. From experience, and from hearing what others have gone through, here are some items to consider and a few questions to address, generally speaking, when deciding to migrate to and use cloud services:
- Discover and meticulously catalog your IT resources and workloads. This step is critical whether we intend to migrate part of the DC or the whole of it to Azure, to avoid missing anything, which could be a royal pain once migration is underway. Consider the compute, network, applications, and operations of our infrastructure, even if we intend to migrate just the storage part, for example, to make sure dependencies are not overlooked.
- Thoroughly assess your existing infrastructure with helpful tools. Making a comprehensive catalog of our current IT resources should help us decide which of them are best moved to the cloud. However, just because it seemingly makes sense to move them to the cloud doesn’t necessarily mean we can readily do so. Some components could heavily sway the course of this initiative, like performance requirements, custom operations and functions, network architecture, etc. With the help of tools like those listed below, we can further evaluate our existing infrastructure for readiness in our organization’s initiative to leverage cloud services:
○ Azure Virtual Machine Readiness Assessment
○ Microsoft Assessment and Planning Toolkit
○ Azure Migrate
- Understand the different cloud services. After auditing and assessing our existing infrastructure, we can move on to finding out which cloud services we should take on. At this stage, it’s best to work with specialists from Azure or the cloud provider’s team (from sales, licensing, and technical) to understand all the bits and pieces of the service(s) we would subscribe to, like the required components and licenses, performance, ease of management and configuration, cost, etc.
- Last, but definitely not the end of the line, is to migrate. Do we take baby steps, or do we lift and shift? Do we have the skill sets to do it on our own, or do we need to engage a partner? Do we need some cloud-operations training?
Recommend use of Azure Backup
Azure Backup is an Azure-based backup and recovery service that can protect our files, folders, computers, servers, and applications by backing them up and recovering them on-premises or in the cloud.
Here are some differentiators of Azure Backup compared to traditional on-premises backup and recovery solutions/systems (BRS):
- Auto-management of unlimited storage – easily scale up / out storage in a pay-as-you-use OPEX model.
- Several storage replication options – take advantage of LRS, ZRS, GRS, RA-GRS.
- Cost-effective storage for long-term data retention or archiving.
- Data encryption – not a differentiator but worth mentioning that Azure Backup supports encryption of data in-flight and at rest.
Azure Backup comes in different models to cater to varying requirements:
Whichever Azure Backup variant we choose, Azure Backup is priced similarly across this set of backup services; that is, we pay per protected instance plus the storage we use for those backup images (except for SC DPM, which requires additional licenses).
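As a rough illustration of how the storage portion of that per-instance-plus-storage bill might grow, here is a simple model that assumes an initial full copy plus daily incrementals proportional to data churn. The `backup_storage_gb` function and all inputs are hypothetical and for illustration only, not Azure pricing data.

```python
# Hypothetical back-of-the-envelope model for backup storage consumed over a
# retention window: one initial full copy, then one incremental per retained
# day sized by the daily churn rate. All figures are made-up examples.

def backup_storage_gb(protected_gb, daily_churn_pct, retention_days):
    """Estimate total backup storage: full copy + daily incrementals."""
    incrementals = protected_gb * (daily_churn_pct / 100) * retention_days
    return protected_gb + incrementals

# Example: 500 GB protected, 2% daily churn, 30-day retention.
print(backup_storage_gb(500, 2, 30))  # → 800.0
```

The point of the sketch is simply that retention length and churn rate, not just the protected data size, drive the storage component of the bill.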
Here’s a list of Linux distros endorsed by Microsoft Azure which are supported by Azure VM backup.
Azure Site Recovery
As part of business continuity and disaster recovery (BCDR), the Azure Site Recovery (ASR) service can keep business applications and workloads running during outages by ensuring that they are replicated to another site and that those replicated workloads can fail over/fail back as deemed necessary. The replication between sites can be any of the following modes:
- On-premises primary site with both physical machines and VMs (Hyper-V and VMware are supported) to Azure or to a secondary on-premises site.
- Azure VMs to another Azure region.
VMware and/or Physical Machine to Azure Replication Architecture
VMware to Secondary Site VMware Replication using Azure Site Recovery
Hyper-V to Azure Replication Architecture
Hyper-V with VMM to Azure Replication Architecture
Hyper-V to Secondary Site Hyper-V Replication using ASR
ASR with Traffic Manager (Priority routing)
Note that most of these concepts also apply to Azure-to-Azure ASR.
ASR with Traffic Manager (Weighted routing)
ASR with Traffic Manager (nested Geo and Primary routing)
- The top Traffic Manager handles the incoming traffic first and redirects it to the appropriate nested Traffic Manager. For example, incoming traffic from Germany is redirected to the nested Traffic Manager for Germany (not the one for World).
- Nested Traffic Managers will then redirect the incoming traffic to an endpoint based on Priority routing.
- Note that if we use Geo routing alone, traffic originating from Germany will not be redirected to World if the endpoint for Germany is down.
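The nested routing logic in the bullets above can be sketched as follows. The `route` function is an illustrative model, not the Traffic Manager API, and all profile and endpoint names are hypothetical.

```python
# Illustrative sketch of nested Traffic Manager routing: the top profile
# picks a child profile by geography (falling back to "World"), and the
# child picks the first healthy endpoint by priority. Names are made up.

def route(source_geo, geo_map, priority_endpoints, health):
    # Top-level Geographic routing: unmatched geos fall back to "World".
    child = geo_map.get(source_geo, geo_map["World"])
    # Nested Priority routing: lowest priority number that is healthy wins.
    for priority, endpoint in sorted(priority_endpoints[child]):
        if health[endpoint]:
            return endpoint
    return None  # no healthy endpoint in the chosen child profile

geo_map = {"Germany": "tm-germany", "World": "tm-world"}
priority_endpoints = {
    "tm-germany": [(1, "germany-primary"), (2, "germany-dr")],
    "tm-world":   [(1, "world-primary"), (2, "world-dr")],
}
health = {"germany-primary": False, "germany-dr": True,
          "world-primary": True, "world-dr": True}

# German traffic stays on the Germany profile and fails over to its DR
# endpoint; other traffic falls back to the World profile.
print(route("Germany", geo_map, priority_endpoints, health))  # → germany-dr
print(route("France", geo_map, priority_endpoints, health))   # → world-primary
```

This mirrors the caveat in the last bullet: pure Geographic routing would pin German traffic to the Germany endpoints even when both are down, whereas the nested Priority layer at least fails over within the Germany profile.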
- Creating Backup or Site Recovery service requires a new Recovery Services vault (cannot use existing Recovery Services vault)
- Cannot protect Azure VMs in the same region as the Recovery Services vault
- BitLocker-protected disks are currently not supported
Multi-tenant support for replicating VMware to Azure is through Microsoft Cloud Solutions Provider, and this service has three modes: Shared Hosting Services Provider (HSP), Dedicated Hosting Services Provider, and Managed Services Provider (MSP).
Failback can be Original Location Recovery (OLR) or Alternate Location Recovery (ALR). If we failed over a VMware virtual machine, we can fail back to the same source on-premises virtual machine if it still exists; in this scenario, only the changes are replicated back, and this is known as original location recovery. If the on-premises virtual machine no longer exists, the scenario is an alternate location recovery.
Customers initially incur charges per instance protected by ASR, for the storage consumed by replicated data, and for outbound traffic. When a failover is triggered, the replicated data is spun up as compute resources, and the usual compute charges are added to the costs mentioned above.
We can use the Azure Site Recovery Deployment Planner to profile the on-premises infrastructure and get an estimate of the resources we will need to replicate it to Azure, the compatibility of on-premises VMs, and other metrics, including estimated cost.
- Physical machines replicated to Azure can only failback as VMs.
- In case of ALR, only failback to VMFS and vSAN will work, not RDM.
- A Process Server is not needed to fail back to on-premises Hyper-V.
Direct link to summary of workloads supported by ASR.
Go back to MCSE 70-535 Unofficial Study Guide Blueprint