Design solutions using virtual machines
Design VM deployments by leveraging availability sets, fault domains, and update domains in Azure; use web app for containers; design VM Scale Sets; design for compute-intensive tasks using Azure Batch; define a migration strategy from cloud services; recommend use of Azure Backup and Azure Site Recovery
Availability Sets, Fault Domains, Update Domains, and Availability Zones
Assigning Azure VMs to an Availability Set at creation time ensures that they are deployed across multiple hardware nodes, and across multiple logical groups that are affected by maintenance activities.
We can consider Fault Domains as logical groups of VMs that could be affected if a power source, network switch, or rack in a data center goes down, while Update Domains are logical groups of VMs that could be patched and rebooted together when Microsoft conducts its maintenance activities. Refer to the table below to further clarify the concept:
Considering the table above, if the rack that houses Fault Domain 0 loses power, HR Web 01 and HR DB 01 would become unavailable. However, this issue would be transparent to the HR system users, and they could do business as usual, since HR Web 02, HR App 01, and HR DB 02 are still up to host our theoretical 3-tiered HR system.
Unfortunately, if faulty network equipment causes Fault Domain 2 to lose its entire network connection, HR App 01 will be unavailable, which will result in an HR system outage. In the same vein, an outage can be expected if Microsoft patches and reboots the machines inside Update Domain 2 – again, because our sole HR App server will be bounced.
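As a rough illustration, the fault-domain behavior described above can be sketched in a few lines of Python. The VM-to-domain layout below is hypothetical, matching our theoretical HR system only – it's not something Azure exposes directly:

```python
# Hypothetical placement of the 3-tiered HR system across fault domains,
# consistent with the scenario described above.
fault_domain = {
    "HR Web 01": 0, "HR DB 01": 0,
    "HR Web 02": 1, "HR DB 02": 1,
    "HR App 01": 2,
}

def surviving_vms(failed_domain):
    """Return the VMs still running after the given fault domain goes down."""
    return sorted(vm for vm, fd in fault_domain.items() if fd != failed_domain)

def tier_available(vms, tier):
    """A tier is available if at least one of its VMs survives."""
    return any(tier in vm for vm in vms)

# Losing Fault Domain 0 leaves all three tiers available, but losing
# Fault Domain 2 takes out the sole App server and the whole system.
```

Here `tier_available(surviving_vms(2), "App")` returns `False`, which is exactly why a second App server placed in another Fault Domain would remove this single point of failure.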
Availability zones, a relatively new feature, expand the level of control we have to maintain the availability of our applications and data by providing data center failure protection in a supported region.
As a review of some Azure concepts: an Availability Set, together with its Fault and Update Domains (and the VMs and storage assigned to it), is created in one data center in a specified region.
What’s unknown to most customers is the exact number of data centers hosting Azure in each region. Be that as it may, Azure now offers a solution that allows us to host applications in different data centers – the “zones” in the nomenclature – within a region.
• We can assign a VM to an Availability Set during VM creation only.
• VMs can belong to only one Availability Set.
• It’s recommended to separate workloads into their own Availability Sets, and then have at least two VMs of the same workload per Availability Set.
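The spreading behavior behind the bullets above can be pictured with a small sketch. The domain counts are assumptions (Azure commonly allows up to 3 fault domains and up to 20 update domains, with 5 as a typical default), and the modulo scheme is illustrative rather than Azure's exact placement algorithm:

```python
# Assumed domain counts for illustration only.
FAULT_DOMAINS = 3
UPDATE_DOMAINS = 5

def place(vm_index):
    """Round-robin a VM (by creation order) across fault and update domains."""
    return {"fault_domain": vm_index % FAULT_DOMAINS,
            "update_domain": vm_index % UPDATE_DOMAINS}
```

With this scheme, the second VM of a workload always lands in a different fault domain than the first – which is the whole point of the "at least two VMs per Availability Set" recommendation.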
Web App for Containers
Before we go into Web App for Containers, let’s first understand what Azure Web App Service is, for the benefit of those who, like myself, are not much into DevOps conversations.
Azure Web App Service is a PaaS solution from Microsoft that enables developers to build web and mobile applications. It supports several programming languages such as .NET, Node.js, Java, Ruby, Python, and PHP, with support for continuous deployment via Visual Studio Team Services (VSTS), GitHub, Docker Hub, and Azure Container Registry. In layman’s terms and in listed format:
• A team of programmers can code and develop a web or mobile application using their preferred coding language (.NET, Node.js, Java, Ruby, Python, and PHP).
• The team can use some code-collaboration platform (VSTS, GitHub, Docker Hub, and Azure Container Registry) where they can upload their application codes and continuously modify it there as and when necessary.
• Once the application is completed, the team can upload the code to Azure Web App Service and host / publish / run the application there so that users or customers can start using it. During the process of getting an Azure Web App Service, the team simply needs to select a few options (like the App Service Plan and image – .NET, Node.js, PHP, Ruby, etc.) that best suit their app, and the rest (like putting the infrastructure resources and components together) will be taken care of by Azure. The team can choose to enable auto scale-up/down or scale-out under certain conditions. We will further our understanding of this in a while.
• Since most applications are constantly corrected for bugs, or continuously updated with new features or new look, the team of programmers can modify or append some codes in the code-collaboration platform. Upon committing the changes, Azure can pull the changes from the code-collaboration platform, and either park them under a staging instance (recommended) or update the application immediately.
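The staging flow in the last bullet can be sketched as a simple slot swap. The slot names mirror App Service's production/staging deployment slots, but the code itself is only an illustration:

```python
# Deployment slots as a simple dict: new builds land in staging, and a swap
# promotes staging to production (keeping the old build around for rollback).
slots = {"production": "v1", "staging": None}

def deploy_to_staging(version):
    slots["staging"] = version

def swap():
    slots["production"], slots["staging"] = slots["staging"], slots["production"]

deploy_to_staging("v2")
swap()  # production now serves v2; v1 stays in staging for quick rollback
```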
To further our understanding:
1. Leveraging Microsoft Azure starts with a Subscription and a Resource Group.
2. We can then choose the appropriate Web App Service Plan, which is made up of infrastructure resources, a virtual machine, and other components in an image.
3. Our Web App code is deployed within the VM. Microsoft provides some tools to tweak app settings and run background jobs.
4. Routing traffic will be automatically handled by a load balancer (using a routing mechanism of our choice) upon scaling the application to multiple instances.
This is extremely helpful for e-commerce web apps during peak seasons such as holidays, or at certain times of the day. The web app can scale to manage a growing or shrinking number of transactions, which in turn means we use and pay for only the Azure resources we need.
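The scale-out/scale-in decision behind this can be sketched as a small rule. The CPU thresholds and instance limits here are illustrative assumptions, not Azure defaults:

```python
def desired_instances(current, cpu_percent,
                      scale_out_at=70, scale_in_at=30,
                      minimum=2, maximum=10):
    """Decide the next instance count from average CPU usage.

    Thresholds and limits are hypothetical; real autoscale rules can also
    key off memory, queue length, schedules, etc.
    """
    if cpu_percent > scale_out_at:
        return min(current + 1, maximum)   # scale out, but respect the cap
    if cpu_percent < scale_in_at:
        return max(current - 1, minimum)   # scale in, but keep a floor
    return current                         # within the comfort band: no change
```

Keeping a minimum of two instances also preserves availability while one instance is being patched or recycled.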
Now that we have some understanding of this technology concept, let’s move on to Web Apps for Containers, which isn’t really far off from this.
Some organizations have already started developing their applications on-prem, and have even adopted container technologies to host them – these are also referred to as “containerized applications.” Now, instead of bringing our app code to Azure Web App Services, we can lift-and-shift the whole container to Azure Web App for Containers. In reference to the same diagram we have above:
1. We still start with Azure Subscription and a Resource Group.
2. We can choose to bring this containerized-application to a new or existing Web App Service Plan.
3. Instead of uploading our application code via VSTS or connecting to GitHub repo, we choose and specify where Azure will pull the source from – Azure Container Registry, Docker Hub, or Private Registry.
4. The rest of the features, like scale, load balancing, app and job controls remain available.
Azure VM Scale Sets
This Azure service allows us to easily and automatically deploy, scale, and manage a set of identical virtual machines, with support for both Windows and Linux. The set can scale automatically or manually depending on resource (CPU, RAM, network) usage. An Azure load balancer, which is created automatically with the Scale Set, distributes traffic to the VMs in the set using round-robin or user-defined NAT rules. One important thing to keep in mind when using Scale Sets is that the VMs in the set are all identical and intended to perform the same function.
Taken from the official Azure VM Scale Sets planning and design guide, we may choose to use this service for these Scale Sets-specific features:
• Once you specify the scale set configuration, you can update the “capacity” property to deploy more VMs in parallel. This is much simpler than writing a script to orchestrate deploying many individual VMs in parallel.
• You can use Azure Autoscale to automatically scale a Scale Set but not individual VMs.
• You can reimage scale set VMs but not individual VMs.
• You can overprovision scale set VMs for increased reliability and quicker deployment times. You cannot do this with individual VMs unless you write custom code to do this.
• You can specify an upgrade policy to make it easy to roll out upgrades across VMs in your scale set. With individual VMs, you must orchestrate updates yourself.
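The “capacity” property mentioned in the first bullet can be pictured with a minimal sketch. The dictionary below is an illustrative fragment of a scale-set ARM model, not a complete or authoritative template:

```python
# Illustrative fragment of a scale-set model; field names follow the ARM
# "sku" shape, but this is not a deployable template.
scale_set_model = {
    "name": "myScaleSet",
    "sku": {"name": "Standard_DS1_v2", "tier": "Standard", "capacity": 3},
}

def set_capacity(model, new_capacity):
    """Scaling out is a single property change: the platform deploys or
    removes VMs in parallel to converge on the requested count."""
    model["sku"]["capacity"] = new_capacity
    return model
```

Contrast this one-line change with scripting the deployment of each individual VM yourself, which is the point the planning guide is making.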
Like several resources under the Azure compute category, Scale Sets can use either Managed Disks or user-managed Storage Accounts. Here’s a brief comparison between the two:
• Azure Storage Accounts don’t need to be pre-created when using Managed Disks
• We can attach data disks for the VM in the Scale Set using Managed Disks.
• Scale Sets on Managed Disk can scale up to 1,000 VMs, as opposed to 20 VMs (recommended to be spread to 5 Storage Accounts) for User-managed Storage.
Design for compute-intensive tasks using Azure Batch
In brief, Azure Batch is a good solution for pre-defined tasks that require high-compute processing. For example, if we need to extract every frame from several video files, this task could take a long time on a single machine. Running it would be far more efficient with a pool of servers – perhaps 100 servers – concurrently processing the videos and sending the output to a repo of our choosing. Instead of worrying about the massive CAPEX and OPEX of servers at that scale, we can leverage Azure Batch to provision the pool, scale it as we see fit, and run the tasks on demand.
Azure Batch is also a good fit for other workloads such as Message Passing Interface (MPI) applications, container workloads at scale, rendering multimedia files, extracting data via OCR, and data processing with Batch and Data Factory.
Extracting every frame of a video can take several minutes even on a very powerful computer – unless you are using a quantum-level machine, in which case it could be reduced to seconds. Then again, not everybody is willing to invest in such a device; everything is a balance between money and capability.
There are two concepts to keep in mind with Azure Batch: Pools, which we already touched on, and Jobs, which consist of one or more tasks. In designing and provisioning Azure Batch, it’s common for engineers to assume that each task is handled by one server. So if you know that one video-rendering task, for example, is going to take 3 minutes on an N-series Azure VM, and that’s your processing time limit, then your pool needs 100 N-series Azure VMs to render 100 video files / tasks.
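The sizing rule in that example can be written down directly. A minimal sketch, assuming each node processes one task at a time and every task takes the same time:

```python
import math

def pool_size(task_count, minutes_per_task, deadline_minutes):
    """How many pool nodes are needed to finish all tasks by the deadline.

    Assumes uniform task duration and one task per node at a time; real
    workloads vary, so treat this as a back-of-the-envelope estimate.
    """
    tasks_per_node = max(1, deadline_minutes // minutes_per_task)
    return math.ceil(task_count / tasks_per_node)
```

`pool_size(100, 3, 3)` returns 100, matching the example above; relaxing the deadline to 6 minutes halves the pool to 50 nodes, which is the usual cost/time trade-off with Batch.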
Customers only need to pay for the Azure resources (i.e., compute, storage, applications) that make up the pool of VMs; the Azure Batch service itself is free. Related to this topic, customers can upload and use their own VM image, or use Azure images that come with the OS and applications (Maya, V-Ray, 3ds Max, Arnold, or FFmpeg) on a pay-per-use model.
All of the servers in the same pool have identical characteristics; another pool should be created if a different compute type is required. We can have massive pools of servers that process enormous numbers of tasks, but we have to keep our quotas and limits in mind.
Video: Azure Batch Rendering Service
Define a migration strategy from cloud services
Let’s start with the cloud migration overview from Microsoft. From experience, and from hearing what others have gone through, here are some items to consider and a few questions to address, generally speaking, when we’re deciding to migrate to and use cloud services:
• Discover and meticulously catalog your IT resources and workloads.
This is extremely important whether we intend to migrate part of or the whole DC to Azure, to avoid missing anything – which could be a royal pain once migration is underway. Consider the compute, network, applications, and operations of our infrastructure even if we intend to migrate just the storage part, for example, to make sure dependencies are not overlooked.
• Further assess your existing infrastructure with helpful tools.
Making a comprehensive catalog of our existing IT resources should help us decide which of them we intend to, or should, move to the cloud. However, just because we want to move them to the cloud doesn’t necessarily mean we can readily do so. Some components could heavily sway the course of this initiative, like performance requirements, custom operations and functions, network architecture, etc.
With the help of some tools like those listed below, we can further assess our existing infrastructure for their readiness with our organization’s initiative to leverage cloud services:
• Understand different cloud services.
After auditing and assessing our existing infrastructure, we can move on to finding out which cloud services we should take on. At this stage it’s best to work with the entire Azure or cloud provider team – sales, licensing, and technical – to understand all the bits and pieces of the service(s) we would subscribe to: the required components and licenses, performance, ease of management and configuration, cost, etc.
• Last, but definitely not the end of the line: migrate.
Do we take baby steps, or do we lift and shift? Do we have the skill sets to do it on our own, or do we need to engage a partner? Do we need some sort of cloud-operations training?
Recommend use of Azure Backup and Azure Site Recovery
Azure Backup is an Azure-based backup and recovery service that can protect our files, folders, computers, servers, and applications by backing them up or recovering them on-prem or in the cloud.
Here are some differentiators of Azure Backup compared to traditional on-prem backup and recovery solutions/systems (BRS):
• Auto-management of unlimited storage – easily scale up / out storage in a pay-as-you-use OPEX model
• Multi-storage options – take advantage of LRS, ZRS, GRS, RA-GRS
• Cost-effective storage for long-term data retention or archiving
• Data encryption – not really a differentiator but worth mentioning that Azure Backup supports encryption of data in-flight and at rest.
Azure Backup comes in different flavors, if you will. Some flavors can do less, while others can do more, which gives customers the flexibility to choose and pay for only what they need. Refer to the table below for more information on this matter:
Whichever Azure Backup variant we choose, Azure Backup is priced similarly across this set of backup services; that is, we pay per protected instance and for the storage we use for those backup images (except for SC DPM, which requires an additional license).
Here’s a list of Linux distros endorsed by Microsoft Azure that are supported by Azure VM backup.
Direct link to Azure Backup FAQs.
Direct link to SC DPM with Azure requirements and limitations.
Direct link to Azure Backup Server Prerequisites and Limitations.
Direct link to Azure IaaS VM Backup.
As part of business continuity and disaster recovery (BCDR), the Azure Site Recovery (ASR) service can keep business applications and workloads running during outages by ensuring that they are replicated to another site, and that those replicated workloads can fail over / fail back as deemed necessary. The replication between sites can be any of the following modes:
• On-prem primary site with both physical machines and VMs (supports Hyper-V and VMware) to Azure or secondary on-prem site.
• Azure VMs to another Azure region
VMware and/or Physical Machine to Azure Replication Architecture
VMware to Secondary Site VMware Replication using Azure Site Recovery
Hyper-V to Azure Replication Architecture
Hyper-V with VMM to Azure Replication Architecture
Hyper-V to Secondary Site Hyper-V Replication using ASR
ASR with Traffic Manager (Priority routing)
ASR with Traffic Manager (Weighted routing)
ASR with Traffic Manager (nested Geo and Primary routing)
• Incoming traffic is first handled by the top-level Traffic Manager, which redirects it to the appropriate nested Traffic Manager. For example, incoming traffic from Germany is redirected to the nested Traffic Manager for Germany (not World).
• The nested Traffic Manager then redirects the incoming traffic to an endpoint based on Priority routing.
• Note that if we use Geo routing alone, traffic originating from Germany will not be redirected to World if the endpoint for Germany is down.
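The bullets above can be sketched as a two-level lookup: Geographic routing picks the nested profile, then Priority routing picks the healthy endpoint. The profile and endpoint names here are hypothetical:

```python
# Hypothetical nested profiles: (endpoint, priority) pairs; lower priority wins.
geo_profiles = {
    "Germany": [("germany-primary", 1), ("germany-secondary", 2)],
    "World":   [("world-primary", 1), ("world-secondary", 2)],
}

def route(origin, healthy):
    """Top level: Geographic routing selects a nested profile by origin.
    Nested level: Priority routing returns the healthy endpoint with the
    lowest priority value, or None if the whole profile is down."""
    profile = geo_profiles["Germany"] if origin == "Germany" else geo_profiles["World"]
    for endpoint, _priority in sorted(profile, key=lambda e: e[1]):
        if endpoint in healthy:
            return endpoint
    return None
```

Note that `route("Germany", {"world-primary"})` returns `None`: Geographic routing pins German traffic to the Germany profile and never falls back to World, which is the limitation the last bullet calls out.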
• Creating Backup or Site Recovery service requires a new Recovery Services vault (cannot use existing Recovery Services vault)
• Cannot protect Azure VMs of the same zone as the Recovery Services vault
• Bitlocker-protected disks are currently not supported
Multi-tenant support for replicating VMware to Azure is through Microsoft Cloud Solutions Provider, and this service has 3 models: Shared Hosting Services Provider (HSP), Dedicated Hosting Services Provider, and Managed Services Provider (MSP).
Failback can be Original Location Recovery (OLR) or Alternate Location Recovery (ALR). If you failed over a VMware virtual machine, you can fail back to the same source on-premises virtual machine if it still exists. In this scenario, only the changes are replicated back. This scenario is known as original location recovery. If the on-premises virtual machine does not exist, the scenario is an alternate location recovery.
Customers initially incur charges per instance protected by ASR, for the storage consumed by replicated data, and for outbound traffic. When a failover is triggered and the data replicated to ASR is brought up as compute resources, the usual compute charges are added to the aforementioned charges.
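These cost components can be summed up in a small sketch; every rate below is a placeholder to be filled in from current Azure pricing, not an actual figure:

```python
def monthly_asr_cost(instances, per_instance_rate,
                     storage_gb, per_gb_rate,
                     egress_gb, per_egress_gb_rate,
                     failed_over_compute_cost=0.0):
    """Estimate a monthly ASR bill from its components.

    Steady state: per protected instance + replica storage + outbound traffic.
    After a failover, normal compute charges are added on top. All rates are
    hypothetical parameters, not published prices.
    """
    return (instances * per_instance_rate
            + storage_gb * per_gb_rate
            + egress_gb * per_egress_gb_rate
            + failed_over_compute_cost)
```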
We can use the Azure Site Recovery Deployment Planner to profile the on-prem infrastructure and get insights into the resources we will need to replicate it to Azure, on-prem VM compatibility, and other metrics, including estimated cost.
• Physical machines replicated to Azure can only failback as VMs.
• In case of ALR, only failback to VMFS and vSAN will work, not RDM.
• Process Server is not needed to failback to on-prem Hyper-V.
Direct link to Azure Site Recovery FAQs.
Direct link to summary of workloads supported by ASR.
Design solutions for serverless computing
Use Azure Functions to implement event-driven actions; design for serverless computing using Azure Container Instances; design application solutions by using Azure Logic Apps, Azure Functions, or both; determine when to use API management service
Design microservices-based solutions
Determine when a container-based solution is appropriate; determine when container-orchestration is appropriate; determine when Azure Service Fabric (ASF) is appropriate; determine when Azure Functions is appropriate; determine when to use API management service; determine when Web API is appropriate; determine which platform is appropriate for container orchestration; consider migrating existing assets versus cloud native deployment; design lifecycle management strategies
Design web applications
Design Azure App Service Web Apps; design custom web API; secure Web API; design Web Apps for scalability and performance; design for high availability using Azure Web Apps in multiple regions; determine which App service plan to use; design Web Apps for business continuity; determine when to use Azure App Service Environment (ASE); design for API apps; determine when to use API management service; determine when to use Web Apps on Linux; determine when to use a CDN; determine when to use a cache, including Azure Redis cache
Create compute-intensive application
Design high-performance computing (HPC) and other compute-intensive applications using Azure Services; determine when to use Azure Batch; design stateless components to accommodate scale; design lifecycle strategy for Azure Batch
Important note: Do not take any of the information here as final and absolute. Consider it a personal collection of information about the given topics, available at the time of writing. Please make it a habit to check the latest official documentation, especially for cloud technologies, which develop rapidly.