Azure Deployment Design Guide
December 2022
A Microsoft Company
Azure Deployment Design Guide (V8.5) CONFIDENTIAL
Notices
Copyright © 2022 Microsoft. All rights reserved.
This manual is issued on a controlled basis to a specific person on the understanding that no part of
the product code or documentation (including this manual) will be copied or distributed without prior
agreement in writing from Metaswitch Networks and Microsoft.
Metaswitch Networks and Microsoft reserve the right to, without notice, modify or revise all or part of
this document and/or change product features or specifications and shall not be responsible for any
loss, cost, or damage, including consequential damage, caused by reliance on these materials.
Metaswitch and the Metaswitch logo are trademarks of Metaswitch Networks. Other brands and
products referenced herein are the trademarks or registered trademarks of their respective holders.
Contents
1 Introduction
1.1 About this document
1.2 Relevant product versions
2 Background information for deploying in Azure
2.1 Azure deployment models
2.1.1 Public cloud
2.1.2 Hybrid cloud
2.2 Availability (Azure public cloud)
2.3 Security
2.4 Performance
2.5 Regulatory requirements
3 Planning your Azure deployment
3.1 Product deployment footprint
3.2 Storage requirements
3.2.1 Overview of Azure data storage
3.2.2 Per-product data storage requirements
3.3 System availability
3.4 Backup and recovery
3.5 User access control
3.6 Networking in Azure
3.6.1 Connectivity outside Azure
3.6.2 Networking within Azure
3.6.3 Networking requirements for VMs
3.6.4 Routing traffic to instances
3.6.5 DNS in Azure
3.6.6 SSH access in Azure
3.6.7 VXLANs in Azure
4 Management, automation and monitoring in Azure
4.1 Overview of deployment and lifecycle management
4.2 Monitoring Azure deployments
1 Introduction
This document does not discuss pricing or cost implications of the Azure services used by Metaswitch
products.
Product Version
Perimeta V4.9.20+
This document assumes you are familiar with the concept of Azure tenants, subscriptions and
resource groups.
It is important that you understand the availability implications of each choice because different Azure
resources come with different service-level agreements as outlined in the Azure documentation linked
below.
When selecting the Azure region to deploy to, you should consider the following:
• Capacity - Each region has a maximum capacity, which affects the types of services you can
deploy under different circumstances. Some regions allow you to reserve capacity.
• Availability zone support - Not all regions support availability zones; see Azure regions with
availability zones - Azure documentation | Microsoft Docs
• VM SKUs (if deploying on virtual machines) - The availability of VM sizes may vary by region and
availability zone; see Products available by region | Microsoft Azure
• Fault Domain count (if your deployment does not use availability zone redundancy) - See
https://github.com/MicrosoftDocs/azure-docs/blob/main/includes/managed-disks-common-fault-domain-region-list.md.
Azure Stack
Azure Stack (Azure Stack | Microsoft Azure) allows you to deploy Azure workloads in private clouds or
edge locations as well as the Azure public cloud in a consistent way. This gives greater control over
where data is processed and transmitted and can be used to optimize data workflows.
Azure Stack is a good option if you have existing cloud compute resources you want to use alongside
Azure public cloud resources. It provides one consistent way to deploy and manage workloads on
your hardware and in Azure public cloud.
Azure Stack Edge (Azure Stack Edge | Microsoft Azure) provides Azure managed hardware that can
be deployed where you choose and integrated seamlessly with Azure public cloud.
Azure Stack Edge is the best choice if you require private compute resources, possibly in a specific
location, but do not want to manage that hardware yourself. The hardware is provided as-a-service:
workloads are managed in the same way as Azure public cloud and you control where your workloads
are deployed.
Carrier grade availability inherently requires that the application is deployed across multiple regions,
which in turn requires careful thought about how the application's data will be managed across
those regions. If you need to protect against failures of Azure regions, you can use the geographic
redundancy mechanisms built into Metaswitch products (where supported). For more information, see
Geographic redundancy across Azure regions on page 7.
When distributing nodes between Availability Zones or Regions, the deployment must be correctly
configured for automatic failover. For more information on designing for resilience in Azure see
Principles of the reliability pillar - Azure Architecture Center | Microsoft Docs.
In non-production environments where preserving service after the infrastructure or software fails is
not required, a non-highly available solution can be deployed, possibly to a single region or availability
zone.
Note that geographic redundancy is not a substitute for redundancy within each region.
Geographically redundant deployments are designed to counteract very rare and serious region
failures (for example, natural disasters or multiple coincident system failures within a region).
Metaswitch considers that in most cases, the GR recovery mechanisms are too disruptive to be used
to protect against individual failures within a region, which are typically more frequent.
2.3 Security
Azure Security Center (Azure Security Center | Microsoft Azure) provides a suite of tools to help you
secure your Azure deployment. Azure Security Center is used with public cloud and hybrid cloud
deployments to improve the security of your deployment and monitor enterprise compliance. Azure
security features are not enabled on Metaswitch products by default. You must determine which Azure
features you need and enable them.
General information on Azure network security is available at Azure network security | Microsoft
Azure. You should ensure network security is considered as part of your deployment design process.
We recommend that you use Azure Active Directory for authentication and authorization where
possible. For more information, see User access control on page 14.
Azure Key Vault (Azure Key Vault documentation | Microsoft Docs) can be used to store all secrets
relating to the deployment. This may include any SSH keys.
You should avoid exposing IP addresses and ports to the public internet if those endpoints do
not require internet connectivity to provide service. You can find which ports and IP addresses to
protect in the firewall documentation for your Metaswitch product(s). Use Azure security features (for
example, Azure Bastion or Azure Firewall) to protect them. These features are discussed further in
SSH access in Azure on page 18.
2.4 Performance
VM performance in Azure depends on the specification level chosen for the VM and other
components.
Note that some resources or services are subject to limits. This is discussed in Azure subscription
limits and quotas - Azure Resource Manager | Microsoft Docs.
The table below gives an overview of how Metaswitch products are deployed in Azure. For further
details, see the per-product subsections.
Note:
Azure VM specs, also known as VM SKUs or VM sizes, vary over time as new ones are released
and old ones are retired. If you want to use a VM SKU different from the ones listed above,
consider:
• Whether the VM SKU is available in your target Azure public cloud region; see Products
available by region | Microsoft Azure.
• The CPU and memory requirements for the product. You should aim for the same values as the
recommended options above.
• The maximum number of NICs and expected network bandwidth.
The number of NICs required will depend on your IP network design and how you want to
separate traffic. Some products have a fixed number of NICs required; others allow you to use
separate NICs or share traffic between NICs for certain types of traffic.
• Whether the VM SKU supports Encryption at Host, which is recommended for all VMs
(Supported VM sizes - Virtual Machines | Microsoft Docs).
MDM
MDM is deployed as an Azure Virtual Machine.
MDM must be deployed using SIMPL VM. SIMPL VM will manage MDM's storage requirements.
DCM
The Metaswitch Distributed Capacity Manager (DCM) is deployed as an Azure Virtual Machine.
Production deployments require at least two DCM VMs in each site. Lab deployments can have a
single DCM VM.
Perimeta
Perimeta is deployed as an Azure Virtual Machine.
SDE
SDE is deployed as an Azure Virtual Machine using SIMPL VM.
SIMPL VM
SIMPL VM is deployed as an Azure Virtual Machine. It is typically the first application deployed
in a new deployment (because it is used to deploy and manage other Metaswitch VMs). If your
deployment has products deployed and managed by SIMPL VM in multiple Azure regions, we
recommend deploying one SIMPL VM in each region.
• Overview of Azure data storage on page 11 contains an overview of the Azure storage
technologies used by Metaswitch products.
• Per-product data storage requirements on page 12 outlines the requirements of each
Metaswitch product.
SIMPL VM will automatically configure storage requirements for products based on the SDF provided.
For information on how to configure an SDF, see Creating and editing an SDF in the SIMPL VM
Product Deployment and Management Guide.
Managed disks
Azure Managed disks (Azure Disk Storage overview - Azure Virtual Machines | Microsoft Docs) are
block-level storage, managed by Azure, and used with Azure Virtual Machines. Managed disks are
locally redundant by default.
When provisioning managed disks there are two high level choices to be made: size and SKU. Each
product has its own requirements highlighted below with further details in the product documentation.
Managed disks offer three levels of encryption, discussed in Overview of managed disk encryption options
- Azure Virtual Machines | Microsoft Docs. These can be applied to ensure your data is secure and
that your deployment is aligned with any policy or legal requirements.
Azure Blob Storage (ABS) (About Blob (object) storage - Azure Storage | Microsoft Docs) is a cloud
storage solution optimized for storing large amounts of unstructured data. ABS provides various levels
of redundancy, from local to Availability Zone redundancy, as well as three levels of encryption.
DCM
The Metaswitch Distributed Capacity Manager (DCM) requires a managed disk of 16 GiB. We
recommend a standard SSD.
MDM
To accommodate this, SIMPL VM creates a managed disk for each MDM VM based on the
deployment-size set in the SDF.
Perimeta
Size should be chosen as described in Perimeta data storage. Most use cases should use 64 GiB
disks, which correspond to size '6' in disk SKU names (for example, P6 for Premium SSD).
Choice of SKU depends to a large extent on availability. We recommend Premium SSD Managed
Disks.
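As a rough illustration of how a required disk size maps to a SKU size, the sketch below picks the
smallest Premium SSD tier that fits a requested capacity. The tier table is an assumption based on
the Premium SSD sizes published by Azure at the time of writing, and the function name is
hypothetical. Check the Azure documentation for current values before relying on it.

```python
# Premium SSD tiers and provisioned sizes in GiB.
# Assumed values for illustration; verify against the current Azure documentation.
PREMIUM_SSD_TIERS = [
    ("P1", 4), ("P2", 8), ("P3", 16), ("P4", 32), ("P6", 64),
    ("P10", 128), ("P15", 256), ("P20", 512), ("P30", 1024),
]


def smallest_premium_tier(required_gib):
    """Return (tier_name, size_gib) for the smallest tier that holds required_gib."""
    for tier, size_gib in PREMIUM_SSD_TIERS:
        if size_gib >= required_gib:
            return tier, size_gib
    raise ValueError(f"{required_gib} GiB exceeds the largest tier in this table")


if __name__ == "__main__":
    for needed in (16, 30, 64, 128):
        print(needed, "GiB ->", smallest_premium_tier(needed))
```

Azure bills managed disks by provisioned tier, so a request between two tiers is rounded up to the
next one.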
SDE
SIMPL VM
Each SIMPL VM requires a 30 GiB managed disk for persistent storage and an external managed data
disk of 128 GiB. We recommend using Premium SSD Managed Disks with locally-redundant storage
(LRS).
MDM
MDM is always deployed as a pool of three instances per site. This provides tolerance to a single
instance failure in each site.
MDM can also be deployed across multiple sites, with a cluster of three MDMs in each site. This
provides geo-redundancy via per-VM data, topology and DNS information replication across each site
in the deployment.
Perimeta
If you are deploying Perimeta as a standalone system, any failure of either the Azure infrastructure
or the software itself (that is, anything that would normally cause a software protection switch (SPS))
causes calls on the instance to be dropped. Instead, networks should be designed for service availability.
If deploying Perimeta as standalone, it is still possible to provide highly available solutions on Azure.
By distributing instances across Availability zones and regions it is possible to make failures of the
software independent. Combining this with active monitoring (for example, using OPTIONS polls)
allows users to immediately redial in the event of a call failure.
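To illustrate what an OPTIONS poll might look like, the following is a minimal sketch that builds a
SIP OPTIONS request with the mandatory RFC 3261 headers and classifies a response status line. It
does not send anything on the network. The host names, the "monitor" user and the function names
are hypothetical, and treating only 2xx responses as healthy is an assumption; your monitoring
policy may differ.

```python
import uuid


def build_options_request(target_host, local_host, local_port=5060):
    """Build a minimal SIP OPTIONS request (RFC 3261) as a string."""
    branch = "z9hG4bK" + uuid.uuid4().hex  # RFC 3261 magic cookie prefix
    call_id = f"{uuid.uuid4().hex}@{local_host}"
    from_tag = uuid.uuid4().hex[:8]
    lines = [
        f"OPTIONS sip:{target_host} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_host}:{local_port};branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:monitor@{local_host}>;tag={from_tag}",
        f"To: <sip:{target_host}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 OPTIONS",
        "Content-Length: 0",
    ]
    # Headers end with a blank line; there is no message body.
    return "\r\n".join(lines) + "\r\n\r\n"


def is_healthy(response_text):
    """Assumed policy: any 2xx final response means the instance is healthy."""
    status_line = response_text.split("\r\n", 1)[0]
    parts = status_line.split(" ", 2)
    return len(parts) >= 2 and parts[0] == "SIP/2.0" and parts[1].startswith("2")


if __name__ == "__main__":
    print(build_options_request("sbc1.example.com", "probe.example.com"))
```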
Diagnostics
SIMPL VM
SIMPL VM is only used when deploying or updating other Metaswitch products and is not essential
to provide service. Each Azure region your deployment uses should have a SIMPL VM. These are all
managed independently.
Guidance on Azure backup and disaster recovery plans can be found in Backup and disaster recovery
for Azure applications - Azure Architecture Center | Microsoft Docs.
Most Metaswitch products implement an application-level backup and restore mechanism. This
provides a disaster recovery process for Metaswitch products.
MDM requires on-instance local users. MDM does not support Azure Active Directory or third-party
RADIUS servers.
• Using a RADIUS server. This can be Azure Active Directory, a third-party RADIUS server in Azure
or an on-premises RADIUS server.
• Using on-instance Perimeta user management.
The Metaswitch Distributed Capacity Manager (DCM) requires using on-instance local users. It does
not support Azure Active Directory or RADIUS.
1. ExpressRoute
• ExpressRoute is a dedicated network connection between your data center and Azure. It
requires you or a partner to have a presence in an exchange location where Microsoft is also
present.
• This is the preferred option and effectively extends your internal network into Azure. See
ExpressRoute documentation | Microsoft Docs for details.
2. Microsoft Azure Peering Service (MAPS)
• As with ExpressRoute, this requires a physical connection, from you or a partner, to an
exchange where Microsoft is present. For MAPS, your network is peered with the Azure
network and you connect to Azure resources through public IP addresses, but with an SLA and
guaranteed QoS.
• The flavor of MAPS relevant to Metaswitch customers is Azure Internet peering for
Communication services.
3. Public internet
You must decide which option is most appropriate for your deployment.
It is possible to peer VNets to allow traffic to freely flow from one to another. This provides flexibility in
the networks that can be used but requires firewall and Network Security Group rules to be carefully
considered and updated to ensure that traffic remains confined to an appropriate subnet.
You will need to implement an IP network design for the selected products in your deployment. This
includes deciding how many VNets you are going to use, and which subnets you require on those
VNets. These are some items you will need to consider:
• The networking requirements of the products you are deploying, and how you want to separate
traffic. Some products have a fixed number of NICs required, while others allow you to use
separate NICs or share traffic between NICs for certain types of traffic. If using separate NICs, you
will need to decide whether you are happy with these traffic types being on the same subnet, or
whether you also want subnet separation. It is typical to place all management traffic on
a separate subnet.
• The VM specs of the products you are deploying, and the maximum number of NICs and expected
network bandwidth of those NICs.
• How you want to manage network security with the use of Network Security Groups. Each product
manual set has a firewall rule table with its requirements. You need to add Network Security
Groups to your subnets and add rules to these to satisfy the connectivity and security requirements
of your deployment.
• Whether any of your products will be exposed to the internet with a public IP. We do not
recommend using public IP addresses for management access.
• Whether any of your products requires VXLANs.
• What VNet peering you need to other sites.
Attention:
By default, Azure routes network traffic within the same VNet between all subnets. This behavior
must be overridden using Network Security Groups. This maintains the segregation of traffic
between management, external and internal networks.
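The effect of overriding the default behavior can be sketched with a simplified model of how
Network Security Group rules are evaluated: rules are processed in priority order (lowest number
first) and the first match decides. The address ranges, rule set and names below are hypothetical,
and the model ignores ports, protocols and service tags. The default AllowVnetInBound rule
(priority 65000) is approximated here by a rule covering the whole VNet address space.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    priority: int      # lower number is evaluated first
    source: str        # source CIDR
    destination: str   # destination CIDR
    allow: bool


def is_allowed(rules, src_ip, dst_ip):
    """Return the decision of the first matching rule, in priority order."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        if (src in ipaddress.ip_network(rule.source)
                and dst in ipaddress.ip_network(rule.destination)):
            return rule.allow
    return False  # models the default DenyAllInBound rule


# Hypothetical address plan.
VNET = "10.0.0.0/16"
MGMT = "10.0.0.0/24"

# Simplified stand-in for the default AllowVnetInBound rule.
default_rules = [Rule(65000, VNET, VNET, True)]

# Custom inbound rules for the management subnet.
custom_rules = [
    Rule(100, MGMT, MGMT, True),    # management hosts may talk to each other
    Rule(4000, VNET, MGMT, False),  # nothing else in the VNet may reach management
]

if __name__ == "__main__":
    # A host on another subnet (10.0.1.5) trying to reach management (10.0.0.4).
    print(is_allowed(default_rules, "10.0.1.5", "10.0.0.4"))                 # default only
    print(is_allowed(custom_rules + default_rules, "10.0.1.5", "10.0.0.4"))  # with override
```

With only the default rule, traffic from another subnet reaches the management subnet. Adding a
lower-numbered deny rule blocks it while still permitting traffic within the management subnet.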
As an example, a typical single-site Perimeta deployment would use a single VNet with 3 subnets:
• Core signaling and media - for SIP signaling and media traffic between the internal interfaces of
Perimeta and other Metaswitch products, open only to those IP addresses and ports for voice
traffic.
DCM
The Metaswitch Distributed Capacity Manager (DCM) requires a network interface on the
management network. DCM must be able to connect to all the products that it licenses over this
management network.
MDM
MDM requires a single management network interface. This should be on the same management
subnet as the products MDM is managing. However, you can allow network interfaces on the same
subnet to communicate with each other on all ports.
If you are deploying a multi-site MDM across multiple regions, you must add VNet peering between
your VNets. This allows MDMs in different regions to communicate with each other over the
management subnets.
When you set up a firewall for MDM you must ensure you configure rules following the MDM
documentation. For full details of the firewall configuration that MDM requires, see MDM firewall
configuration in the Metaswitch Deployment Manager Overview Guide.
Perimeta
Perimeta on Azure requires a minimum of three network interfaces, and as such three subnets. These
are:
Perimeta supports a management interface plus up to eight interfaces for carrying service traffic.
For example, in deployments supporting Microsoft Teams Direct Routing, Metaswitch expects the
following network interfaces:
• One interface towards the Microsoft Teams PSTN hub (signaling and media)
• (Optional) One interface towards the public internet for media bypass (media only)
• One interface towards your carrier datacenters (signaling and media)
• One management interface
When you set up a firewall or Network Security Group for Perimeta you must ensure you configure
rules following the Perimeta documentation. For full details of the traffic flows that Perimeta requires,
see Traffic information for firewall configuration in Perimeta Network Integration Guide and Firewall
and security group configuration in the Perimeta Initial Setup and Commissioning Guide for your
deployment.
SDE
SIMPL VMs
You do not need to plan redundant network interfaces because Metaswitch products rely on Azure's
redundancy features to protect against networking failure. For example, you do not need to select a
Perimeta port group scheme that supports redundant ports. For more information about the Azure
redundancy mechanisms that Metaswitch recommends, see Availability (Azure public cloud) on page
6.
Public addresses
Public addresses are a chargeable resource in Azure that can be assigned to VMs. If a public IP
address is not assigned to a VM, outbound connectivity is still possible and Azure dynamically assigns
an available IP address that is not dedicated to a resource. Instead of using public IP addresses for
management access, we recommend that you set up an Azure VPN Gateway; see VPN Gateway |
Microsoft Docs.
Public IPs have 2 SKUs: Standard and Basic. See Public IP addresses in Azure | Microsoft Docs.
Because only Standard public IP addresses support availability zones, you should use this SKU for
any public IP address associated with a Metaswitch product.
Load balancing
Azure does not provide a SIP-aware (layer 7) load balancer. Any load balancing in Azure is done
via DNS records (for example, DNS SRV weightings) or Azure Traffic Manager.
Azure Traffic Manager (Traffic Manager - Cloud Based DNS Load Balancing | Microsoft Azure) is a
DNS based traffic load balancer. It allows distributing traffic to public facing applications across Azure
regions and is the way Metaswitch expects most instances of SBC applications to be addressed using
DNS. Traffic Manager is not SIP-aware; it therefore cannot ensure that all messages on a SIP dialog
are sent to the same endpoint.
The IP addresses that Traffic Manager returns for an FQDN can be updated immediately in the case
of an instance failure. Traffic Manager can also resolve FQDNs based on geographic location. This
means that a deployment that spans the world needs to expose only a single FQDN that maps to
different IP addresses depending on the location of the querying entity. This simplifies routing for
applications like Microsoft Teams Direct Routing hosted in an SBC.
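As an illustration of load balancing through DNS SRV weightings, the sketch below shows how a
client might choose a target from a set of SRV records, following the selection approach described
in RFC 2782: only records with the lowest priority value are considered, and among those a target
is picked at random in proportion to its weight. The record values and target names are
hypothetical, and real SIP clients may implement selection differently.

```python
import random


def pick_srv_target(records, rng=random):
    """Pick a target from (priority, weight, target) tuples.

    Only the lowest priority value is considered; within that group the
    choice is random, proportional to weight.
    """
    if not records:
        raise ValueError("no SRV records to choose from")
    best_priority = min(priority for priority, _, _ in records)
    candidates = [(w, t) for p, w, t in records if p == best_priority]
    total_weight = sum(w for w, _ in candidates)
    if total_weight == 0:
        # All weights are zero: fall back to a uniform choice.
        return rng.choice([t for _, t in candidates])
    point = rng.random() * total_weight  # value in [0, total_weight)
    running = 0
    for weight, target in candidates:
        running += weight
        if point < running:
            return target
    return candidates[-1][1]  # guard against floating-point edge cases


if __name__ == "__main__":
    # Hypothetical records: two active SBCs sharing load 70/30, one backup.
    srv_records = [
        (10, 70, "sbc-a.example.com"),
        (10, 30, "sbc-b.example.com"),
        (20, 100, "sbc-backup.example.com"),
    ]
    print(pick_srv_target(srv_records))
```

Because selection happens per DNS lookup rather than per SIP message, this approach shares the
limitation noted above: it does not by itself keep all messages of a dialog on the same endpoint.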
• Azure VPN Gateway (VPN Gateway - virtual networks | Microsoft Azure) provides a standard VPN
for connecting to your Azure VNets. Azure VPN Gateway is Metaswitch's recommended option.
• Azure Bastion (Azure Bastion | Microsoft Azure) provides a mechanism to connect to VMs over
SSH through the Azure Portal and so avoids exposing the workload's SSH port on the public
internet.
Virtual extensible Local Area Network (VXLAN) tunnels allow Ethernet (layer 2) traffic to be
transferred over an IP (layer 3) network. Certain Metaswitch products support the use of VXLANs,
allowing them to be deployed as a high availability system in Azure, and granting access to
networking features not normally supported in Azure.
VXLANs are an encapsulation of an Ethernet frame as documented by RFC 7348. They can be used
to provide a complete layer 2 network (known as the overlay network) over the top of an existing layer
3 network (the underlay network).
In normal VXLAN operation, the underlay network is invisible to the servers. VXLAN Tunnel End
Points (VTEPs) encapsulate the layer 2 overlay packet before forwarding it across the layer 3
underlay network.
Each VXLAN overlay network is uniquely identified with a VXLAN Network Identifier (VNI).
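The encapsulation described above can be sketched as follows. This is a minimal illustration of the
8-byte VXLAN header defined in RFC 7348 (a flags byte with the VNI-valid bit set, a 24-bit VNI and
reserved fields), not a description of how any Metaswitch product implements it. The function names
are hypothetical. The outer UDP, IP and Ethernet headers that carry the result across the underlay
network are omitted.

```python
import struct

VXLAN_UDP_PORT = 4789         # IANA-assigned destination UDP port for VXLAN
VXLAN_FLAG_VNI_VALID = 0x08   # the "I" flag: the VNI field is valid


def vxlan_encapsulate(vni, inner_frame):
    """Prefix an inner Ethernet frame with a VXLAN header for the given VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 0: flags (8 bits) + reserved (24 bits).
    # Word 1: VNI (24 bits) + reserved (8 bits).
    header = struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)
    return header + inner_frame


def vxlan_decapsulate(payload):
    """Return (vni, inner_frame) from a VXLAN UDP payload."""
    if len(payload) < 8:
        raise ValueError("payload is shorter than a VXLAN header")
    word0, word1 = struct.unpack("!II", payload[:8])
    if not (word0 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return word1 >> 8, payload[8:]


if __name__ == "__main__":
    packet = vxlan_encapsulate(5000, b"inner ethernet frame")
    print(packet[:8].hex())
    print(vxlan_decapsulate(packet))
```

A VTEP performs the encapsulation step when a frame leaves the overlay network and the
decapsulation step when a VXLAN packet arrives from the underlay network, using the VNI to
identify which overlay the frame belongs to.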
Perimeta and the Secure Distribution Engine both support the use of VXLANs.
VXLANs in Perimeta
For two Perimeta VM instances to function as a high availability pair, they must have floating IP
addresses for signaling and media that are claimed by the current primary instance. The primary
instance claims these addresses by sending gratuitous ARPs (GARPs) to peers.
However, Azure does not support floating virtual IP addresses, and Azure VNets do not support
sending GARPs. In order to function as a high availability system in Azure, Perimeta must
encapsulate these GARPs in VXLAN tunnels.
Service traffic is also sent through VXLAN tunnels when deploying Perimeta as a high availability
system in Azure. As such, network elements connected to Perimeta must be capable of handling
encapsulated VXLAN traffic.
• VXLAN tunnels.
• Service interfaces for the overlay network.
• Service interfaces for the underlay network.
Overlay service interfaces represent Perimeta's connection to the overlay network. In addition to other
service interface configuration, they are configured with a VXLAN tunnel. When packets are sent out
from an overlay service interface, they are encapsulated and passed to the underlay service interface,
based on the address configured on the overlay service interface's VXLAN tunnel.
Underlay service interfaces represent Perimeta's connection to the underlay network. In addition to
other service interface configuration, they must be configured with the named per-instance address of
a VXLAN tunnel.
When used to create a high availability system in Perimeta, underlay service interfaces are configured
with the pair of per-instance IP addresses of both instances of the high availability pair.
In the following example, Perimeta functions as a VTEP, decapsulating a packet received from a
VTEP peer on the core underlay network, and then re-encapsulating it before forwarding it to a VTEP
peer on the access underlay network.
1. A packet arrives at VTEP Peer 1. VTEP Peer 1 encapsulates the packet with a VXLAN header.
2. VTEP Peer 1 sends the encapsulated packet to Perimeta's core underlay interface. This is the
VXLAN tunnel.
3. The packet is decapsulated and passed to the core overlay service interface. It is then forwarded
to the access overlay service interface, and then to the access underlay service interface.
4. The packet is encapsulated with a VXLAN header and sent from the access underlay service
interface to the VTEP Peer 2 on the access underlay network.
5. VTEP Peer 2 decapsulates the packet and forwards it to the next network element.
Restrictions
VXLANs must only be used to create high availability systems in Perimeta. Perimeta does not support
all VXLAN features, and must not be used as a generic VTEP. Perimeta does not support:
• Using VXLAN tunnels outside of poll mode. As VXLANs can only be used in poll mode, all tunnel
configuration must be manually removed before switching a system to interrupt mode.
When deployed in Azure, each SDE VM uses VXLANs to route traffic: external VXLANs and an
internal VXLAN. SDE functions as a VTEP, decapsulating, directing and re-encapsulating traffic received
from VTEP peers (such as VTEP-capable routers on the external network, or Perimeta Session
Controllers on the internal network).
• The details of peer VTEPs (such as underlay addresses of Perimeta Session Controllers) on
SDE's internal network.
Overlay addresses
Overlay addresses represent SDE's connection to the overlay network. When packets are sent out
from an overlay address, they are encapsulated and passed on based on the address configured on
an overlay network VXLAN tunnel.
Underlay addresses
Underlay addresses represent SDE's connection to the underlay network. They
correspond to the per-instance addresses of the SDE peers in a high availability pair.
In Azure, Flow Steerer functions as a VTEP, decapsulating, directing and re-encapsulating traffic
received from VTEP peers (such as VTEP-capable routers on the external network, and Perimeta
Session Controllers on the internal network).
When deployed in Azure, Flow Steerer requires the following VXLANs to route traffic:
• One external VXLAN for each VLAN, between Flow Steerer and the router.
• One internal VXLAN, between Flow Steerer and your MSCs.
• In the Solution Definition File (SDF) for your Flow Steerer deployment:
• Overlay and underlay addresses for the MSCs Flow Steerer will connect to.
• In the flowsteerer_vlans.yaml document:
Note that some Azure services require an agent to be installed. Not all agents are supported by all
Metaswitch products. Details of supported agents are available in the documentation for individual
products.
SIMPL VM
Metaswitch's SIMPL VM is used to manage other Metaswitch VMs on Azure public cloud. SIMPL
provides a unified and consistent way to deploy, commission, and update Metaswitch products.
• Using the Azure CLI. This is the most manual approach as all Azure resources must be created
and linked together by the user. This provides flexibility to the user but at the cost of convenience.
• Using ARM templates. ARM templates provide a parameterized way to deploy components and
their dependencies. A user fills out a parameters file and provides this file and a template file to
Azure to create the product and its dependencies. Using ARM templates is a declarative process.
Both mechanisms rely on the user having access to the appropriate Azure VM image.
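To illustrate the parameterized approach, the sketch below generates an ARM deployment parameters
file from a dictionary of values. The file structure ($schema, contentVersion and a parameters
object whose entries each wrap a "value") follows the standard ARM parameters format. The parameter
names and values shown are hypothetical; the real names are defined by the template supplied for
your product.

```python
import json

ARM_PARAMETERS_SCHEMA = (
    "https://schema.management.azure.com/schemas/2019-04-01/"
    "deploymentParameters.json#"
)


def build_arm_parameters(values):
    """Wrap plain key/value pairs in the ARM deployment parameters structure."""
    return {
        "$schema": ARM_PARAMETERS_SCHEMA,
        "contentVersion": "1.0.0.0",
        "parameters": {name: {"value": value} for name, value in values.items()},
    }


if __name__ == "__main__":
    # Hypothetical parameter names; use the ones your template declares.
    params = build_arm_parameters({
        "vmName": "perimeta-1",
        "adminUsername": "azureuser",
        "managementSubnetName": "management",
    })
    with open("parameters.json", "w", encoding="utf-8") as handle:
        json.dump(params, handle, indent=2)
    print(json.dumps(params, indent=2))
```

The resulting file can then be passed to Azure together with the template file, for example with
az deployment group create --resource-group <resource group> --template-file <template file>
--parameters @parameters.json.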
If new software can be uploaded to Perimeta instances (for example, if SFTP connectivity is possible),
upgrading and applying efixes can be done with Perimeta's standard procedures for upgrading and
applying efixes. If new software cannot be uploaded directly, Perimeta can download new software
versions and efixes from an Azure storage account. This uses Perimeta's orchestration API to
download the software to the instance and apply it.
Logs
Azure Monitor is used to collect and view logs generated by Metaswitch products. This provides a
single place to view and analyze logs from your deployment. All Metaswitch products supported in
Azure integrate with Azure Monitor except Metaswitch Distributed Capacity Manager (DCM).
Logging and auditing in the Azure platform is covered in Azure security logging and auditing |
Microsoft Docs. Azure provides mechanisms for security logging across Azure resources and allows
you to audit and analyze those logs using a range of tools.
Azure has an Activity Log for all resources to record health events and updates. This integrates with
Azure Resource Health monitoring. As well as user initiated changes these logs also display issues
affecting your resources caused by the Azure platform.