ARM vs Terraform - Oleg Ignat

Infrastructure as code is a powerful concept. More and more companies are transitioning from imperative scripts to declarative models. If you are an IT professional or a DevOps engineer at one of those companies, you might be wondering which technology to use. The answer is not simple and in most cases, unfortunately, boils down to “it depends”, which is utterly useless and unactionable. In this post, we will compare Microsoft Azure Resource Manager to HashiCorp Terraform across every relevant dimension to help you make a decision.

Imperative vs Declarative? Fancy.

Let’s address the elephant in the room and cover the basics – what’s the difference between imperative and declarative infrastructure automation? It is actually very simple:

The imperative model prescribes the sequence of steps that, hopefully, lead to the desired state of the system. For example – create a folder, copy a file, rename it, invoke a command passing that file as a 3rd argument.
The declarative model defines the desired state of the system only and leaves it up to the automation to figure out the sequence of steps to achieve it. For example – IIS is up and running on a VM. Automation will construct a script to enable IIS depending on the version of Windows.
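The difference can be sketched in a few lines of Python. This is only an illustration: the dictionaries standing in for a machine, and the names `imperative_setup` and `plan`, are hypothetical and have nothing to do with how ARM or Terraform represent resources internally.

```python
def imperative_setup(env):
    """Imperative: a fixed sequence of steps that assumes a clean machine."""
    if "app" in env["folders"]:
        raise RuntimeError("folder 'app' already exists")
    env["folders"].append("app")
    env["features"]["IIS"] = True


def plan(desired, actual):
    """Declarative: derive only the steps needed to move `actual` to `desired`."""
    steps = []
    for folder in desired["folders"]:
        if folder not in actual["folders"]:
            steps.append(f"create folder {folder}")
    for feature, enabled in desired["features"].items():
        if actual["features"].get(feature) != enabled:
            steps.append(f"{'enable' if enabled else 'disable'} {feature}")
    return steps


desired = {"folders": ["app"], "features": {"IIS": True}}
fresh = {"folders": [], "features": {}}
partial = {"folders": ["app"], "features": {}}

print(plan(desired, fresh))    # both steps are needed
print(plan(desired, partial))  # only the missing step is planned
```

The imperative function breaks on the partially configured machine, while the declarative planner simply produces a shorter list of steps for it.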

The benefits of the declarative model are pretty obvious; however, we will list them anyway for completeness.

	Imperative	Declarative
Easy to write and understand
Can adapt to the initial state of the environment without changing a single line of code
Rules and policies can be applied by a computer
Code to achieve goal state can be written and tested independently of the goal state itself
The sequence of steps that will be applied to achieve the desired state can have rules and policies applied by a computer
Tolerance to failures and interruption during execution
Engineering effort to develop and maintain if no suitable technology exists on the market
Ability to freely evolve automation without dependence on another team or company

Both ARM and Terraform are declarative infrastructure automation tools.

Layer of technology stack

Most of us know exactly how ARM and Terraform fit into the universe of automation tools. However, for those few of us new to this space it is worth drawing a bigger picture to orient people. I realize that with enough will and time anything can be done with any technology. In the table below we will cover the mainstream use of technology only.

Coordinate execution of automation across a fleet	Jenkins	Netflix Spinnaker	Microsoft Azure DevOps	AWS CodePipeline
Install the software on VMs and create VM images	RedHat Ansible	Chef	HashiCorp Packer	Puppet
Create and configure low-level infrastructure (compute, network, storage, etc.)	Microsoft ARM	HashiCorp Terraform	AWS Cloudformation	Google Deployment Manager

Cloud and infrastructure compatibility

Let’s start breaking down the differences between ARM and Terraform with one of the most relevant areas – how many infrastructure vendors can each technology touch?

Microsoft Azure Resource Manager can only deploy to Microsoft Azure. It is not designed to work with any other cloud provider. There is a new product, Azure Arc, that extends the Azure control plane into the on-premises environment, but at the time of writing it is in its infancy and doesn’t have a broad functional surface area. Think of ARM as an Azure-specific technology. It is a closed-source Microsoft product with only Microsoft full-time employees contributing to it.

ARM is an integral part of the Azure Control Plane and can’t be separated from it. ARM as a service is exposed to the world at https://management.azure.com/. It handles not only resource provisioning but also authorization, ARM provider registration, etc. This service participates in the billing system by making sure all ARM providers have the right subscription metadata, namely offer identifier.

Comparing ARM with Terraform is not exactly apples-to-apples because the latter itself relies on ARM to configure Azure resources. However, it is fair game to compare ARM templates with Terraform. The ARM template engine runs on top of ARM and provides declarative resource configuration capabilities.

ARM Templates run on top of ARM and can be used by Terraform

Azure CLI is a thin command-line tool auto-generated from the specification of each Microsoft internal ARM provider. It can make simple pass-through calls into ARM but can’t perform complex client-side computations. It can be installed on most operating systems – Windows, macOS, Linux, etc.

Terraform, on the other hand, can deploy to most cloud providers, on-premise data centers via OpenStack, and even configure non-infrastructure resources (for example, Spinnaker pipelines). It is a mature and standalone product with a vibrant community of developers and consumers.

Terraform is a client-side command-line tool that will run on most operating systems – Windows, Linux, macOS, even Solaris.


Language

ARM templates are plain vanilla JSON documents with a well-defined schema. Tools that understand JSON schema can assist in authoring (for example, Visual Studio). ARM templates support built-in functions for string and number manipulation, explicit dependency expression between objects, output section to be emitted after template execution, etc.

ARM templates can be self-sufficient and define every property of every resource. They can also contain references to variables supplied in a separate JSON document with a well-defined schema.

{
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "location": {
            "type": "string",
            "defaultValue": "[resourceGroup().location]",
            "metadata": {
                "description": "Location for all resources."
            }
        }
    },
    "resources": [
        {
            "type": "Microsoft.Network/virtualNetworks",
            "apiVersion": "2019-04-01",
            "name": "amazing-vnet",
            "location": "[parameters('location')]",
            "properties": {
                "addressSpace": {
                    "addressPrefixes": [
                        "10.0.0.0/16"
                    ]
                },
                "subnets": [
                    {
                        "name": "greatSubnet",
                        "properties": {
                            "addressPrefix": "10.0.0.0/24"
                        }
                    }
                ]
            }
        }
    ]
}

The content of every “resource” object is dictated by the “type” of that resource. Five ARM properties appear on most resources: type, apiVersion, name, location, and properties. Strictly speaking, only type, apiVersion, and name are required on every resource; location and properties depend on the resource type. What goes into “properties” is driven by the corresponding ARM resource provider rather than by ARM itself. ARM figures out which resource provider to route the request to based on the “type” property value. For example, the “Microsoft.Network” resource provider has an end-point registered with ARM in the “location” region. When ARM performs a template deployment, it sends a PUT request with the “properties” payload to that HTTPS end-point.
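As a rough illustration of that routing, the Python sketch below turns one template resource into the URL and body of a PUT request against the ARM end-point. The URL follows the standard ARM resource ID layout; `build_put_request`, the subscription ID, and the resource group name are hypothetical, and the sketch ignores nested child resources, authentication, and everything else the real deployment engine does.

```python
from urllib.parse import quote

ARM_ENDPOINT = "https://management.azure.com"


def build_put_request(subscription_id, resource_group, resource):
    """Return (url, body) for a PUT that would create or update one resource."""
    # "Microsoft.Network/virtualNetworks" -> provider namespace + resource type
    namespace, _, type_name = resource["type"].partition("/")
    url = (
        f"{ARM_ENDPOINT}/subscriptions/{quote(subscription_id)}"
        f"/resourceGroups/{quote(resource_group)}"
        f"/providers/{namespace}/{type_name}/{quote(resource['name'])}"
        f"?api-version={resource['apiVersion']}"
    )
    body = {"location": resource["location"], "properties": resource["properties"]}
    return url, body


# The virtual network from the template above, with the location already resolved.
vnet = {
    "type": "Microsoft.Network/virtualNetworks",
    "apiVersion": "2019-04-01",
    "name": "amazing-vnet",
    "location": "westus2",
    "properties": {"addressSpace": {"addressPrefixes": ["10.0.0.0/16"]}},
}

url, body = build_put_request("0000", "demo-rg", vnet)
print(url)
```

Note how “type” and “apiVersion” end up in the URL, while “properties” travels in the request body for the resource provider to interpret.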

If we are honest with ourselves, there is a lot to unpack in the ARM template language. For a newbie, it is probably not obvious what goes where and how it all comes together. The JSON document becomes large because it needs to follow JSON conventions. There is no documentation for all resource properties in one place and, to make matters more entertaining, properties depend on “apiVersion”. Of course, it is all very flexible but arguably difficult to author and maintain.

Terraform uses HCL to declare resources. It is a very basic JSON-like language. The language doesn’t have much syntactic overhead and is pretty much driven by the properties of each resource being defined. It supports both explicit and implicit dependency declaration – meaning, you can just reference a property of another resource, and Terraform will know that the referenced resource needs to be created first. HCL also supports a set of functions to transform string and integer data. Moreover, since Terraform has the luxury of running on the client side, it can touch files on the file system. It has also been around long enough to have functions like base64encode that can be applied to the content of a local file.
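Implicit dependencies boil down to ordering a graph of references. The Python sketch below shows the idea with the standard library’s `graphlib`; it is not Terraform’s implementation, and the `references` mapping (including the `azurerm_subnet.example` entry) is a hand-written, hypothetical stand-in for what Terraform would infer from expressions such as `azurerm_resource_group.example.location`.

```python
from graphlib import TopologicalSorter

# Each resource maps to the set of resources whose attributes it references.
references = {
    "azurerm_resource_group.example": set(),
    "azurerm_virtual_network.example": {"azurerm_resource_group.example"},
    "azurerm_subnet.example": {
        "azurerm_virtual_network.example",
        "azurerm_resource_group.example",
    },
}


def creation_order(refs):
    """Order resources so each one comes after everything it references."""
    return list(TopologicalSorter(refs).static_order())


for address in creation_order(references):
    print(address)
```

The resource group is created first, then the virtual network, then the subnet, without anyone having written that order down explicitly.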

One of the nice things about Terraform is structured documentation for all resources in one place. To express the same virtual network resource as in the ARM template above, one would write the following TF file:

resource "azurerm_virtual_network" "example" {
  name                = "amazing-vnet"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  address_space       = ["10.0.0.0/16"]

  subnet {
    name           = "greatSubnet"
    address_prefix = "10.0.0.0/24"
  }
}

The downside of Terraform is that it always lags behind ARM feature-wise. When Azure releases a brand new awesome resource or a new property on an existing resource, ARM templates pick it up automatically without any code changes. This is not the case with Terraform – a new release is required to recognize the new property and map it correctly to the ARM resource definition on the wire. It usually takes 3-6 months in the current HashiCorp development cycle. After that, the Terraform client needs to be updated in the automation environment. We are probably looking at another 1-2 weeks, even for very eager individuals.

Both ARM templates and Terraform support modularization – meaning you can reference other ARM templates from within a template, and you can include other Terraform modules from within a Terraform script.

Having considered all points, I have to award this one to Terraform.


State maintenance

Given that both ARM and Terraform operate using a declarative model, they both need state.

ARM maintains its internal state in an Azure storage account that is not accessible by any customer. Locking and concurrency control are built in and behave predictably when multiple users perform simultaneous deployments against the same set of resources.

State in ARM is considered authoritative for the purpose of resource discovery. It means that if you create a resource, say a VM, and then ARM state somehow loses that resource, you will continue to be billed for a VM but you won’t see it in your subscription. ARM state can be synchronized with a subscription on-demand when there is drift.

The concept of import or export of ARM state is a misnomer. Resource deployment, deletion, or modification is the way to alter ARM state. It is not possible to “snapshot” ARM state at a moment in time and save it locally to compare it at some point in the future. The closest thing to ARM state snapshot is to request ARM to generate a deployment template for your resource group and store that template offline. It is not guaranteed to be identical to the state as certain sensitive properties may not be exportable and certain properties of resources can be read-only.

Terraform state, on the other hand, can be either local or remote. When the state is remote, it is controlled by a backend plugin and can be written into an Azure storage account, for example. In fact, it can be written into an AWS S3 bucket while managing ARM resources.

Terraform supports state locking via backend provider implementation. Locking is atomic and exclusive. It allows multiple users to interact with the same set of resources at the same time without causing state corruption.
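To show what “atomic and exclusive” means in practice, here is a minimal Python sketch of a lock built on an exclusively created file. It is an analogy only: real Terraform backends rely on their own storage-specific mechanisms, and `acquire_lock`, `release_lock`, `StateLockedError`, and the lock file name are all hypothetical.

```python
import os
import tempfile


class StateLockedError(Exception):
    """Raised when another operation already holds the lock."""


def acquire_lock(lock_path, owner):
    """Create the lock file atomically; fail if it already exists."""
    try:
        # O_CREAT | O_EXCL makes "check and create" a single atomic operation.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise StateLockedError(f"state is locked: {lock_path}") from None
    with os.fdopen(fd, "w") as handle:
        handle.write(owner)


def release_lock(lock_path):
    """Remove the lock file so the next operation can proceed."""
    os.remove(lock_path)


lock_file = os.path.join(tempfile.mkdtemp(), "example.tfstate.lock")
acquire_lock(lock_file, "alice")
try:
    acquire_lock(lock_file, "bob")
    second_attempt = "acquired"
except StateLockedError:
    second_attempt = "rejected"
release_lock(lock_file)
print(second_attempt)
```

The second caller is turned away instead of writing over the first caller’s changes, which is what protects the state from corruption.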

A nice feature of Terraform state is a workspace. It allows “forking” of state across multiple users for evaluation purposes before applying the change back to the “main trunk”.

Having considered all inputs, Terraform takes the prize in the state maintenance category.


Secrets

This is an important topic to cover as it has the potential to break your business if handled incorrectly.

ARM does not store secrets inside its state. Secrets may be supplied inline inside a template, inside a parameter file, or even referenced by a parameter file to be extracted from Azure Key Vault. This is a great separation of concerns between the people who manage infrastructure and the people who own the secrets.

{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
      "secretPassword": {
        "reference": {
          "keyVault": {
          "id": "/subscriptions/<id>/resourceGroups/<name>/providers/Microsoft.KeyVault/vaults/<vault>"
          },
          "secretName": "MyVeryOwnSecretPassword"
        }
      }
  }
}

When it comes to Terraform, secrets will be stored inside the state file. At the time of writing, Terraform does not support separation of secrets from the rest of the state, which is a shame. Any security expert will immediately spot the missing separation of concerns between IT operators (infrastructure) and security operators (password owners). Moreover, in the DevOps model it is not possible to protect secrets in the state such that the developer who runs Terraform cannot gain access to them. This pretty much makes Terraform unusable by hand for production deployments. It must be wrapped with automation that doesn’t expose secrets to humans.

When secrets are stored in the state file, this file itself becomes a secret. It must be encrypted to protect from exposure. Some other piece of infrastructure needs to manage encryption keys and rotate them. The state file as a whole must be decrypted before handing it to Terraform for consumption. It must be encrypted afterward and stored safely.
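The Python sketch below illustrates why the state file has to be treated as a secret. It scans a heavily simplified stand-in for a state document and reports attributes whose names look sensitive. The sample state, the `SUSPICIOUS` word list, and `find_plaintext_secrets` are assumptions made for illustration; a real state file carries many more fields.

```python
import json

# Hypothetical heuristic: attribute names containing these words are flagged.
SUSPICIOUS = ("password", "secret", "key", "token")


def find_plaintext_secrets(state_json):
    """Return (resource address, attribute name) pairs that look like secrets."""
    state = json.loads(state_json)
    hits = []
    for resource in state.get("resources", []):
        address = f"{resource['type']}.{resource['name']}"
        for instance in resource.get("instances", []):
            for attr, value in instance.get("attributes", {}).items():
                if value and any(word in attr.lower() for word in SUSPICIOUS):
                    hits.append((address, attr))
    return hits


# Simplified, made-up state with one resource that carries a password.
sample_state = json.dumps({
    "version": 4,
    "resources": [{
        "type": "azurerm_sql_server",
        "name": "example",
        "instances": [{"attributes": {
            "name": "amazing-sql",
            "administrator_login_password": "P@ssw0rd-in-plain-text",
        }}],
    }],
})

print(find_plaintext_secrets(sample_state))
```

Anyone who can read the file can read the password, so access to the state has to be controlled as tightly as access to the password itself.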

In the secret management category, ARM takes the helm.


Identity and access management

ARM has built-in Role-Based Access Control (RBAC) and an Activity Log. These two are important attributes of threat and vulnerability management. You can define who has access to which resources and who doesn’t, who can deploy and who can only read. And should something go south, you will be able to pull deployment logs with all necessary attributes (user identity, calling IP address) from the Activity Log.

Terraform, when deploying to Azure, uses ARM under the hood and therefore benefits from the same two capabilities. Terraform first has to authenticate to ARM, for example through Azure CLI, which establishes identity. After that, every PUT call that Terraform makes is authorized by ARM’s RBAC. As such, Terraform has exactly the same capabilities as any other client of ARM.

However, when Terraform manages on-premises resources rather than ARM resources, none of the RBAC or logging/auditing capabilities exist. Terraform does not maintain a blockchain-style ledger of operations applied to the state, nor does it validate user permissions to perform those operations on its own. Anyone who has access to the state file can alter it without a trace.

In the security world, ARM takes the upper hand.


Governance

If you are running a relatively serious shop, you likely have security people who help protect your customer data. Those aren’t developers or IT people – they are specially trained individuals looking for vulnerabilities in your deployments. They regularly conduct penetration testing and ask you to fix certain configurations in your infrastructure to make their jobs harder. Since they are security people, they likely don’t trust your word and want to make sure you don’t shoot yourself in the face while deploying infrastructure components. They want to deny deployments that are known to be vulnerable to attacks or want to fix-up your deployments on the fly as you submit them. This is where they use policies or governance to keep you in check.

ARM has a policy engine. The security team can configure policies on your subscriptions and keep tabs on all developers in the organization. Policies can be straightforward or fairly expressive:

No public IP address allowed
Inventory all Cosmos DB accounts that do not have an IP range assigned to them
Allow only a select set of OS versions to create VMs

Here is an example of a community policy that restricts the set of database collations. This policy can be applied at different scopes – per subscription, per resource group, or per management group:

{
  "displayName": "Allowed SQL Database collations",
  "policyType": "Custom",
  "description": "Will deny deploying databases with collations not listed as part of the allowed list",
  "mode": "All",
  "metadata": {
    "category": "SQL"
  },
  "parameters": {
    "listOfAllowedCollations": {
      "type": "Array",
      "metadata": {
        "displayName": "Allowed Collation Names",
        "description": "A list of approved collations, example: Latin1_General_BIN;SQL_Latin1_General_CP1_CI_AS"
      }
    }
  },
  "policyRule": {
    "if": {
      "allOf": [
        {
          "field": "type",
          "equals": "Microsoft.SQL/servers/databases"
        },
        {
          "field": "name",
          "notEquals": "master"
        },
        {
          "not": {
            "field": "Microsoft.SQL/servers/databases/collation",
            "in": "[parameters('listOfAllowedCollations')]"
          }
        }
      ]
    },
    "then": {
      "effect": "Deny"
    }
  }
}
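To make the rule easier to follow, here is a small Python sketch that evaluates the “if” block of this policy against a flattened resource. It supports only the handful of operators used above and is in no way the Azure Policy engine; `evaluate`, `resolve`, and the flat dictionary representation of a database are hypothetical.

```python
def resolve(value, params):
    """Replace "[parameters('name')]" with the parameter's value."""
    prefix, suffix = "[parameters('", "')]"
    if isinstance(value, str) and value.startswith(prefix) and value.endswith(suffix):
        return params[value[len(prefix):-len(suffix)]]
    return value


def evaluate(condition, resource, params):
    """Return True when the resource matches the condition (so the effect applies)."""
    if "allOf" in condition:
        return all(evaluate(c, resource, params) for c in condition["allOf"])
    if "not" in condition:
        return not evaluate(condition["not"], resource, params)
    value = resource.get(condition["field"])
    if "equals" in condition:
        return value == resolve(condition["equals"], params)
    if "notEquals" in condition:
        return value != resolve(condition["notEquals"], params)
    if "in" in condition:
        return value in resolve(condition["in"], params)
    raise ValueError(f"unsupported condition: {condition}")


rule_if = {"allOf": [
    {"field": "type", "equals": "Microsoft.SQL/servers/databases"},
    {"field": "name", "notEquals": "master"},
    {"not": {"field": "Microsoft.SQL/servers/databases/collation",
             "in": "[parameters('listOfAllowedCollations')]"}},
]}
params = {"listOfAllowedCollations": ["SQL_Latin1_General_CP1_CI_AS"]}


def database(name, collation):
    """Build a flattened, made-up representation of a SQL database resource."""
    return {"type": "Microsoft.SQL/servers/databases", "name": name,
            "Microsoft.SQL/servers/databases/collation": collation}


print(evaluate(rule_if, database("orders", "Latin1_General_BIN"), params))
```

A database with a collation outside the allowed list matches all three conditions, so the “Deny” effect would apply; the master database and databases with an approved collation do not match.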

Terraform also has ways to govern state changes. There are two community-vetted approaches to this problem:

Open Policy Agent is a generic tool that can process any JSON document and apply rules to it. OPA has been adapted to recognize Terraform state and evaluate rules against it.
Terraform Sentinel is HashiCorp’s own tool for evaluating Terraform plan and either approving or rejecting it.

Both OPA and Sentinel are standalone tools outside of Terraform. They can be used to augment Terraform behavior within a deployment pipeline but if anyone has access to Terraform scripts, credentials for target infrastructure environment, and a free minute to mess around, these tools will be bypassed.

Having carefully weighed both options, ARM wins this match.


Operational cost

Nothing is free in this world, but some things are cheaper than others, all else being equal. Let’s look at the engineering and purely financial cost of building a “perfect” solution using ARM and Terraform. A perfect solution would be one that:

Scales with the growth of the company without significant re-engineering
Is secure right off the bat
Evolves in lockstep with time and tolerates technology trend changes
Does not require active maintenance by a trained professional

We will pick a winner by comparing how each of these dimensions can be achieved with ARM and with Terraform.

	Azure Resource Manager	Terraform
Compute cost to execute deployment operation	Only the client machine that kicks off the deployment	A client machine plus an intermediate VM in the infrastructure to run the client-side Terraform tool
Secure sensitive parameters	AKV needs to be provisioned for secrets	AKV needs to be provisioned for secrets, Azure storage account needs to be provisioned for encrypted state maintenance
Rotate secret material	Free	Manual automation needs to be written to decrypt and re-encrypt state files
Policy evaluation	Free	The same intermediate VM can run policy agent
Hardware-based security	Included with Azure Key Vault	Included with Azure Key Vault
Software upgrade	Free	Automation needs to be developed to re-provision intermediate VM with the new OS and new version of Terraform. Baking infrastructure needs to be in place. VM deployment infrastructure is required.
Reliability	Reliability of (ARM, AKV)	Reliability of (ARM, AKV, Storage, VM, Managed Disk, Terraform backward compatibility, Policy Agent backward compatibility)
Vulnerability vector	ARM or AKV	ARM, AKV, Azure Storage, VM, Managed Disk
Additional human resources	1	2

Conclusion

If you are looking for something cross-platform, cross-cloud infrastructure, something you want to invest in as a technology, something that is easy to learn and get started with – Terraform is your choice. On the other hand, if you need a hands-free, all-inclusive, robust security solution that works only with Microsoft Azure – ARM is a better choice.

In my opinion, it would be a great win for Azure to support Terraform deployments natively. Wouldn’t it be great to take all the upsides of Terraform and combine them with the upsides of ARM templates? Imagine you could submit your compressed TF scripts into ARM for deployments and all the machinery of HashiCorp would work as seamlessly as Azure itself. That would be something.
