Terraform Native Testing Framework: Are you testing what you think you're testing?

With the release of Terraform 1.6, the previously experimental testing framework became generally available. If you've already been writing tests for your Terraform code using something like Terratest or tftest, it's worth checking whether the new framework could simplify your CI pipelines. That's not to say that the Terraform testing framework will do everything those other tools will - it probably won't - but it could replace a good chunk of core testing functionality with something more comprehensible. Terratest in particular has a pretty high barrier to entry, and despite having written a fair bit of Go in the past, I still find it hard to grok the code just by skimming it.

This post isn't going to go over how the framework actually works; there are plenty of tutorials out there that will guide you. The official documentation is a good place to start if you're already familiar with Terraform, and these videos from HashiCorp are pretty good, too.

Having implemented some tests in Terraform, I want to discuss a couple of things that the documentation doesn't cover well - at least not yet anyway.

As with tftest and Terratest, the testing framework draws a rough parallel with the concept of unit testing and integration testing from the world of software development. In Terraform, a unit test is performed on a Terraform plan operation, and an integration test is performed on a Terraform apply operation. That is to say, with a unit test you're going to be running some assertions against what the plan says it's going to do, but you're not going to actually create the infrastructure. This makes unit tests quick to run.
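
As a rough sketch of what such a unit test looks like, the run block below plans the configuration and asserts on a planned value. It assumes a configuration that declares azurerm_storage_account.test, like the example later in this post, and a test file that also configures the azurerm provider; the run name is hypothetical.

run "plan_storage_account" {
  # Plan only: nothing is created in Azure
  command = plan

  assert {
    # account_tier is set literally in the configuration,
    # so its value is already known at plan time
    condition     = azurerm_storage_account.test.account_tier == "Standard"
    error_message = "The storage account tier should be set to Standard"
  }
}

Only values that are known at plan time can be asserted this way; attributes that the provider computes during apply are still unknown in a plan.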

With an integration test, you're going to run some assertions against what Terraform actually created, to ensure that the infrastructure you were expecting to create actually got created correctly. At least, that's what the documentation would have you believe.

But let's think about it for a second. What is the integration test actually checking? The only thing that the Terraform binary can assert against is what it gets back from the provider - and that's the resulting state from the apply operation. In other words, we're going to be running assertions against what the provider says it did, not necessarily what actually took place. And, in my opinion, that's a potential problem. If a bug in the provider causes it to create a resource with configuration that doesn't match what was specified in the Terraform code, then running assertions against the state will not surface that problem. The resource that was created might not be in the state you wanted it to be, but your tests will pass because, fundamentally, your tests are not testing the infrastructure itself.

This is a major difference between the Terraform native testing framework and something like Terratest. With Terratest, your integration tests run their assertions by querying the state of the concrete resource via its native API, not the Terraform state that the provider reported back.
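
To make that concrete, here is a sketch of the kind of apply-mode assertion the framework gives you by default. It assumes the azurerm_storage_account.test resource from the example later in this post, and the run name is hypothetical.

run "apply_storage_account" {
  # Apply: the storage account is really created,
  # then destroyed again when the test finishes
  command = apply

  assert {
    # This reads the value the provider wrote to state after the apply,
    # not a fresh answer from the Azure API
    condition     = azurerm_storage_account.test.account_replication_type == "LRS"
    error_message = "The storage account replication type should be set to LRS"
  }
}

If the provider recorded LRS in state while Azure actually held something else, this assertion would still pass.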

So what, if anything, can be done about this? You could just continue to use Terratest, but if you, like me, prefer the syntax and readability of the native tests, the question is whether we can craft something within the Terraform testing framework that will give us what we want. There are actually a couple of options that I can think of - the first of which I use myself. Time for some code!

Let's have a very simple piece of code that creates an Azure storage account:

resource "azurerm_resource_group" "test" {
  name     = "test"
  location = "northeurope"
}

resource "azurerm_storage_account" "test" {
  name                     = "devopsfutesting"
  resource_group_name      = azurerm_resource_group.test.name
  location                 = azurerm_resource_group.test.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
}

output "resource_group_name" {
  value = azurerm_resource_group.test.name
}

output "storage_account_name" {
  value = azurerm_storage_account.test.name
}

And in our tests/ directory, let's write a test that creates an instance of it:

provider "azurerm" {
  features {}
}

run "storage_account" {
  command = apply
}

No assertions? Not in this run block. Instead, we'll run them against a data source that queries the resource created by the test run above. We can do this by loading a separate module within our test. Here, we create a module called verify inside our tests directory. The module accepts a couple of variables, and all it does is use the provider's data source to query for the storage account that we specify:

variable "resource_group_name" {
  type = string
}

variable "storage_account_name" {
  type = string
}

data "azurerm_storage_account" "verify" {
  name                = var.storage_account_name
  resource_group_name = var.resource_group_name
}

And finally, in the same test file, we create a second test run which executes against the verify module, passing in the name and resource group of the storage account from the outputs of our first test run:

run "verify" {
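  # Data sources are read during plan, so nothing needs to be applied here
  command = plan

  # Load the verify module instead of the configuration under test
  module {
    source = "./tests/verify"
  }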

  variables {
    resource_group_name  = run.storage_account.resource_group_name
    storage_account_name = run.storage_account.storage_account_name
  }

  assert {
    condition     = data.azurerm_storage_account.verify.account_tier == "Standard"
    error_message = "The storage account tier should be set to Standard"
  }

  assert {
    condition     = data.azurerm_storage_account.verify.account_replication_type == "LRS"
    error_message = "The storage account replication type should be set to LRS"
  }
}

This pattern enables us to run tests against the state of the resource retrieved by the data source - which will query the API for the actual resource as created by the first test run block. Of course, this is still not a perfect solution; a bug in the data source code in the provider could also cause issues with retrieving the state correctly, but it's certainly a test method that's closer to checking reality than the default method of checking the Terraform state.

If using a data source is not an option (perhaps because the provider doesn't have one for the specific resource you're testing) then there are other possibilities in the verify module:

  • For Azure, you could use the azapi provider to query for the state of the resource.
  • You could use an external data source to run a script which pokes the API directly.
  • You could use Terracurl to do some of the work for you.
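
As an illustration of the second option, here is a minimal sketch of a verify module built on the external data source from the hashicorp/external provider. It assumes a Unix-like machine where the Azure CLI (az) is installed and already logged in; the result keys tier and sku are names I've picked for the example. The external data source expects the program to print a JSON object whose values are all strings, which the --query expression below takes care of.

variable "resource_group_name" {
  type = string
}

variable "storage_account_name" {
  type = string
}

data "external" "verify" {
  # Ask Azure directly, bypassing the azurerm provider entirely
  program = [
    "az", "storage", "account", "show",
    "--name", var.storage_account_name,
    "--resource-group", var.resource_group_name,
    "--query", "{tier: sku.tier, sku: sku.name}",
    "--output", "json",
  ]
}

The assertions in the verify run block would then read from the result map instead of the azurerm data source, for example:

  assert {
    condition     = data.external.verify.result.sku == "Standard_LRS"
    error_message = "The storage account SKU should be Standard_LRS"
  }

The trade-off is that the test now depends on a tool outside Terraform, so the CLI has to be present and authenticated wherever the tests run.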

Hopefully this gives you some ideas for ways to make your testing a little stricter than the default of checking the Terraform state!