Terraform, Destroy Provisioners and Pre-Staging Octopus Deploy Tentacle Certificates

At work we've spent the last 12 months or more rebuilding our platform in Terraform. We previously had automation around deploying the VMs themselves (and Puppet would take over the configuration when the machines came up), but other stuff such as setting up Virtual Networks, VPN gateways and routing, Domain Controllers and whatnot were always partially or totally manual.

With Terraform, we now have the ability to create and destroy an entire environment (which can comprise as much or as little of our platform as we choose) and have the whole process automated end-to-end. Getting there has been hard, and we still have a few loose ends here and there that need finishing.

Anyway, this post is about one of the issues that we came across when repeatedly creating/destroying an environment. One of the really useful features of Terraform is the destroy provisioner, which is only invoked when you're actually deleting infrastructure. It comes into its own for housekeeping, and for making sure the same infrastructure can be recreated later without causing problems. A Puppet Master, for example, won't sign a certificate request for a node that it thinks it's already managing, even if the VM has since been blown away and rebuilt. The old certificate first needs to be removed so that a node with the same name can submit a new certificate signing request. Solution - destroy provisioner.
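As a rough sketch of the kind of cleanup a destroy provisioner could call on the Puppet Master: the function name clean_puppet_cert is made up for illustration, and which command applies depends on the Puppet version (puppetserver ca clean on Puppet 6 and later, puppet cert clean on older releases). It is not necessarily how our own provisioner does it.

```shell
#!/bin/sh
# Hypothetical helper: remove a node's signed certificate on the Puppet Master
# so that a rebuilt node with the same name can submit a fresh signing request.
clean_puppet_cert() {
  node="$1"
  if [ -z "$node" ]; then
    echo "usage: clean_puppet_cert <certname>" >&2
    return 1
  fi
  if command -v puppetserver >/dev/null 2>&1; then
    # Puppet 6 and later: certificate management lives in the puppetserver CLI.
    puppetserver ca clean --certname "$node"
  else
    # Puppet 5 and earlier.
    puppet cert clean "$node"
  fi
}
```

Something like this would be run on the Puppet Master from a provisioner block marked when = destroy, with the node's certname passed in as the argument.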

It's the same with Octopus Deploy. A destroy provisioner can be used to remove a machine from a given environment on the OD server as part of the destroy process. When the machine gets recreated, it comes up, joins the Puppet Master, and then starts installing stuff like the OD Tentacle, which rejoins it to the OD server. Except, under certain circumstances, it didn't.

These "certain circumstances" were when we wanted to perform the initial Puppet Agent run as part of the Terraform provisioning process. This affords us two things: the log output of the Puppet Agent runs across all nodes is recorded in the console output, and a failed agent run will bubble the error up to Terraform itself, causing Terraform to mark the apply as failed. If something was broken in Puppet, Terraform could tell us as soon as the environment had been deployed.

What I found was that the OD Tentacle installation process was unable to generate a new certificate for itself when the Agent was invoked by Terraform over WinRM. This issue is not specific to Terraform, WinRM or OD, and it's fairly well documented here - the reason in a nutshell is that the Windows user profile isn't loaded in these circumstances and, without the profile loaded, certain standard cryptographic functions will not work.

There were a couple of solutions suggested in that GitHub issue, the most reliable of which was to create a scheduled task on the box which would generate the certificate (the scheduled task would, of course, load the user profile if it was told to). These solutions felt a bit grubby and brittle. We needed something more robust...

The only step that was having problems was the certificate generation. As well as generating a new certificate, the OD Tentacle is also able to import an existing one. So, if we could somehow generate one up-front that could be imported, that would solve it. After looking at a few ways of doing this, I finally tried what turned out to be a much more elegant solution, especially given that we're running Terraform exclusively on Linux or macOS.

The OD Tentacle just needs a self-signed certificate in PFX format that includes the private key, has no password, and is Base64 encoded. Using the same private key file for all certificates (saved in the current directory as octopus.key), and with a variable $fqdn set to the name of the node that the certificate is intended for, this pipeline will produce a certificate that can be imported by the OD Tentacle:

openssl req -x509 -new -key octopus.key -days 3650 \
  -subj "/C=GB/ST=YourState/L=YourCity/O=Your Company/OU=Department/CN=${fqdn}" |
  openssl pkcs12 -export -inkey octopus.key -passout pass: |
  openssl base64

Gist here
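The pipeline above assumes octopus.key already exists. For completeness, here is a minimal sketch of creating that key once and sanity-checking what the pipeline produces. The function names make_key and pfx_subject are made up for illustration, and the 2048-bit RSA key size is an assumption rather than something this post prescribes.

```shell
#!/bin/sh
# One-off: create the private key that is shared by all generated certificates.
# 2048-bit RSA is an assumption; use whatever your policy requires.
make_key() {
  openssl genrsa -out "$1" 2048 2>/dev/null
}

# Read a Base64-encoded, password-less PFX on stdin and print the subject of
# the certificate inside it, to confirm the CN matches the intended node.
pfx_subject() {
  openssl base64 -d |
    openssl pkcs12 -nokeys -passin pass: |
    openssl x509 -noout -subject
}
```

Run make_key octopus.key once, then pipe the output of the certificate pipeline into pfx_subject. The subject line it prints should contain the value of $fqdn as the CN.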

This script is invoked by Terraform's external data source (part of the External provider), which captures the result of the above pipeline so that a file provisioner can copy the certificate to the node being provisioned. After Puppet takes over, the OD Tentacle installer first looks for a certificate file in a known location and imports it if it finds one. If not, it falls back to the old method of generating its own certificate.
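The external data source expects the program it runs to print a JSON object of string values on stdout, and treats a non-zero exit code as a failure. A minimal sketch of a wrapper that meets that contract might look like the following. The function name octopus_cert_json, the pfx_base64 key, and passing the FQDN as an argument (rather than parsing the JSON query that Terraform sends on stdin) are all assumptions made for illustration, not a copy of the script in the gist.

```shell
#!/bin/sh
# Hypothetical wrapper around the certificate pipeline.
# Usage: octopus_cert_json <fqdn> [keyfile]
# Prints {"pfx_base64":"..."} on stdout, or returns non-zero on failure.
octopus_cert_json() {
  fqdn="$1"
  key="${2:-octopus.key}"

  # Same pipeline as in the post. Newlines are stripped so that the value can
  # sit inside a JSON string without escaping (my choice, not a Tentacle rule).
  pfx=$(openssl req -x509 -new -key "$key" -days 3650 \
          -subj "/C=GB/ST=YourState/L=YourCity/O=Your Company/OU=Department/CN=${fqdn}" |
        openssl pkcs12 -export -inkey "$key" -passout pass: |
        openssl base64 | tr -d '\n')

  # A pipeline only reports the exit status of its last command, so detect an
  # earlier failure by checking for empty output.
  if [ -z "$pfx" ]; then
    echo "certificate generation failed for ${fqdn}" >&2
    return 1
  fi

  # Base64 text contains no characters that need escaping in JSON.
  printf '{"pfx_base64":"%s"}\n' "$pfx"
}
```

A script built around this would end with a call such as octopus_cert_json "$1", and the node's name would be supplied as an extra element in the data source's program list. The resulting value can then be written to the node with a file provisioner.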

Simple and effective. I just wish I had looked at this solution first rather than trying to get around the user profile issue!