Living with CloudFormation Drift

Posted on December 07, 2019

For the current app I’m working on, we’re using CloudFormation as our infrastructure-as-code solution and Cognito as the auth provider. This has led to some interesting discoveries about CloudFormation drift.

For example, let’s say you want to add a pre-signup Lambda trigger to your Cognito user pool, and you’re deploying it with the Serverless Framework.

You can write a Serverless function and attach it to your Cognito user pool as the pre-signup trigger. However, CloudFormation gets a little confused if you don’t set the “existing” flag on the function’s Cognito event, and it can sometimes wipe your Cognito settings.

When it comes to a service as opinionated about its setup as Cognito, you need to make sure it behaves exactly the way you expect. Otherwise, your users could end up being verified with a code instead of a link, for example, and that can break flows in your app.

These are also silent failures of your infrastructure: you typically don’t discover them until they surface as a bug.

AWS has a feature called “Drift detection” for CloudFormation stacks. However, there is currently no built-in way to fix drift once it has been detected.

So here we are, stuck with a problem where our codified Cognito settings are correct, but after any deployment the live Cognito settings could differ from what we’d expect. The same problem extends to all sorts of other resources.

For example, you may have backups enabled on your DynamoDB table in CloudFormation. But anyone with the right permissions could log into the AWS console and disable them. Sure, it’s a stretch, but it could happen.

My solution to this issue is to put checks for all mission-critical settings in a Lambda function that gets invoked post-commit, after each deployment.

For example, with 20-30 lines of Go or Node, we could read the Cognito settings and confirm they are what we expect them to be. We could also query our DynamoDB table and make sure backups are enabled. If the settings are not as expected, we can throw an error, send an email, notify something like PagerDuty, or do whatever else fits.

Now, after every deployment, you can be sure whether the settings are what they should be.