How do we minimize lag in aws-secretsmanager-caching-python when secrets get rotated?

Question:

We are using AWS Secrets Manager to store public/private keys that encrypt and decrypt messages between services, and we want to rotate those secrets.

aws-secretsmanager-caching-python looks perfect for caching our secrets, but it has a refresh interval with a default of one hour.

What happens for the 1-60 minutes that an old secret is cached and will no longer decrypt messages? We can detect the secret no longer works. Once we detect this, is there a way for us to force the value to refresh? What is the intended way to handle this?

Asked By: Jay Askren

||

Answers:

Rotating secrets without application downtime requires having at least two usable secrets in flight at the same time (the current and next, or the current and previous, depending on the point in time). If you are using this for encryption, then the message must contain an unencrypted pointer to the version of the secret to use. The receiver can then discover which encryption key to use for the message.
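
As a rough sketch of the version-pointer idea, the snippet below tags each message with the version of the key that encrypted it, and the receiver looks up the matching key. The names seal, open_message, and the key_version field are made up for illustration, the keyring is a plain dict standing in for keys fetched from Secrets Manager by version, and xor_cipher is a toy placeholder for a real cipher, not something to use in production.

```python
import base64
import json
from typing import Dict

# Hypothetical keyring: secret version id -> key bytes.
# In a real service these would be fetched from Secrets Manager per version.
Keyring = Dict[str, bytes]


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Toy stand-in for a real cipher. NOT secure, illustration only."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def seal(keyring: Keyring, current_version: str, plaintext: bytes) -> str:
    """Encrypt with the current key and attach its version id in the clear."""
    ciphertext = xor_cipher(keyring[current_version], plaintext)
    return json.dumps({
        "key_version": current_version,
        "ciphertext": base64.b64encode(ciphertext).decode("ascii"),
    })


def open_message(keyring: Keyring, message: str) -> bytes:
    """Pick the key named by the message's version pointer and decrypt."""
    envelope = json.loads(message)
    # A KeyError here means the receiver has not seen this version yet,
    # which is the signal to fetch it instead of failing outright.
    key = keyring[envelope["key_version"]]
    return xor_cipher(key, base64.b64decode(envelope["ciphertext"]))
```

During a rotation the receiver keeps both versions in its keyring, so messages sealed just before and just after the switch both decrypt.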

A better alternative for encryption, though, would be to use KMS (which is what Secrets Manager uses). In this scheme, you call KMS GenerateDataKey to get both a plaintext and an encrypted version of an encryption key. You encrypt the payload using the plaintext key, and send the encrypted key and encrypted payload in the message. The receiver then calls KMS to decrypt the encrypted data key and uses the result to decrypt the payload. This way, you do not have to manage key versions. You can also use either symmetric or asymmetric keys with KMS.
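
A minimal sketch of that flow, assuming a symmetric KMS key, a boto3 KMS client passed in as kms, and AES-GCM from the third-party cryptography package for the payload. The function names and the message layout (encrypted_key, nonce, ciphertext) are my own choices, not anything KMS prescribes.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def encrypt_message(kms, key_id: str, plaintext: bytes) -> dict:
    """Encrypt a payload under a fresh data key generated by KMS."""
    # GenerateDataKey returns the same key twice: once in plaintext and
    # once encrypted under the KMS key identified by key_id.
    data_key = kms.generate_data_key(KeyId=key_id, KeySpec="AES_256")
    nonce = os.urandom(12)  # 96-bit nonce, generated fresh for every message
    ciphertext = AESGCM(data_key["Plaintext"]).encrypt(nonce, plaintext, None)
    # Only the encrypted copy of the data key travels with the message.
    return {
        "encrypted_key": data_key["CiphertextBlob"],
        "nonce": nonce,
        "ciphertext": ciphertext,
    }


def decrypt_message(kms, message: dict) -> bytes:
    """Ask KMS to unwrap the data key, then decrypt the payload locally."""
    key = kms.decrypt(CiphertextBlob=message["encrypted_key"])["Plaintext"]
    return AESGCM(key).decrypt(message["nonce"], message["ciphertext"], None)
```

With real credentials, kms would be boto3.client("kms") and key_id a key ARN or alias. The receiver needs kms:Decrypt permission on that key but never has to track which key version was used.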

Answered By: JoeB

Despite being the AWS-recommended solution for caching secrets from Secrets Manager, and despite the docs suggesting it supports secret rotation, the aws-secretsmanager-caching-python library doesn’t appear to support eviction, which would be needed for key rotation. This unit test suggests they are testing refreshing the secret:

def test_get_secret_string_refresh(self):
    secret = 'mysecret'
    response = {}
    versions = {
        '01234567890123456789012345678901': ['AWSCURRENT']
    }
    version_response = {'SecretString': secret}
    cache = SecretCache(
        config=SecretCacheConfig(secret_refresh_interval=1),
        client=self.get_client(response,
                               versions,
                               version_response))
    for _ in range(10):
        self.assertEqual(secret, cache.get_secret_string('test'))

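However, the test sets an initial secret, creates a config with a small refresh interval, and then asserts 10 times that the cached secret still equals its initial value. The value returned by the stubbed client never changes and the test never waits for the interval to elapse, so it cannot tell whether a refresh happened. It isn’t testing refreshing at all, and the code looks like it is still half-baked.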
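
One possible workaround, sketched under assumptions: wrap the secret fetch in a small cache of your own that can drop an entry as soon as decryption fails. EvictableSecretCache is a hypothetical class, not part of the AWS library, and fetch is any callable that returns the secret for an id (for example, a thin wrapper around a direct Secrets Manager lookup).

```python
import time
from typing import Callable, Dict, Tuple


class EvictableSecretCache:
    """Hypothetical TTL cache with explicit eviction (not the AWS library)."""

    def __init__(self, fetch: Callable[[str], str],
                 ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        # secret id -> (time the value was fetched, value)
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, secret_id: str) -> str:
        """Return the cached value, refetching if missing or older than the TTL."""
        entry = self._entries.get(secret_id)
        if entry is not None and self._clock() - entry[0] < self._ttl:
            return entry[1]
        value = self._fetch(secret_id)
        self._entries[secret_id] = (self._clock(), value)
        return value

    def evict(self, secret_id: str) -> None:
        """Drop the cached value so the next get() goes back to the source."""
        self._entries.pop(secret_id, None)
```

When a message fails to decrypt, the caller would call evict() for that secret and retry once with the freshly fetched value, which bounds the lag to a single failed attempt rather than the full refresh interval.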

Answered By: Jay Askren