
[EN]Multi-account environment with CloudFormation StackSets

Sean 션 2024. 8. 16. 10:48

[KO] Managing a multi-account environment with CloudFormation StackSets!

 

Hello.
 

 
I recently completed a hands-on exercise in automating resource configuration in a multi-account environment. The environment I set up is as follows:

  • Creating Auto Scaling Groups in multiple accounts within an Organizational Unit (OU) using an encrypted AMI created in the management account.

 
To configure this environment, the following steps are necessary:

  1. Create a Customer Managed KMS Key in the management account.
  2. Launch an EC2 instance in the management account with storage encrypted using the KMS key.
  3. Create an AMI from this instance and share the AMI and KMS key permissions with the Organizational Unit.
  4. In each account, create Auto Scaling Groups using the shared AMI.

 
If you only had to manage a couple of target accounts, you could quickly perform these tasks manually by logging into each account’s management console. But what if you had 100 target accounts? How would you handle this?
 
Here are some options:

  1. Automate the configuration using Terraform.
  2. Leverage your company’s master-slave structure to delegate commands.
  3. Use CloudFormation StackSets.

 
Option 1, using Terraform, would require obtaining credentials for all 100 accounts, which is a cumbersome process. Option 2 would reduce your own administrative overhead by passing it on to others 😅. However, today we’re going to explore how to efficiently solve this problem using AWS’s Infrastructure as Code (IaC) service: CloudFormation!
 
In this post, you will learn about the following:

  • CloudFormation StackSets
  • Centralizing CloudFormation StackSets Outputs & CloudFormation Lambda-backed Custom Resource
  • Creating Auto Scaling Groups with CloudFormation StackSets & Important Considerations

CloudFormation

AWS CloudFormation is a service that helps you model and set up your AWS resources so that you can spend less time managing those resources and more time focusing on your applications that run in AWS.

 

 

  • A CloudFormation template is a JSON or YAML formatted text file.
  • When you use CloudFormation, you manage related resources as a single unit called a stack.
    • All the resources in a stack are defined by the stack's CloudFormation template.

So, you can create, update, and delete stacks using templates. This looks very similar to Terraform, but the two are not identical. There’s more that CloudFormation can do!
 


CloudFormation StackSets

A stack set lets you create stacks in AWS accounts across regions by using a single CloudFormation template.

 
You can create resources from your administrator account to multiple target accounts!

  • The administrator account is an AWS account that creates stack sets. The administrator account is either the organization's management account or a delegated administrator account.
  • Before you can use a stack set to create stacks in a target account, set up a trust relationship between the administrator and target accounts.

 
Setting up trust relationships

With service-managed permissions, you can deploy stack instances to accounts managed by AWS Organizations. Using this permissions model, you don't have to create the necessary IAM roles; StackSets creates the IAM roles on your behalf. With this model, you can also turn on automatic deployments to accounts that you add to your organization in the future.

 
Setting up the trust relationship normally requires work in both the administrator account and each target account. We will skip that by using service-managed permissions, where StackSets establishes the trust between the administrator account and the organization’s accounts for us. This requires all features to be enabled in AWS Organizations (the default for a new organization) and trusted access to be enabled between StackSets and AWS Organizations.
 
NOTE:
Enable all features in AWS Organizations. With only consolidated billing features enabled, you cannot create a stack set with service-managed permissions.
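
One way to confirm this prerequisite from code is to look at the organization's feature set. The sketch below assumes the response shape of the Organizations DescribeOrganization API, where Organization.FeatureSet is either ALL or CONSOLIDATED_BILLING. The helper names and the sample responses are hypothetical, and boto3 is imported lazily so the pure check can be tried without AWS credentials.

```python
def all_features_enabled(describe_organization_response):
    """Return True when the organization has all features enabled."""
    organization = describe_organization_response.get("Organization", {})
    return organization.get("FeatureSet") == "ALL"


def check_current_organization():
    """Query AWS Organizations (needs boto3 and credentials for the organization)."""
    import boto3  # imported here so the check above stays usable offline

    client = boto3.client("organizations")
    return all_features_enabled(client.describe_organization())


# Offline examples using hand-written placeholder responses
sample_all = {"Organization": {"Id": "o-example", "FeatureSet": "ALL"}}
sample_billing_only = {"Organization": {"Id": "o-example", "FeatureSet": "CONSOLIDATED_BILLING"}}

print(all_features_enabled(sample_all))           # True
print(all_features_enabled(sample_billing_only))  # False
```

If the check returns False, service-managed stack sets are not available until all features are enabled.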
 


Create EC2 with shared AMI

 
In this chapter, we’ll be setting up the following resources. Let’s assume that the configuration for creating and sharing the AMI and KMS has already been completed. To deploy EC2 instances in each account, we’ll write a template as follows:
 

AWSTemplateFormatVersion: '2010-09-09'
Description: CloudFormation Template to create an EC2 instance using specified parameters for InstanceType and AMI ID and store outputs in a centralized S3 bucket.

Parameters:
  InstanceType:
    Description: EC2 Instance Type
    Type: String
    Default: t2.micro
    AllowedValues:
      - t2.micro
      - t3.micro
    ConstraintDescription: t2.micro or t3.micro only.

  ImageId:
    Description: AMI ID for the EC2 instance
    Type: AWS::EC2::Image::Id
    Default: ami-005b0bdedc5ed724b
    ConstraintDescription: must be a valid AMI ID.

Resources:
  MyEC2Instance:
    Type: AWS::EC2::Instance
    Properties: 
      InstanceType: !Ref InstanceType
      ImageId: !Ref ImageId
      SecurityGroups: 
        - Ref: MySecurityGroup

  MySecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow SSH and HTTP access
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp: 0.0.0.0/0
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          CidrIp: 0.0.0.0/0

 
Here’s an explanation of the template:

  • The Parameters section defines the parameters you can input when running the template. By leveraging Parameters, you can significantly enhance the reusability of your template.
  • The Resources section defines the actual resources that CloudFormation will create.
    • MyEC2Instance is the EC2 instance resource we will create. By entering the AMI ID in the ImageId field, we can use the AMI that was previously shared with us.
    • MySecurityGroup is the Security Group that will be assigned to the EC2 instance. Since this is a resource created for testing purposes, the permissions are kept open.
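
When the template is deployed through the API instead of the console, parameter overrides are passed as a list of ParameterKey/ParameterValue pairs. The following is a minimal sketch of that conversion, with a local check that mirrors the template's AllowedValues. The function name is hypothetical, and CloudFormation performs its own validation regardless.

```python
ALLOWED_INSTANCE_TYPES = ("t2.micro", "t3.micro")  # mirrors AllowedValues in the template


def to_cfn_parameters(values):
    """Convert {"Name": "value"} into the list form the CloudFormation API expects."""
    instance_type = values.get("InstanceType", "t2.micro")  # template default
    if instance_type not in ALLOWED_INSTANCE_TYPES:
        raise ValueError("t2.micro or t3.micro only.")
    merged = dict(values, InstanceType=instance_type)
    # Sort by key so the result is stable and easy to compare
    return [
        {"ParameterKey": key, "ParameterValue": value}
        for key, value in sorted(merged.items())
    ]


overrides = to_cfn_parameters(
    {"InstanceType": "t3.micro", "ImageId": "ami-005b0bdedc5ed724b"}
)
for parameter in overrides:
    print(parameter)
```

Checking the value locally only gives earlier feedback; the template's own constraints remain the source of truth.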

Now, let’s create a stack set from the CloudFormation template. This is where the advantages of CloudFormation show. Since you’re working from the management account, you can use service-managed permissions. StackSets then gets the permissions it needs to create, modify, and delete resources in the organization’s accounts, without you having to obtain credentials for each account.
 
Applying the template is quite simple. Navigate to the CloudFormation StackSets section in the administrator account.
 

 
You can access the accounts within AWS Organizations using Service-managed permissions.
 

 
Upload the template file you prepared earlier.
 

 
You can choose the target at the OU level. Additionally, you can select the regions where the Stack will be deployed. Although not detailed in this post, there are various options available.
 

 
During deployment, you can choose how the operation runs. You can deploy the Stack to multiple accounts in parallel or sequentially, one at a time. You can also set a failure tolerance, which decides how many Stacks may fail before CloudFormation stops the operation.
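
The console steps above map onto two CloudFormation API calls, create_stack_set and create_stack_instances. The sketch below builds the arguments for both with boto3 in mind. The helper names, the stack set name, and the OU ID are placeholders, and the concurrency and tolerance values are only one possible choice.

```python
def build_stack_set_request(name, template_body, capabilities=()):
    """Arguments for create_stack_set with service-managed permissions."""
    request = {
        "StackSetName": name,
        "TemplateBody": template_body,
        "PermissionModel": "SERVICE_MANAGED",
        # Also deploy to accounts that join the target OUs later
        "AutoDeployment": {"Enabled": True, "RetainStacksOnAccountRemoval": False},
    }
    if capabilities:
        # For example CAPABILITY_IAM when the template creates IAM roles
        request["Capabilities"] = list(capabilities)
    return request


def build_stack_instances_request(name, ou_ids, regions, parallel=False,
                                  failure_tolerance_percentage=0):
    """Arguments for create_stack_instances targeting whole OUs."""
    return {
        "StackSetName": name,
        "DeploymentTargets": {"OrganizationalUnitIds": list(ou_ids)},
        "Regions": list(regions),
        "OperationPreferences": {
            # CloudFormation may lower the effective concurrency depending on
            # the failure tolerance that is configured.
            "MaxConcurrentPercentage": 100 if parallel else 1,
            "FailureTolerancePercentage": failure_tolerance_percentage,
        },
    }


def deploy(name, template_body, ou_ids, regions):
    """Run both calls from the administrator account (needs boto3 and credentials)."""
    import boto3  # imported lazily so the builders above work offline

    cloudformation = boto3.client("cloudformation")
    cloudformation.create_stack_set(**build_stack_set_request(name, template_body))
    return cloudformation.create_stack_instances(
        **build_stack_instances_request(name, ou_ids, regions)
    )


example = build_stack_instances_request(
    "ec2-shared-ami", ["ou-goqx-c5la1jns"], ["ap-northeast-2"]
)
print(example["OperationPreferences"])
```

Keeping the request construction separate from the API calls makes it possible to review exactly which OUs and Regions will be touched before anything is deployed.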
 

 
The resources are created! 👏👏
 
But there’s still an issue you need to address. In this scenario, we successfully created EC2 instances across AWS accounts, but you don’t yet know the endpoints of these instances. To check them, you would still need to manually log into each AWS account to retrieve the public IPv4 addresses. Let’s fix this!
 


Centralize CloudFormation StackSets Outputs

 
To obtain the Public IPv4 of the resources created by CloudFormation, we’ll use AWS Lambda functions. However, CloudFormation is primarily a provisioning tool. While it can create a Lambda function, how do you execute it?

Lambda-backed CloudFormation Custom Resources

When you associate a Lambda function with a custom resource, the function is invoked whenever the custom resource is created, updated, or deleted. CloudFormation calls a Lambda API to invoke the function and to pass all the request data (such as the request type and resource properties) to the function.

 
You can use a Lambda-backed CloudFormation Custom Resource to trigger the Lambda function during Stack deployment! With this, you can perform a wide range of tasks, such as using the AWS SDK, logging, and more. This is one of CloudFormation’s powerful features.
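
To make the request data more concrete, here is a minimal sketch of how a handler could branch on the request type. The event below is a trimmed, hand-written example with placeholder values, and plan_action is a hypothetical helper rather than part of any AWS library.

```python
def plan_action(event):
    """Return the action a custom resource handler should take for this event."""
    request_type = event["RequestType"]  # CloudFormation sends Create, Update or Delete
    if request_type in ("Create", "Update"):
        return "apply"
    if request_type == "Delete":
        return "cleanup"
    raise ValueError(f"unexpected RequestType: {request_type}")


# Trimmed example of the request CloudFormation passes to the function.
# Every value here is a placeholder.
sample_event = {
    "RequestType": "Create",
    "ResponseURL": "https://example.com/presigned-url",
    "StackId": "arn:aws:cloudformation:ap-northeast-2:111111111111:stack/example/guid",
    "RequestId": "unique-request-id",
    "ResourceType": "Custom::OutputToS3",
    "LogicalResourceId": "MyCustomResource",
    "ResourceProperties": {
        "InstanceId": "i-0123456789abcdef0",
        "InstancePublicIP": "203.0.113.10",
    },
}

print(plan_action(sample_event))  # apply
```

The handlers in this post run the same logic for every request type, which is fine for this lab because the custom resources are retained on deletion.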
 
So, let’s implement the setup as discussed. First, create an S3 Bucket and assign a resource-based policy (S3 bucket policy) to grant permissions to the Organization.
 

 
Similar to the KMS permission settings, you can allow only specific OUs by using a format like this:
 

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::centralized-cloudformation-stacksets-output-logs-488357298470/*",
            "Condition": {
                "StringEquals": {
                    "aws:PrincipalOrgID": "o-n0g4bvykxt"
                },
                "ForAnyValue:StringLike": {
                    "aws:PrincipalOrgPaths": "o-n0g4bvykxt/r-goqx/ou-goqx-c5la1jns/*"
                }
            }
        }
    ]
}
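
If you manage several buckets or OUs, the same policy can be generated instead of edited by hand. The sketch below rebuilds the policy shown above from its parts. The function names are hypothetical, and the IDs are the ones used in this post.

```python
import json


def build_org_path(org_id, root_id, ou_ids):
    """Join organization, root and nested OU IDs into an aws:PrincipalOrgPaths prefix."""
    return "/".join([org_id, root_id, *ou_ids])


def build_bucket_policy(bucket_name, org_id, root_id, ou_ids):
    """Allow s3:PutObject only for principals under the given OU path."""
    org_path = build_org_path(org_id, root_id, ou_ids)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
                "Condition": {
                    "StringEquals": {"aws:PrincipalOrgID": org_id},
                    # The trailing /* matches accounts in the OU and in its child OUs
                    "ForAnyValue:StringLike": {"aws:PrincipalOrgPaths": f"{org_path}/*"},
                },
            }
        ],
    }


policy = build_bucket_policy(
    "centralized-cloudformation-stacksets-output-logs-488357298470",
    "o-n0g4bvykxt",
    "r-goqx",
    ["ou-goqx-c5la1jns"],
)
print(json.dumps(policy, indent=4))
```

The printed JSON can be pasted into the bucket policy editor or passed to the S3 API as a string.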

 
Now, create a Lambda function that logs information to this S3 bucket:
 

  LambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: S3WritePolicy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - s3:PutObject
                Resource: arn:aws:s3:::centralized-cloudformation-stacksets-output-logs-488357298470/*

  OutputToS3Function:
    Type: AWS::Lambda::Function
    Properties:
      Handler: index.handler
      Runtime: python3.9
      Role: !GetAtt LambdaExecutionRole.Arn
      Code:
        ZipFile: |
          import json
          import boto3
          import urllib3

          s3_client = boto3.client('s3')
          http = urllib3.PoolManager()

          def handler(event, context):
              try:
                  bucket_name = 'centralized-cloudformation-stacksets-output-logs-488357298470'
                  account_id = context.invoked_function_arn.split(":")[4]
                  outputs = {
                      'InstancePublicIP': event['ResourceProperties']['InstancePublicIP'],
                      'InstanceId': event['ResourceProperties']['InstanceId']
                  }

                  # Store in S3
                  s3_client.put_object(
                      Bucket=bucket_name,
                      Key=f"outputs/{account_id}-outputs.json",
                      Body=json.dumps(outputs)
                  )

                  # Response for CF
                  response_data = {
                      'Status': 'SUCCESS',
                      'PhysicalResourceId': context.log_stream_name,
                      'StackId': event['StackId'],
                      'RequestId': event['RequestId'],
                      'LogicalResourceId': event['LogicalResourceId'],
                      'Data': outputs
                  }

              # In case of failure
              except Exception as e:
                  response_data = {
                      'Status': 'FAILED',
                      'Reason': str(e),
                      'PhysicalResourceId': context.log_stream_name,
                      'StackId': event['StackId'],
                      'RequestId': event['RequestId'],
                      'LogicalResourceId': event['LogicalResourceId']
                  }

              # Send response back to CF
              response_url = event['ResponseURL']
              http.request('PUT', response_url, body=json.dumps(response_data))

              return {
                  'statusCode': 200,
                  'body': json.dumps(response_data)
              }

  MyCustomResource:
    Type: Custom::OutputToS3
    Properties:
      ServiceToken: !GetAtt OutputToS3Function.Arn
      InstancePublicIP: !GetAtt MyEC2Instance.PublicIp
      InstanceId: !Ref MyEC2Instance
    DependsOn: OutputToS3Function
    DeletionPolicy: Retain

 
MyCustomResource triggers the OutputToS3Function Lambda function. It passes along the created InstancePublicIP and InstanceId as properties so that the Lambda function can use them when logging to the S3 bucket.
 
After running, the Lambda function must tell CloudFormation that it has finished. It does so by sending the response JSON with an HTTP PUT (the http.request() call) to the pre-signed URL in event['ResponseURL']. Without this step, your CloudFormation Stack would remain stuck in the CREATE_IN_PROGRESS state until the custom resource times out.
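
Since every handler in this post builds the same response body, the pattern can be factored into two small helpers. This is a sketch, not the code used in the templates: the helper names are hypothetical, and it swaps urllib3 for the standard library's urllib.request so that it runs without extra packages.

```python
import json
import urllib.request


def build_response(event, physical_resource_id, status="SUCCESS", data=None, reason=None):
    """Build the JSON body CloudFormation expects from a custom resource."""
    body = {
        "Status": status,  # SUCCESS or FAILED
        "PhysicalResourceId": physical_resource_id,
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
    }
    if status == "FAILED":
        # A reason is required when the status is FAILED
        body["Reason"] = reason or "See the function logs for details"
    if data:
        body["Data"] = data
    return body


def send_response(event, body):
    """PUT the body to the pre-signed URL supplied in the request."""
    payload = json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        event["ResponseURL"],
        data=payload,
        method="PUT",
        # Assumption: an empty Content-Type, as AWS's cfn-response helper sends,
        # keeps the request compatible with the pre-signed URL.
        headers={"Content-Type": ""},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Offline example with placeholder identifiers
sample_event = {
    "StackId": "arn:aws:cloudformation:ap-northeast-2:111111111111:stack/example/guid",
    "RequestId": "unique-request-id",
    "LogicalResourceId": "MyCustomResource",
    "ResponseURL": "https://example.com/presigned-url",
}
print(build_response(sample_event, "example-log-stream", data={"InstanceId": "i-0123456789abcdef0"}))
```

Calling send_response in a finally block, or after the try/except as the handlers here do, helps ensure CloudFormation hears back even when the main logic fails.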
 

 
As a result of the execution, output logs are now accumulated in the centralized account! This allows you to view the metadata for each instance. 👍
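
Once the objects are in the bucket, they can be merged into a single view keyed by account ID. The sketch below relies on the key format written by the Lambda function (outputs/<account_id>-outputs.json). The function names and sample values are hypothetical, and the S3 reads need boto3 with access to the bucket.

```python
import json

KEY_SUFFIX = "-outputs.json"


def account_id_from_key(key):
    """Extract the account ID from a key such as outputs/111111111111-outputs.json."""
    filename = key.rsplit("/", 1)[-1]
    if not filename.endswith(KEY_SUFFIX):
        raise ValueError(f"unexpected key: {key}")
    return filename[: -len(KEY_SUFFIX)]


def merge_outputs(objects):
    """Turn (key, body_text) pairs into {account_id: outputs}."""
    return {account_id_from_key(key): json.loads(body) for key, body in objects}


def fetch_objects(bucket_name, prefix="outputs/"):
    """Yield (key, body_text) pairs from S3 (needs boto3 and read access)."""
    import boto3  # imported lazily so the helpers above work offline

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        for item in page.get("Contents", []):
            body = s3.get_object(Bucket=bucket_name, Key=item["Key"])["Body"].read()
            yield item["Key"], body.decode("utf-8")


# Offline example with placeholder data
sample_objects = [
    ("outputs/111111111111-outputs.json",
     '{"InstancePublicIP": "203.0.113.10", "InstanceId": "i-0123456789abcdef0"}'),
    ("outputs/222222222222-outputs.json",
     '{"InstancePublicIP": "203.0.113.20", "InstanceId": "i-0fedcba9876543210"}'),
]
for account_id, outputs in merge_outputs(sample_objects).items():
    print(account_id, outputs["InstancePublicIP"])
```

With real data, merge_outputs(fetch_objects(bucket_name)) would produce the same structure for every account that has reported in.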
 


Creating Auto Scaling Groups with CloudFormation StackSets

 
Now, let’s create an Auto Scaling Group in each account using the encrypted AMI. But there’s a challenge.

  1. You need to use the AWS CLI to create a grant.
    • An ASG launches instances through its Service-linked Role (SLR), not through your own credentials.
    • To launch instances from the encrypted AMI, that role must be allowed to use the KMS key, which is done with a KMS grant.
    aws kms create-grant --key-id [KEY_ID] \
    --grantee-principal [SLR ARN] \
    --operations Decrypt Encrypt GenerateDataKey GenerateDataKeyWithoutPlaintext \
    ReEncryptFrom ReEncryptTo CreateGrant DescribeKey
    
    • KEY_ID identifies the shared KMS key. Because the key belongs to a different account, specify its key ARN rather than the bare key ID.
    • SLR ARN refers to the IAM Role used by the Auto Scaling Group to scale EC2 instances. However, this brings up another problem:
  2. The grantee-principal is the SLR (Service Linked Role), but accounts that have never created an ASG won’t have an ASG SLR.
    • Therefore, you must manually create the Service-linked Role for the Auto Scaling Group using the AWS CLI or another method.
    • Fortunately, there is an AWS CLI command to create an SLR.
    aws iam create-service-linked-role --aws-service-name autoscaling.amazonaws.com
    

 
We’ll solve these issues using a Lambda function with the AWS SDK (boto3).
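
Both steps depend on knowing the SLR ARN of the account the stack is running in. The functions below derive it from the Lambda function's own ARN, and that derivation can be sketched on its own as follows. The helper names and the sample ARN are hypothetical, and the standard aws partition is assumed.

```python
SLR_PATH = "role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling"


def account_id_from_arn(arn):
    """Return the account ID, which is the fifth colon-separated field of an ARN."""
    parts = arn.split(":")
    if len(parts) < 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn}")
    return parts[4]


def autoscaling_slr_arn(account_id):
    """Build the ARN of the Auto Scaling service-linked role (aws partition assumed)."""
    return f"arn:aws:iam::{account_id}:{SLR_PATH}"


# Inside Lambda this value comes from context.invoked_function_arn
function_arn = "arn:aws:lambda:ap-northeast-2:111111111111:function:example"
print(autoscaling_slr_arn(account_id_from_arn(function_arn)))
```

Because each stack instance runs its Lambda function inside the target account, the derived ARN always points at that account's own role.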
 

Lambda Function to create Auto Scaling Group Service Linked Role

import boto3
import json
import urllib3

iam_client = boto3.client('iam')
http = urllib3.PoolManager()

def handler(event, context):
    try:
        try:
            iam_client.get_role(
                RoleName='AWSServiceRoleForAutoScaling'
            )
            role_exists = True
        except iam_client.exceptions.NoSuchEntityException:
            role_exists = False

        if not role_exists:
            iam_client.create_service_linked_role(
                AWSServiceName='autoscaling.amazonaws.com'
            )

        response_data = {
            'Status': 'SUCCESS',
            'PhysicalResourceId': context.log_stream_name,
            'StackId': event['StackId'],
            'RequestId': event['RequestId'],
            'LogicalResourceId': event['LogicalResourceId'],
            'Data': {'Message': 'Service-linked role created successfully or already exists'}
        }

    except Exception as e:
        response_data = {
            'Status': 'FAILED',
            'Reason': str(e),
            'PhysicalResourceId': context.log_stream_name,
            'StackId': event['StackId'],
            'RequestId': event['RequestId'],
            'LogicalResourceId': event['LogicalResourceId']
        }

    response_url = event['ResponseURL']
    http.request('PUT', response_url, body=json.dumps(response_data))

    return {
        'statusCode': 200,
        'body': json.dumps(response_data)
    }

 

Lambda Function to grant KMS key

import boto3
import json
import urllib3
import time

kms_client = boto3.client('kms')
http = urllib3.PoolManager()

def handler(event, context):
    key_id = event['ResourceProperties']['KeyId']
    slr_arn = 'arn:aws:iam::' + context.invoked_function_arn.split(":")[4] + ':role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling'

    time.sleep(5)

    try:
        kms_client.create_grant(
            KeyId=key_id,
            GranteePrincipal=slr_arn,
            Operations=[
                'Decrypt', 'Encrypt', 'GenerateDataKey',
                'GenerateDataKeyWithoutPlaintext', 'ReEncryptFrom',
                'ReEncryptTo', 'CreateGrant', 'DescribeKey'
            ]
        )

        response_data = {
            'Status': 'SUCCESS',
            'PhysicalResourceId': context.log_stream_name,
            'StackId': event['StackId'],
            'RequestId': event['RequestId'],
            'LogicalResourceId': event['LogicalResourceId'],
            'Data': {'Message': 'KMS grant created successfully'}
        }

    except Exception as e:
        response_data = {
            'Status': 'FAILED',
            'Reason': str(e),
            'PhysicalResourceId': context.log_stream_name,
            'StackId': event['StackId'],
            'RequestId': event['RequestId'],
            'LogicalResourceId': event['LogicalResourceId']
        }

    response_url = event['ResponseURL']
    http.request('PUT', response_url, body=json.dumps(response_data))

    return {
        'statusCode': 200,
        'body': json.dumps(response_data)
    }

 
In this function, boto3 is used to create a grant on the KMS key for the SLR. You’ll notice a time.sleep(5) call in the code. This is necessary because a newly created SLR takes a few seconds to become available. If you create the grant immediately, there’s a high chance you’ll encounter an error indicating that the SLR ARN is invalid.
 
Therefore, we added a 5-second delay and extended the Lambda timeout from the default 3 seconds to 10 seconds to prevent the function from timing out.
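
An alternative to a fixed delay, not used in this post's templates, is to retry the call a few times with a short pause between attempts. The sketch below shows that idea with a hypothetical call_with_retry helper. The attempt count and delay are arbitrary, and the total wait still has to fit within the Lambda timeout.

```python
import time


def call_with_retry(operation, attempts=4, delay_seconds=2.0, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as error:  # a real handler might retry only specific errors
            last_error = error
            if attempt < attempts:
                sleep(delay_seconds)
    raise last_error


# Offline demonstration: an operation that fails twice before succeeding
state = {"calls": 0}


def flaky_operation():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("role not available yet")
    return "grant created"


# Pass a no-op sleep so the demonstration finishes instantly
print(call_with_retry(flaky_operation, sleep=lambda seconds: None))  # grant created
print(state["calls"])  # 3
```

In the Lambda function this could wrap the grant creation, for example call_with_retry(lambda: kms_client.create_grant(...)), so that the function returns as soon as the role is usable instead of always waiting five seconds.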
 

Custom Resources & ASG Config

  InvokeCreateSLRFunction:
    Type: Custom::CreateResources
    Properties:
      ServiceToken: !GetAtt CreateSLRFunction.Arn
    DeletionPolicy: Retain

  InvokeCreateKMSGrantFunction:
    Type: Custom::CreateKMSGrant
    Properties:
      ServiceToken: !GetAtt CreateKMSGrantFunction.Arn
      KeyId: !Ref KeyId
    DependsOn: InvokeCreateSLRFunction
    DeletionPolicy: Retain

  MyLaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateData:
        ImageId: !Ref ImageId
        InstanceType: !Ref InstanceType
        SecurityGroups:
          - !Ref MySecurityGroup
    DependsOn: [InvokeCreateSLRFunction, InvokeCreateKMSGrantFunction]

 
One important aspect to pay attention to is the DependsOn attribute. It helps control the order in which resources are created.
 

 
The creation order of resources should be as follows: first the SLR is created, then the KMS grant is created for the SLR, and finally the ASG is created with the granted permissions. The DependsOn attributes enforce the first two steps and make the launch template wait for both. The ASG references the launch template, so CloudFormation creates it last.
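
The resulting order can be illustrated with a topological sort over the dependencies. In the sketch below, the graph is transcribed by hand from the template and combines the explicit DependsOn entries with the dependencies CloudFormation infers from !Ref and !GetAtt. It is only an illustration of the ordering, not how CloudFormation is implemented.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# resource -> resources that must exist before it
depends_on = {
    "CreateSLRFunction": {"LambdaExecutionRole"},
    "CreateKMSGrantFunction": {"LambdaExecutionRole"},
    "InvokeCreateSLRFunction": {"CreateSLRFunction"},
    "InvokeCreateKMSGrantFunction": {"CreateKMSGrantFunction", "InvokeCreateSLRFunction"},
    "MyLaunchTemplate": {
        "InvokeCreateSLRFunction",
        "InvokeCreateKMSGrantFunction",
        "MySecurityGroup",
    },
    "MyAutoScalingGroup": {"MyLaunchTemplate"},
}

# static_order() yields every resource after all of its prerequisites
creation_order = list(TopologicalSorter(depends_on).static_order())
for position, resource in enumerate(creation_order, start=1):
    print(position, resource)
```

Resources without a dependency between them may appear in any relative order, but the SLR invocation always precedes the KMS grant, which precedes the launch template and finally the ASG.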
 

Entire code:

AWSTemplateFormatVersion: '2010-09-09'
Description: CloudFormation Template to create an ASG with encrypted AMI.

Parameters:
  InstanceType:
    Description: EC2 Instance Type
    Type: String
    Default: t2.micro
    AllowedValues:
      - t2.micro
      - t3.micro
    ConstraintDescription: t2.micro or t3.micro only.

  ImageId:
    Description: Encrypted AMI ID for the EC2 instances
    Type: AWS::EC2::Image::Id
    Default: ami-005b0bdedc5ed724b
    ConstraintDescription: must be a valid AMI ID.

  KeyId:
    Description: KMS Key ID for the encrypted AMI
    Type: String
    Default: arn:aws:kms:ap-northeast-2:211125378002:key/926dcb9b-382a-4a37-b9dc-d287d3a714d5
    ConstraintDescription: must be a valid KMS Key ID.

Resources:
  LambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: AllowCreateServiceLinkedRoleAndKMSGrant
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - iam:CreateServiceLinkedRole
                  - iam:GetRole
                  - kms:CreateGrant
                Resource: "*"

  # create SLR
  CreateSLRFunction:
    Type: AWS::Lambda::Function
    Properties:
      Handler: index.handler
      Runtime: python3.9
      Role: !GetAtt LambdaExecutionRole.Arn
      Code:
        ZipFile: |
          import boto3
          import json
          import urllib3

          iam_client = boto3.client('iam')
          http = urllib3.PoolManager()

          def handler(event, context):

              try:
                  try:
                      iam_client.get_role(
                          RoleName='AWSServiceRoleForAutoScaling'
                      )
                      role_exists = True
                  except iam_client.exceptions.NoSuchEntityException:
                      role_exists = False

                  if not role_exists:
                      iam_client.create_service_linked_role(
                          AWSServiceName='autoscaling.amazonaws.com'
                      )

                  response_data = {
                      'Status': 'SUCCESS',
                      'PhysicalResourceId': context.log_stream_name,
                      'StackId': event['StackId'],
                      'RequestId': event['RequestId'],
                      'LogicalResourceId': event['LogicalResourceId'],
                      'Data': {'Message': 'Service-linked role created successfully or already exists'}
                  }

              except Exception as e:
                  response_data = {
                      'Status': 'FAILED',
                      'Reason': str(e),
                      'PhysicalResourceId': context.log_stream_name,
                      'StackId': event['StackId'],
                      'RequestId': event['RequestId'],
                      'LogicalResourceId': event['LogicalResourceId']
                  }

              response_url = event['ResponseURL']
              http.request('PUT', response_url, body=json.dumps(response_data))

              return {
                  'statusCode': 200,
                  'body': json.dumps(response_data)
              }



  # create KMS grant
  CreateKMSGrantFunction:
    Type: AWS::Lambda::Function
    Properties:
      Handler: index.handler
      Runtime: python3.9
      Role: !GetAtt LambdaExecutionRole.Arn
      Timeout: 10
      Code:
        ZipFile: |
          import boto3
          import json
          import urllib3
          import time

          kms_client = boto3.client('kms')
          http = urllib3.PoolManager()

          def handler(event, context):
              key_id = event['ResourceProperties']['KeyId']
              slr_arn = 'arn:aws:iam::' + context.invoked_function_arn.split(":")[4] + ':role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling'

              time.sleep(5)

              try:
                  kms_client.create_grant(
                      KeyId=key_id,
                      GranteePrincipal=slr_arn,
                      Operations=[
                          'Decrypt', 'Encrypt', 'GenerateDataKey',
                          'GenerateDataKeyWithoutPlaintext', 'ReEncryptFrom',
                          'ReEncryptTo', 'CreateGrant', 'DescribeKey'
                      ]
                  )

                  # Prepare response for CloudFormation
                  response_data = {
                      'Status': 'SUCCESS',
                      'PhysicalResourceId': context.log_stream_name,
                      'StackId': event['StackId'],
                      'RequestId': event['RequestId'],
                      'LogicalResourceId': event['LogicalResourceId'],
                      'Data': {'Message': 'KMS grant created successfully'}
                  }

              except Exception as e:
                  response_data = {
                      'Status': 'FAILED',
                      'Reason': str(e),
                      'PhysicalResourceId': context.log_stream_name,
                      'StackId': event['StackId'],
                      'RequestId': event['RequestId'],
                      'LogicalResourceId': event['LogicalResourceId']
                  }

              # Send response back to CloudFormation
              response_url = event['ResponseURL']
              http.request('PUT', response_url, body=json.dumps(response_data))

              return {
                  'statusCode': 200,
                  'body': json.dumps(response_data)
              }


  InvokeCreateSLRFunction:
    Type: Custom::CreateResources
    Properties:
      ServiceToken: !GetAtt CreateSLRFunction.Arn
    DeletionPolicy: Retain

  InvokeCreateKMSGrantFunction:
    Type: Custom::CreateKMSGrant
    Properties:
      ServiceToken: !GetAtt CreateKMSGrantFunction.Arn
      KeyId: !Ref KeyId
    DependsOn: InvokeCreateSLRFunction
    DeletionPolicy: Retain

  MyLaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateData:
        ImageId: !Ref ImageId
        InstanceType: !Ref InstanceType
        SecurityGroups:
          - !Ref MySecurityGroup
    DependsOn: [InvokeCreateSLRFunction, InvokeCreateKMSGrantFunction]

  MyAutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      LaunchTemplate:
        LaunchTemplateId: !Ref MyLaunchTemplate
        Version: !GetAtt MyLaunchTemplate.LatestVersionNumber
      MinSize: '1'
      MaxSize: '3'
      DesiredCapacity: '1'
      AvailabilityZones:
        - !Select
          - '0'
          - !GetAZs ''
        

  MySecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow SSH and HTTP access
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp: 0.0.0.0/0
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          CidrIp: 0.0.0.0/0

Outputs:
  AutoScalingGroupName:
    Description: Name of the Auto Scaling Group
    Value: !Ref MyAutoScalingGroup

 

 

 
Congratulations! You have successfully created a solution to automatically deploy ASGs into multiple AWS accounts! 🔥🔥
 
Working with CloudFormation StackSets requires great caution. A small mistake can result in the creation of numerous resources, leading to significant charges. It can also cause issues in the existing environments of accounts, so review is essential before applying any changes.
 
However, as seen in this post, CloudFormation is an incredibly powerful tool. When used with a solid understanding, it can be an invaluable resource for managing multi-account environments effectively.