[KO] Managing a Multi-Account Environment with CloudFormation StackSets!
Hello.
I recently completed an interesting hands-on exercise focused on automating resource configuration in a multi-account environment. The environment I set up is as follows:
- Creating Auto Scaling Groups in multiple accounts within an Organizational Unit (OU) using an encrypted AMI created in the management account.
To configure this environment, the following steps are necessary:
- Create a Customer Managed KMS Key in the management account.
- Launch an EC2 instance in the management account with storage encrypted using the KMS key.
- Create an AMI from this instance and share the AMI and KMS key permissions with the Organizational Unit.
- In each account, create Auto Scaling Groups using the shared AMI.
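The AMI-sharing step in the list above can also be scripted. Below is a minimal sketch with boto3, assuming the AMI already exists; the AMI ID, OU ARN, and region in the usage note are hypothetical placeholders. It only covers the launch permission on the AMI. Access to the KMS key has to be granted separately through the key policy.

```python
def build_share_ami_request(image_id: str, ou_arn: str) -> dict:
    """Build keyword arguments for ec2.modify_image_attribute that add
    launch permission for every account in one Organizational Unit."""
    return {
        "ImageId": image_id,
        "LaunchPermission": {"Add": [{"OrganizationalUnitArn": ou_arn}]},
    }


def share_ami_with_ou(image_id: str, ou_arn: str, region: str) -> None:
    """Send the request. boto3 is imported lazily so the builder above
    can be inspected without AWS credentials."""
    import boto3

    ec2 = boto3.client("ec2", region_name=region)
    ec2.modify_image_attribute(**build_share_ami_request(image_id, ou_arn))
```

For example, share_ami_with_ou("ami-0123456789abcdef0", "arn:aws:organizations::111122223333:ou/o-exampleorgid/ou-exam-exampleou", "ap-northeast-2") would be run once in the management account, with real identifiers substituted.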
If you only had to manage a couple of target accounts, you could quickly perform these tasks manually by logging into each account’s management console. But what if you had 100 target accounts? How would you handle this?
Here are some options:
- Automate the configuration using Terraform.
- Leverage your company’s master-slave structure to delegate commands.
- Use CloudFormation StackSets.
Option 1, using Terraform, would require obtaining credentials for all 100 accounts, which is a cumbersome process. Option 2 would reduce your own administrative overhead by passing it on to others 😅. However, today we’re going to explore how to efficiently solve this problem using AWS’s Infrastructure as Code (IaC) service: CloudFormation!
In this post, you will learn about the following:
- CloudFormation StackSets
- Centralizing CloudFormation StackSets Outputs & CloudFormation Lambda-backed Custom Resource
- Creating Auto Scaling Groups with CloudFormation StackSets & Important Considerations
CloudFormation
AWS CloudFormation is a service that helps you model and set up your AWS resources so that you can spend less time managing those resources and more time focusing on your applications that run in AWS.
- A CloudFormation template is a JSON or YAML formatted text file.
- When you use CloudFormation, you manage related resources as a single unit called a stack.
- All the resources in a stack are defined by the stack's CloudFormation template.
So, you create, update, and delete stacks through templates. That looks very similar to Terraform, but the two are not identical. There’s more that CloudFormation can do!
CloudFormation StackSets
A stack set lets you create stacks in AWS accounts across regions by using a single CloudFormation template.
From your administrator account, you can create resources in multiple target accounts!
- The administrator account is an AWS account that creates stack sets. The administrator account is either the organization's management account or a delegated administrator account.
- Before you can use a stack set to create stacks in a target account, set up a trust relationship between the administrator and target accounts.
Setting up trust relationships
With service-managed permissions, you can deploy stack instances to accounts managed by AWS Organizations. Using this permissions model, you don't have to create the necessary IAM roles; StackSets creates the IAM roles on your behalf. With this model, you can also turn on automatic deployments to accounts that you add to your organization in the future.
Normally, setting up the trust relationship requires work in both the administrator account and each target account. We will simplify the process by using service-managed permissions, under which the trust relationship between the administrator account and the member accounts is set up for us. So make sure all features are enabled in your AWS Organizations (they are enabled by default).
NOTE:
Enable all features in AWS Organizations. With only consolidated billing features enabled, you cannot create a stack set with service-managed permissions.
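The feature set can be checked programmatically before creating the stack set. This is a small sketch, assuming credentials for the management account; the helper names are made up for illustration. It relies on the FeatureSet field returned by describe_organization, which is either "ALL" or "CONSOLIDATED_BILLING". Trusted access between StackSets and AWS Organizations must also be activated, which this check does not cover.

```python
def supports_service_managed_stacksets(describe_org_response: dict) -> bool:
    """Return True when the organization has all features enabled.

    `describe_org_response` is the dict returned by
    organizations.describe_organization().
    """
    return describe_org_response["Organization"]["FeatureSet"] == "ALL"


def check_current_organization() -> bool:
    """Query AWS Organizations (requires AWS credentials)."""
    import boto3

    response = boto3.client("organizations").describe_organization()
    return supports_service_managed_stacksets(response)
```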
Create EC2 with shared AMI
In this chapter, we’ll be setting up the following resources. Let’s assume that the configuration for creating and sharing the AMI and KMS has already been completed. To deploy EC2 instances in each account, we’ll write a template as follows:
AWSTemplateFormatVersion: '2010-09-09'
Description: CloudFormation Template to create an EC2 instance using specified parameters for InstanceType and AMI ID and store outputs in a centralized S3 bucket.
Parameters:
  InstanceType:
    Description: EC2 Instance Type
    Type: String
    Default: t2.micro
    AllowedValues:
      - t2.micro
      - t3.micro
    ConstraintDescription: t2.micro or t3.micro only.
  ImageId:
    Description: AMI ID for the EC2 instance
    Type: AWS::EC2::Image::Id
    Default: ami-005b0bdedc5ed724b
    ConstraintDescription: must be a valid AMI ID.
Resources:
  MyEC2Instance:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: !Ref InstanceType
      ImageId: !Ref ImageId
      SecurityGroups:
        - Ref: MySecurityGroup
  MySecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow SSH and HTTP access
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp: 0.0.0.0/0
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          CidrIp: 0.0.0.0/0
Here’s an explanation of the template:
- The Parameters section defines the parameters you can input when running the template. By leveraging Parameters, you can significantly enhance the reusability of your template.
- The Resources section defines the actual resources that CloudFormation will create.
- MyEC2Instance is the EC2 instance resource we will create. By entering the AMI ID in the ImageId field, we can use the AMI that was previously shared with us.
- MySecurityGroup is the Security Group that will be assigned to the EC2 instance. Since this is a resource created for testing purposes, the permissions are kept open.
Now, let’s create a stack set from the CloudFormation template. This is where you can see the powerful advantages of CloudFormation. Since you’re working from the management account, you can use the default service-managed permissions. This gives StackSets the permissions it needs to create, modify, and delete resources across accounts within the organization, without you having to obtain credentials for each account.
Applying the template is quite simple. Navigate to the CloudFormation StackSets section in the administrator account.
You can access the accounts within AWS Organizations using Service-managed permissions.
Upload the template file you have pre-prepared.
You can choose the target at the OU level. Additionally, you can select the regions where the Stack will be deployed. Although not detailed in this post, there are various options available.
You can also set deployment options. Stacks can be deployed to multiple accounts in parallel, or sequentially one by one. Additionally, the failure tolerance setting determines how many stacks may fail before StackSets stops the operation.
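The same console steps map roughly onto two CloudFormation API calls. The sketch below uses boto3's create_stack_set and create_stack_instances; the stack set name, OU ID, and region in the usage note are hypothetical, and the conservative defaults (one account at a time, stop at the first failure) are my assumption rather than the settings used in this post.

```python
def build_stack_set_request(name: str, template_body: str,
                            needs_iam: bool = False) -> dict:
    """Keyword arguments for cloudformation.create_stack_set with
    service-managed permissions and automatic deployment to new accounts."""
    request = {
        "StackSetName": name,
        "TemplateBody": template_body,
        "PermissionModel": "SERVICE_MANAGED",
        "AutoDeployment": {
            "Enabled": True,
            "RetainStacksOnAccountRemoval": False,
        },
    }
    if needs_iam:
        # Required only when the template creates IAM resources.
        request["Capabilities"] = ["CAPABILITY_IAM"]
    return request


def build_stack_instances_request(name: str, ou_ids: list, regions: list,
                                  max_concurrent: int = 1,
                                  failure_tolerance: int = 0) -> dict:
    """Keyword arguments for cloudformation.create_stack_instances that
    target whole OUs. Raise both numbers together to deploy in parallel."""
    return {
        "StackSetName": name,
        "DeploymentTargets": {"OrganizationalUnitIds": list(ou_ids)},
        "Regions": list(regions),
        "OperationPreferences": {
            "MaxConcurrentCount": max_concurrent,
            "FailureToleranceCount": failure_tolerance,
        },
    }


def deploy(name: str, template_body: str, ou_ids: list, regions: list) -> None:
    """Create the stack set, then its stack instances (requires credentials)."""
    import boto3

    cfn = boto3.client("cloudformation")
    cfn.create_stack_set(**build_stack_set_request(name, template_body))
    cfn.create_stack_instances(
        **build_stack_instances_request(name, ou_ids, regions)
    )
```

A call such as deploy("shared-ami-ec2", open("template.yaml").read(), ["ou-exam-exampleou"], ["ap-northeast-2"]) would then be run from the management account.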
The resources are created! 👏👏
But there’s still an issue you need to address. In this scenario, we successfully created EC2 instances across AWS accounts, but you don’t yet know the endpoints of these instances. To check them, you would still need to manually log into each AWS account to retrieve the public IPv4 addresses. Let’s fix this!
Centralize CloudFormation StackSets Outputs
To obtain the Public IPv4 of the resources created by CloudFormation, we’ll use AWS Lambda functions. However, CloudFormation is primarily a provisioning tool. While it can create a Lambda function, how do you execute it?
Lambda-backed CloudFormation Custom Resources
When you associate a Lambda function with a custom resource, the function is invoked whenever the custom resource is created, updated, or deleted. CloudFormation calls a Lambda API to invoke the function and to pass all the request data (such as the request type and resource properties) to the function.
You can use a Lambda-backed CloudFormation Custom Resource to trigger the Lambda function during Stack deployment! With this, you can perform a wide range of tasks, such as using the AWS SDK, logging, and more. This is one of CloudFormation’s powerful features.
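To make the request and response cycle concrete, here is a sketch of the response body a custom resource function has to return to CloudFormation. The sample event values are invented for illustration; the field names follow the custom resource request and response format. The body is what gets sent with an HTTP PUT to the presigned URL in event['ResponseURL'].

```python
import json
from typing import Optional

# Invented sample of what CloudFormation passes to the function.
SAMPLE_EVENT = {
    "RequestType": "Create",  # or "Update" / "Delete"
    "ResponseURL": "https://example.invalid/presigned-url",
    "StackId": "arn:aws:cloudformation:ap-northeast-2:111122223333:stack/demo/00000000-0000-0000-0000-000000000000",
    "RequestId": "11111111-2222-3333-4444-555555555555",
    "ResourceType": "Custom::OutputToS3",
    "LogicalResourceId": "MyCustomResource",
    "ResourceProperties": {"InstanceId": "i-0123456789abcdef0"},
}


def build_response(event: dict, physical_id: str, status: str,
                   data: Optional[dict] = None, reason: str = "") -> dict:
    """Build the JSON body CloudFormation expects from a custom resource."""
    if status not in ("SUCCESS", "FAILED"):
        raise ValueError("status must be SUCCESS or FAILED")
    body = {
        "Status": status,
        "PhysicalResourceId": physical_id,
        # These three values are echoed back unchanged from the request.
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
    }
    if status == "FAILED":
        # A reason is required when reporting a failure.
        body["Reason"] = reason or "See the function's CloudWatch logs"
    if data:
        # Values here become available to the template through Fn::GetAtt.
        body["Data"] = data
    return body


if __name__ == "__main__":
    print(json.dumps(build_response(SAMPLE_EVENT, "demo-id", "SUCCESS"), indent=2))
```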
So, let’s implement the setup as discussed. First, create an S3 Bucket and assign a resource-based policy (S3 bucket policy) to grant permissions to the Organization.
Similar to the KMS permission settings, you can allow only specific OUs by using a format like this:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::centralized-cloudformation-stacksets-output-logs-488357298470/*",
      "Condition": {
        "StringEquals": {
          "aws:PrincipalOrgID": "o-n0g4bvykxt"
        },
        "ForAnyValue:StringLike": {
          "aws:PrincipalOrgPaths": "o-n0g4bvykxt/r-goqx/ou-goqx-c5la1jns/*"
        }
      }
    }
  ]
}
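If the bucket or the OU changes often, the policy above can be generated instead of edited by hand. This is a sketch under the assumption that the OU path is known in the form organization ID, root ID, OU ID; the helper names are hypothetical. put_bucket_policy is the boto3 S3 call that attaches the result.

```python
import json


def build_ou_put_policy(bucket: str, ou_path: str) -> dict:
    """Bucket policy allowing s3:PutObject only for principals under one OU.

    `ou_path` looks like "o-xxxx/r-xxxx/ou-xxxx-xxxxxxxx"; its first
    segment is the organization ID.
    """
    org_id = ou_path.split("/")[0]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {
                    "StringEquals": {"aws:PrincipalOrgID": org_id},
                    "ForAnyValue:StringLike": {
                        "aws:PrincipalOrgPaths": f"{ou_path}/*"
                    },
                },
            }
        ],
    }


def apply_policy(bucket: str, ou_path: str) -> None:
    """Attach the policy to the bucket (requires AWS credentials)."""
    import boto3

    boto3.client("s3").put_bucket_policy(
        Bucket=bucket, Policy=json.dumps(build_ou_put_policy(bucket, ou_path))
    )
```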
Now, create a Lambda function that logs information to this S3 bucket:
  LambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: S3WritePolicy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - s3:PutObject
                Resource: arn:aws:s3:::centralized-cloudformation-stacksets-output-logs-488357298470/*
  OutputToS3Function:
    Type: AWS::Lambda::Function
    Properties:
      Handler: index.handler
      Runtime: python3.9
      Role: !GetAtt LambdaExecutionRole.Arn
      Code:
        ZipFile: |
          import json
          import boto3
          import urllib3
          s3_client = boto3.client('s3')
          http = urllib3.PoolManager()
          def handler(event, context):
              try:
                  bucket_name = 'centralized-cloudformation-stacksets-output-logs-488357298470'
                  account_id = context.invoked_function_arn.split(":")[4]
                  outputs = {
                      'InstancePublicIP': event['ResourceProperties']['InstancePublicIP'],
                      'InstanceId': event['ResourceProperties']['InstanceId']
                  }
                  # Store in S3
                  s3_client.put_object(
                      Bucket=bucket_name,
                      Key=f"outputs/{account_id}-outputs.json",
                      Body=json.dumps(outputs)
                  )
                  # Response for CF
                  response_data = {
                      'Status': 'SUCCESS',
                      'PhysicalResourceId': context.log_stream_name,
                      'StackId': event['StackId'],
                      'RequestId': event['RequestId'],
                      'LogicalResourceId': event['LogicalResourceId'],
                      'Data': outputs
                  }
              # In case of failure
              except Exception as e:
                  response_data = {
                      'Status': 'FAILED',
                      'Reason': str(e),
                      'PhysicalResourceId': context.log_stream_name,
                      'StackId': event['StackId'],
                      'RequestId': event['RequestId'],
                      'LogicalResourceId': event['LogicalResourceId']
                  }
              # Send response back to CF
              response_url = event['ResponseURL']
              http.request('PUT', response_url, body=json.dumps(response_data))
              return {
                  'statusCode': 200,
                  'body': json.dumps(response_data)
              }
  MyCustomResource:
    Type: Custom::OutputToS3
    Properties:
      ServiceToken: !GetAtt OutputToS3Function.Arn
      InstancePublicIP: !GetAtt MyEC2Instance.PublicIp
      InstanceId: !Ref MyEC2Instance
    DependsOn: OutputToS3Function
    DeletionPolicy: Retain
MyCustomResource triggers the OutputToS3Function Lambda function. It passes along the created InstancePublicIP and InstanceId as properties so that the Lambda function can use them when logging to the S3 bucket.
After running, the Lambda function must notify CloudFormation that it has completed its execution. This is done using the http.request() function to send the information back to CloudFormation. Without this step, your CloudFormation Stack would remain stuck in the CREATE_IN_PROGRESS state.
As a result of the execution, output logs are now accumulated in the centralized account! This allows you to view the metadata for each instance. 👍
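Once the per-account files are in the bucket, they can be merged into one view. The sketch below assumes the key layout written by the Lambda function (outputs/{account_id}-outputs.json) and is meant to run in the account that owns the bucket; the function names are hypothetical.

```python
import json

PREFIX = "outputs/"
SUFFIX = "-outputs.json"


def account_id_from_key(key: str) -> str:
    """Extract the account ID from a key like outputs/111122223333-outputs.json."""
    if not (key.startswith(PREFIX) and key.endswith(SUFFIX)):
        raise ValueError(f"unexpected key: {key}")
    return key[len(PREFIX):-len(SUFFIX)]


def merge_outputs(objects: dict) -> dict:
    """Map account ID -> parsed outputs, given S3 key -> JSON text."""
    return {
        account_id_from_key(key): json.loads(body)
        for key, body in objects.items()
    }


def fetch_objects(bucket: str) -> dict:
    """Download every output file from the bucket (requires credentials)."""
    import boto3

    s3 = boto3.client("s3")
    found = {}
    pages = s3.get_paginator("list_objects_v2").paginate(
        Bucket=bucket, Prefix=PREFIX
    )
    for page in pages:
        for item in page.get("Contents", []):
            body = s3.get_object(Bucket=bucket, Key=item["Key"])["Body"].read()
            found[item["Key"]] = body.decode("utf-8")
    return found
```

Calling merge_outputs(fetch_objects(bucket_name)) then yields a single dictionary keyed by account ID.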
Creating Auto Scaling Groups with CloudFormation StackSets
Now, let’s create an Auto Scaling Group in each account using the encrypted AMI. But there’s a challenge.
- For the ASG to launch instances from the encrypted AMI, it must have permission to use the KMS key.
- An ASG acts through its Service-linked Role (SLR), so the grant on the KMS key must go to that role.
- Creating the grant requires the AWS CLI (or an SDK):
aws kms create-grant --key-id [KEY_ID] \
  --grantee-principal [SLR ARN] \
  --operations Decrypt Encrypt GenerateDataKey GenerateDataKeyWithoutPlaintext \
    ReEncryptFrom ReEncryptTo CreateGrant DescribeKey
- KEY_ID refers to the ID of the shared KMS Key.
- SLR ARN refers to the ARN of the IAM role used by the Auto Scaling Group to launch and scale EC2 instances.
However, this brings up another problem:
- The grantee-principal is the SLR (Service-linked Role), but accounts that have never created an ASG won’t have an ASG SLR.
- Therefore, you must create the Service-linked Role for the Auto Scaling Group yourself, using the AWS CLI or another method.
- Fortunately, there is an AWS CLI command to create an SLR.
aws iam create-service-linked-role --aws-service-name autoscaling.amazonaws.com
We’ll solve these issues using a Lambda function with the AWS SDK (boto3).
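One detail worth isolating first: inside a stack instance, the grantee ARN does not need to be passed in as a parameter, because the account ID is part of the Lambda function's own ARN. A small sketch of that derivation follows; the helper name is hypothetical, and it assumes the function runs in the account that owns the SLR.

```python
SLR_PATH = (
    "role/aws-service-role/autoscaling.amazonaws.com/"
    "AWSServiceRoleForAutoScaling"
)


def autoscaling_slr_arn(function_arn: str) -> str:
    """Derive the Auto Scaling SLR ARN from a Lambda function ARN.

    Lambda ARNs look like
    arn:<partition>:lambda:<region>:<account-id>:function:<name>.
    """
    parts = function_arn.split(":")
    if len(parts) < 7 or parts[0] != "arn" or parts[2] != "lambda":
        raise ValueError(f"not a Lambda function ARN: {function_arn}")
    partition, account_id = parts[1], parts[4]
    # IAM is a global service, so the region field stays empty.
    return f"arn:{partition}:iam::{account_id}:{SLR_PATH}"
```

In a handler this would be called as autoscaling_slr_arn(context.invoked_function_arn).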
Lambda Function to create Auto Scaling Group Service Linked Role
import boto3
import json
import urllib3
iam_client = boto3.client('iam')
http = urllib3.PoolManager()
def handler(event, context):
    try:
        try:
            iam_client.get_role(
                RoleName='AWSServiceRoleForAutoScaling'
            )
            role_exists = True
        except iam_client.exceptions.NoSuchEntityException:
            role_exists = False
        if not role_exists:
            iam_client.create_service_linked_role(
                AWSServiceName='autoscaling.amazonaws.com'
            )
        response_data = {
            'Status': 'SUCCESS',
            'PhysicalResourceId': context.log_stream_name,
            'StackId': event['StackId'],
            'RequestId': event['RequestId'],
            'LogicalResourceId': event['LogicalResourceId'],
            'Data': {'Message': 'Service-linked role created successfully or already exists'}
        }
    except Exception as e:
        response_data = {
            'Status': 'FAILED',
            'Reason': str(e),
            'PhysicalResourceId': context.log_stream_name,
            'StackId': event['StackId'],
            'RequestId': event['RequestId'],
            'LogicalResourceId': event['LogicalResourceId']
        }
    response_url = event['ResponseURL']
    http.request('PUT', response_url, body=json.dumps(response_data))
    return {
        'statusCode': 200,
        'body': json.dumps(response_data)
    }
Lambda Function to grant KMS key
import boto3
import json
import urllib3
import time
kms_client = boto3.client('kms')
http = urllib3.PoolManager()
def handler(event, context):
    key_id = event['ResourceProperties']['KeyId']
    slr_arn = 'arn:aws:iam::' + context.invoked_function_arn.split(":")[4] + ':role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling'
    time.sleep(5)
    try:
        kms_client.create_grant(
            KeyId=key_id,
            GranteePrincipal=slr_arn,
            Operations=[
                'Decrypt', 'Encrypt', 'GenerateDataKey',
                'GenerateDataKeyWithoutPlaintext', 'ReEncryptFrom',
                'ReEncryptTo', 'CreateGrant', 'DescribeKey'
            ]
        )
        response_data = {
            'Status': 'SUCCESS',
            'PhysicalResourceId': context.log_stream_name,
            'StackId': event['StackId'],
            'RequestId': event['RequestId'],
            'LogicalResourceId': event['LogicalResourceId'],
            'Data': {'Message': 'KMS grant created successfully'}
        }
    except Exception as e:
        response_data = {
            'Status': 'FAILED',
            'Reason': str(e),
            'PhysicalResourceId': context.log_stream_name,
            'StackId': event['StackId'],
            'RequestId': event['RequestId'],
            'LogicalResourceId': event['LogicalResourceId']
        }
    response_url = event['ResponseURL']
    http.request('PUT', response_url, body=json.dumps(response_data))
    return {
        'statusCode': 200,
        'body': json.dumps(response_data)
    }
In this function, boto3 is used to create a KMS grant for the SLR. You’ll notice a time.sleep(5) call in the code. It is there because a newly created SLR takes a few seconds to become usable as a principal. If you call create_grant immediately, there’s a high chance you’ll encounter an error indicating that the SLR ARN is invalid.
Therefore, we added a 5-second delay and extended the Lambda timeout from the default 3 seconds to 10 seconds to prevent the function from timing out.
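A fixed sleep works, but an alternative worth considering is to retry with a growing delay, so the function returns as soon as the role is usable. The following is a generic sketch, not the approach used in this post; the helper name and its defaults are my own. With 4 attempts starting at 1 second, the waits add up to at most 7 seconds, which still fits in the 10-second timeout.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(action: Callable[[], T],
                    retry_on: Tuple[Type[Exception], ...],
                    attempts: int = 4,
                    first_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `action`, retrying on the given exceptions with doubling delays.

    The last failure is re-raised once all attempts are used up.
    """
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise
            sleep(delay)
            delay *= 2
    raise ValueError("attempts must be at least 1")
```

In the grant function this could wrap the call as call_with_retry(lambda: kms_client.create_grant(...), retry_on=(kms_client.exceptions.InvalidArnException,)). Which exception KMS raises for a role that has not propagated yet is an assumption here and should be confirmed against the actual error message.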
Custom Resources & ASG Config
  InvokeCreateSLRFunction:
    Type: Custom::CreateResources
    Properties:
      ServiceToken: !GetAtt CreateSLRFunction.Arn
    DeletionPolicy: Retain
  InvokeCreateKMSGrantFunction:
    Type: Custom::CreateKMSGrant
    Properties:
      ServiceToken: !GetAtt CreateKMSGrantFunction.Arn
      KeyId: !Ref KeyId
    DependsOn: InvokeCreateSLRFunction
    DeletionPolicy: Retain
  MyLaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateData:
        ImageId: !Ref ImageId
        InstanceType: !Ref InstanceType
        SecurityGroups:
          - !Ref MySecurityGroup
    DependsOn: [InvokeCreateSLRFunction, InvokeCreateKMSGrantFunction]
One important aspect to pay attention to is the DependsOn attribute. It helps control the order in which resources are created.
The creation order of resources should be as follows: After the SLR is created, the KMS Grant is applied using the SLR, and then the ASG is created with the granted permissions. The DependsOn field is used to specify this creation order.
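The effect of DependsOn can be illustrated with a topological sort over the dependency graph. This sketch is only a model of the ordering, not how CloudFormation is implemented; it uses Python's standard graphlib module (Python 3.9+). The ASG entry reflects the implicit dependency CloudFormation infers from its reference to the launch template.

```python
from graphlib import TopologicalSorter


def creation_order(depends_on: dict) -> list:
    """Return logical IDs in an order that respects every dependency.

    `depends_on` maps a logical ID to the IDs it must wait for.
    """
    return list(TopologicalSorter(depends_on).static_order())


# Dependencies taken from the template snippet above.
DEPENDENCIES = {
    "InvokeCreateKMSGrantFunction": ["InvokeCreateSLRFunction"],
    "MyLaunchTemplate": [
        "InvokeCreateSLRFunction",
        "InvokeCreateKMSGrantFunction",
    ],
    # Implicit dependency through !Ref MyLaunchTemplate.
    "MyAutoScalingGroup": ["MyLaunchTemplate"],
}

if __name__ == "__main__":
    for logical_id in creation_order(DEPENDENCIES):
        print(logical_id)
```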
Entire code:
AWSTemplateFormatVersion: '2010-09-09'
Description: CloudFormation Template to create an ASG with encrypted AMI.
Parameters:
  InstanceType:
    Description: EC2 Instance Type
    Type: String
    Default: t2.micro
    AllowedValues:
      - t2.micro
      - t3.micro
    ConstraintDescription: t2.micro or t3.micro only.
  ImageId:
    Description: Encrypted AMI ID for the EC2 instances
    Type: AWS::EC2::Image::Id
    Default: ami-005b0bdedc5ed724b
    ConstraintDescription: must be a valid AMI ID.
  KeyId:
    Description: KMS Key ID for the encrypted AMI
    Type: String
    Default: arn:aws:kms:ap-northeast-2:211125378002:key/926dcb9b-382a-4a37-b9dc-d287d3a714d5
    ConstraintDescription: must be a valid KMS Key ID.
Resources:
  LambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: AllowCreateServiceLinkedRoleAndKMSGrant
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - iam:CreateServiceLinkedRole
                  - iam:GetRole
                  - kms:CreateGrant
                Resource: "*"
  # create SLR
  CreateSLRFunction:
    Type: AWS::Lambda::Function
    Properties:
      Handler: index.handler
      Runtime: python3.9
      Role: !GetAtt LambdaExecutionRole.Arn
      Code:
        ZipFile: |
          import boto3
          import json
          import urllib3
          iam_client = boto3.client('iam')
          http = urllib3.PoolManager()
          def handler(event, context):
              try:
                  try:
                      iam_client.get_role(
                          RoleName='AWSServiceRoleForAutoScaling'
                      )
                      role_exists = True
                  except iam_client.exceptions.NoSuchEntityException:
                      role_exists = False
                  if not role_exists:
                      iam_client.create_service_linked_role(
                          AWSServiceName='autoscaling.amazonaws.com'
                      )
                  response_data = {
                      'Status': 'SUCCESS',
                      'PhysicalResourceId': context.log_stream_name,
                      'StackId': event['StackId'],
                      'RequestId': event['RequestId'],
                      'LogicalResourceId': event['LogicalResourceId'],
                      'Data': {'Message': 'Service-linked role created successfully or already exists'}
                  }
              except Exception as e:
                  response_data = {
                      'Status': 'FAILED',
                      'Reason': str(e),
                      'PhysicalResourceId': context.log_stream_name,
                      'StackId': event['StackId'],
                      'RequestId': event['RequestId'],
                      'LogicalResourceId': event['LogicalResourceId']
                  }
              response_url = event['ResponseURL']
              http.request('PUT', response_url, body=json.dumps(response_data))
              return {
                  'statusCode': 200,
                  'body': json.dumps(response_data)
              }
  # create KMS grant
  CreateKMSGrantFunction:
    Type: AWS::Lambda::Function
    Properties:
      Handler: index.handler
      Runtime: python3.9
      Role: !GetAtt LambdaExecutionRole.Arn
      Timeout: 10
      Code:
        ZipFile: |
          import boto3
          import json
          import urllib3
          import time
          kms_client = boto3.client('kms')
          http = urllib3.PoolManager()
          def handler(event, context):
              key_id = event['ResourceProperties']['KeyId']
              slr_arn = 'arn:aws:iam::' + context.invoked_function_arn.split(":")[4] + ':role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling'
              time.sleep(5)
              try:
                  kms_client.create_grant(
                      KeyId=key_id,
                      GranteePrincipal=slr_arn,
                      Operations=[
                          'Decrypt', 'Encrypt', 'GenerateDataKey',
                          'GenerateDataKeyWithoutPlaintext', 'ReEncryptFrom',
                          'ReEncryptTo', 'CreateGrant', 'DescribeKey'
                      ]
                  )
                  # Prepare response for CloudFormation
                  response_data = {
                      'Status': 'SUCCESS',
                      'PhysicalResourceId': context.log_stream_name,
                      'StackId': event['StackId'],
                      'RequestId': event['RequestId'],
                      'LogicalResourceId': event['LogicalResourceId'],
                      'Data': {'Message': 'KMS grant created successfully'}
                  }
              except Exception as e:
                  response_data = {
                      'Status': 'FAILED',
                      'Reason': str(e),
                      'PhysicalResourceId': context.log_stream_name,
                      'StackId': event['StackId'],
                      'RequestId': event['RequestId'],
                      'LogicalResourceId': event['LogicalResourceId']
                  }
              # Send response back to CloudFormation
              response_url = event['ResponseURL']
              http.request('PUT', response_url, body=json.dumps(response_data))
              return {
                  'statusCode': 200,
                  'body': json.dumps(response_data)
              }
  InvokeCreateSLRFunction:
    Type: Custom::CreateResources
    Properties:
      ServiceToken: !GetAtt CreateSLRFunction.Arn
    DeletionPolicy: Retain
  InvokeCreateKMSGrantFunction:
    Type: Custom::CreateKMSGrant
    Properties:
      ServiceToken: !GetAtt CreateKMSGrantFunction.Arn
      KeyId: !Ref KeyId
    DependsOn: InvokeCreateSLRFunction
    DeletionPolicy: Retain
  MyLaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateData:
        ImageId: !Ref ImageId
        InstanceType: !Ref InstanceType
        SecurityGroups:
          - !Ref MySecurityGroup
    DependsOn: [InvokeCreateSLRFunction, InvokeCreateKMSGrantFunction]
  MyAutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      LaunchTemplate:
        LaunchTemplateId: !Ref MyLaunchTemplate
        Version: !GetAtt MyLaunchTemplate.LatestVersionNumber
      MinSize: '1'
      MaxSize: '3'
      DesiredCapacity: '1'
      AvailabilityZones:
        - !Select
          - '0'
          - !GetAZs ''
  MySecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow SSH and HTTP access
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp: 0.0.0.0/0
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          CidrIp: 0.0.0.0/0
Outputs:
  AutoScalingGroupName:
    Description: Name of the Auto Scaling Group
    Value: !Ref MyAutoScalingGroup
Congratulations! You have successfully created a solution to automatically deploy ASGs into multiple AWS accounts! 🔥🔥
Working with CloudFormation StackSets requires great caution. A small mistake can result in the creation of numerous resources, leading to significant charges. It can also cause issues in the existing environments of accounts, so review is essential before applying any changes.
However, as seen in this post, CloudFormation is an incredibly powerful tool. When used with a solid understanding, it can be an invaluable resource for managing multi-account environments effectively.