Annyeong!
This post is a summary of the course -- AWS Certified DevOps Engineer Professional 2026 - DOP-C02
CloudWatch Metrics
A metric is a variable to monitor (CPUUtilization, NetworkIn, … )
- Metrics belong to namespaces
- A dimension is an attribute of a metric, so it’s similar to a tag; up to 30 dimensions per metric are accepted
- Metrics have timestamps
- Can also create custom metrics
CloudWatch Metric Streams
Provides near-real-time delivery and low latency
- Metrics → Amazon Kinesis Data Firehose → Destinations
- Can send data to third-party service providers
- Can send them to S3, Redshift, OpenSearch, …
- Option to filter metrics to only stream a subset of them
CloudWatch Custom Metrics
- Example: RAM usage, disk usage, …
- Use API call PutMetricData
- Can specify metric resolution (StorageResolution API parameter, two possible values: 60 or 1):
- Standard: 1 minute (StorageResolution = 60)
- High resolution: 1 second (StorageResolution = 1); can be read at 1 / 5 / 10 / 30 second periods (higher cost)
💡 Accepts metric data points two weeks in the past and two hours in the future (make sure to configure your EC2 instance time correctly)
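As a rough illustration of the PutMetricData notes above, the sketch below builds one datum in the shape that boto3's `put_metric_data` takes and enforces the timestamp window and dimension limit. The helper name `build_metric_datum`, the namespace `MyApp`, and the metric name `MemoryUsedPercent` are made up for this example, and the real API call is left as a comment so the snippet runs without AWS credentials.

```python
from datetime import datetime, timedelta, timezone


def build_metric_datum(name, value, dimensions, timestamp=None,
                       unit="Percent", high_resolution=False):
    """Build one MetricData entry for PutMetricData (hypothetical helper)."""
    now = datetime.now(timezone.utc)
    timestamp = timestamp or now
    # CloudWatch accepts points up to two weeks old and up to two hours ahead.
    if not (now - timedelta(weeks=2) <= timestamp <= now + timedelta(hours=2)):
        raise ValueError("timestamp outside the accepted window")
    if len(dimensions) > 30:
        raise ValueError("a metric can have at most 30 dimensions")
    return {
        "MetricName": name,
        "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
        "Timestamp": timestamp,
        "Value": value,
        "Unit": unit,
        # 1 = high resolution, 60 = standard resolution
        "StorageResolution": 1 if high_resolution else 60,
    }


datum = build_metric_datum(
    "MemoryUsedPercent", 71.5, {"InstanceId": "i-0123456789abcdef0"}
)
# With boto3 installed and credentials configured, this could be sent with:
# boto3.client("cloudwatch").put_metric_data(Namespace="MyApp", MetricData=[datum])
```

Rejecting bad timestamps on the client side is just one way to surface a misconfigured instance clock early.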
CloudWatch Anomaly Detection
- Continuously analyze metrics to determine normal baselines and surface anomalies using ML algorithms
- Allows you to create Alarms based on metric’s expected value (instead of static threshold)
- Ability to exclude specified time periods or events from being trained
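To make the "alarm on expected value" idea more concrete, here is a minimal sketch of the parameters one might pass to boto3's `put_metric_alarm` for an anomaly detection alarm. The function name, alarm name, and the band width of 2 are assumptions picked for illustration; only the dictionary is built here, and the API call is shown as a comment.

```python
def anomaly_alarm_params(alarm_name, namespace, metric_name,
                         band_width=2, period=300, evaluation_periods=3):
    """Parameters for an alarm that fires outside the expected band (sketch)."""
    return {
        "AlarmName": alarm_name,
        # Fire when the metric leaves the band on either side.
        "ComparisonOperator": "LessThanLowerOrGreaterThanUpperThreshold",
        "EvaluationPeriods": evaluation_periods,
        # Points at the band expression instead of a static Threshold value.
        "ThresholdMetricId": "band",
        "Metrics": [
            {
                "Id": "m1",
                "ReturnData": True,
                "MetricStat": {
                    "Metric": {"Namespace": namespace, "MetricName": metric_name},
                    "Period": period,
                    "Stat": "Average",
                },
            },
            {
                "Id": "band",
                "ReturnData": True,
                "Expression": f"ANOMALY_DETECTION_BAND(m1, {band_width})",
            },
        ],
    }


params = anomaly_alarm_params("cpu-anomaly", "AWS/EC2", "CPUUtilization")
# boto3.client("cloudwatch").put_metric_alarm(**params)
```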
CloudWatch Logs
- Log groups: arbitrary name, usually representing an application
- Log stream: instances within application / log files / containers
- Can define log expiration policies (never expire / 1 day to 10 years)
- CloudWatch Logs can send logs to:
- S3 (batch export), Kinesis Data Streams, Kinesis Data Firehose, Lambda, OpenSearch
- Logs are encrypted by default
- Can set up KMS-based encryption with your own keys
CloudWatch Logs - Sources
- SDK, CloudWatch Logs Agent, CloudWatch Unified Agent
- The unified agent can also send logs, so the older CloudWatch Logs Agent is effectively deprecated
- Elastic Beanstalk: collection of logs from application
- ECS: collection from containers
- AWS Lambda: collection from function logs
- VPC Flow Logs: VPC specific logs
- API Gateway
- CloudTrail based on filter
- Route53: Log DNS queries
CloudWatch Logs Insights
- Search and analyze log data stored in CloudWatch Logs
- Provides a purpose-built query language
- Automatically discovers fields from AWS services and JSON log events
- Can query multiple Log Groups in different AWS accounts
- It’s a query engine, not a real-time engine
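For a feel of the query language, the sketch below assembles the arguments for boto3's `start_query` with a small Logs Insights query that keeps matching messages and sorts them newest first. The helper name, the log group `/my-app/prod`, and the `ERROR` pattern are placeholders, not anything from the course.

```python
import time


def insights_query_params(log_groups, pattern, minutes=60, limit=20, now=None):
    """Arguments for logs.start_query over the last `minutes` (sketch)."""
    end = int(now if now is not None else time.time())
    query = (
        "fields @timestamp, @message"
        f" | filter @message like /{pattern}/"
        " | sort @timestamp desc"
        f" | limit {limit}"
    )
    return {
        "logGroupNames": list(log_groups),
        # start_query takes epoch seconds for both bounds.
        "startTime": end - minutes * 60,
        "endTime": end,
        "queryString": query,
    }


params = insights_query_params(["/my-app/prod"], "ERROR", now=1_700_000_000)
# query_id = boto3.client("logs").start_query(**params)["queryId"]
# Results are then polled with get_query_results(queryId=query_id).
```

The poll step is why this is described as a query engine rather than a real-time one: the query runs asynchronously over stored data.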
CloudWatch Logs - S3 Export
- Log data can take up to 12 hours to become available for export
- The API call is CreateExportTask
- Not near-real time or real-time
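A minimal sketch of what a CreateExportTask request could look like through boto3's `create_export_task`. The log group, bucket name, and prefix are invented for the example; the one detail worth noticing is that the time range is given in milliseconds since the epoch.

```python
from datetime import datetime, timezone


def export_task_params(log_group, bucket, start, end, prefix="exported-logs"):
    """Arguments for logs.create_export_task (hypothetical helper)."""
    def to_millis(dt):
        return int(dt.timestamp() * 1000)

    return {
        "taskName": f"export-{log_group.strip('/').replace('/', '-')}",
        "logGroupName": log_group,
        "fromTime": to_millis(start),
        "to": to_millis(end),
        "destination": bucket,  # S3 bucket name, not an ARN
        "destinationPrefix": prefix,
    }


params = export_task_params(
    "/my-app/prod",
    "my-log-archive",
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 2, tzinfo=timezone.utc),
)
# boto3.client("logs").create_export_task(**params)
```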
CloudWatch Logs Subscriptions
- Get real-time log events from CloudWatch Logs
- Send to Kinesis Data Streams, Kinesis Data Firehose, or Lambda
- Subscription Filter: filters which log events are delivered to your destination
CloudWatch Logs → Subscription Filter → …
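When the destination is Lambda, the function receives the batch as base64-encoded, gzip-compressed JSON under `awslogs.data`. The sketch below decodes such an event; the sample payload is fabricated locally so the snippet can run on its own, and the log group and filter names in it are placeholders.

```python
import base64
import gzip
import json


def decode_subscription_event(event):
    """Decode the payload a CloudWatch Logs subscription delivers to Lambda."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


# Fabricated sample in the documented shape, for local testing only.
sample = {
    "messageType": "DATA_MESSAGE",
    "owner": "111111111111",
    "logGroup": "/my-app/prod",
    "logStream": "i-0123456789abcdef0",
    "subscriptionFilters": ["errors-only"],
    "logEvents": [
        {"id": "1", "timestamp": 1704067200000, "message": "ERROR something failed"}
    ],
}
event = {
    "awslogs": {
        "data": base64.b64encode(gzip.compress(json.dumps(sample).encode())).decode()
    }
}

decoded = decode_subscription_event(event)
messages = [e["message"] for e in decoded["logEvents"]]
```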
CloudWatch Logs Aggregation — Multi-Account & Multi Region
Logs from different sources → different subscription filters → Kinesis Data Streams → Kinesis Data Firehose —(near-real time)→ S3
- Cross-Account Subscription — send log events to resources in a different AWS account (KDS, KDF)
- Subscription Filter and Subscription Destination
- Then you need a Destination Access Policy and IAM Role to access the destination (Kinesis Data Streams, allow PutRecord)
- So the Subscription Filter assumes the IAM Role
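The Destination Access Policy mentioned above is a resource policy on the destination in the receiving account. Below is a sketch of one that lets a sender account create subscription filters against it; both account IDs and the destination name `central-logs` are placeholders.

```python
import json


def destination_access_policy(sender_account_id, destination_arn):
    """Policy letting another account subscribe to this destination (sketch)."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": sender_account_id},
                "Action": "logs:PutSubscriptionFilter",
                "Resource": destination_arn,
            }
        ],
    })


policy = destination_access_policy(
    "111111111111",
    "arn:aws:logs:us-east-1:999999999999:destination:central-logs",
)
# In the receiving account, this could be attached with:
# boto3.client("logs").put_destination_policy(
#     destinationName="central-logs", accessPolicy=policy)
```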
CloudWatch Logs Metric Filter
💡 Filters do not retroactively filter data. They only publish metric data points for events that happen after the filter was created.
- Ability to specify up to 3 dimensions for the metric filter (optional)
CloudWatch Logs Agent → CW logs → Metric Filters → CW alarm
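The Metric Filters step of that pipeline can be sketched as the arguments for boto3's `put_metric_filter`. This example counts log events containing `ERROR`; the helper name, filter name, metric name, and namespace are illustrative assumptions.

```python
def metric_filter_params(log_group, filter_name, pattern, metric_name, namespace):
    """Arguments for logs.put_metric_filter that count matching events (sketch)."""
    return {
        "logGroupName": log_group,
        "filterName": filter_name,
        "filterPattern": pattern,
        "metricTransformations": [
            {
                "metricName": metric_name,
                "metricNamespace": namespace,
                # Publish 1 for every matching log event.
                "metricValue": "1",
                # Publish 0 when nothing matches, so an alarm still sees data.
                "defaultValue": 0.0,
            }
        ],
    }


params = metric_filter_params(
    "/my-app/prod", "error-count", "ERROR", "ErrorCount", "MyApp"
)
# boto3.client("logs").put_metric_filter(**params)
```

A CloudWatch alarm on `MyApp/ErrorCount` would then close the loop, keeping in mind that only events arriving after the filter exists are counted.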
All kinds of logs
- Application logs
- Logs generated from the application
- Operating System Logs (event logs, system logs)
- Logs that are generated by your operating system (host)
- Informing you of system behavior (ex: /var/log/messages or /var/log/auth.log)
- Access Logs
- List of all the requests for individual files that people have requested from a website
- Example for httpd: /var/log/apache/access.log
- Usually for load balancers, proxies, web servers, etc
- AWS provides some access logs
- AWS Managed Logs
- Load Balancer Access Logs (ALB, NLB, CLB) → S3
- Access logs
- CloudTrail Logs → S3 / CloudWatch Logs
- Logs for API calls made within your account
- VPC Flow Logs → S3 / CloudWatch Logs
- Information about IP traffic going to and from network interfaces (ENIs) in your VPC
- Route 53 Access Logs → CloudWatch Logs
- Log information about the queries that Route 53 receives
- S3 Access Logs → S3
- Server access logging provides detailed records for the requests that are made to a bucket
- CloudFront Access Logs → S3
- Detailed information about every user request that CloudFront receives
CloudWatch Logs for EC2
- By default, no logs from EC2 will go to CloudWatch
- You need to run a CloudWatch agent on EC2 to push the log files you want
- Make sure IAM permissions are correct
- The CloudWatch agent can be set up on-premises too
CloudWatch Logs Agent (legacy) & Unified Agent
- CloudWatch Logs Agent
- Old version of the agent
- Can only send to CloudWatch Logs
- CloudWatch Unified Agent
- Collect additional system-level metrics such as RAM, processes, …
- Collect logs to send to CloudWatch Logs
- Centralized configuration using SSM Parameter Store
CloudWatch Unified Agent — Metrics
- Collected directly on your Linux server / EC2 instance
- CPU (active / guest / idle / system / user / steal)
- Disk metrics (free, used, total), Disk IO (writes, reads, bytes, iops)
- RAM (free / inactive / used / total / cached)
- Netstat (number of TCP and UDP connections, net packets, bytes)
- Processes (total, dead, blocked, idle, running, sleep)
- Swap Space (free, used, used %)
💡 OOTB metrics for EC2 are disk, CPU, network (high level). If you need more, consider using CloudWatch Unified Agent
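For a sense of what the unified agent's configuration looks like, here is a small sketch that builds a config collecting memory, swap, and disk usage plus one log file, then serializes it to JSON (the form you would store in SSM Parameter Store or on disk). The log group name `my-app-system-logs` and the choice of measurements are assumptions for the example.

```python
import json


def build_agent_config(log_file, log_group):
    """Minimal CloudWatch unified agent config as a dict (illustrative)."""
    return {
        "metrics": {
            "metrics_collected": {
                "mem": {"measurement": ["mem_used_percent"]},
                "swap": {"measurement": ["swap_used_percent"]},
                "disk": {"measurement": ["used_percent"], "resources": ["/"]},
            }
        },
        "logs": {
            "logs_collected": {
                "files": {
                    "collect_list": [
                        {
                            "file_path": log_file,
                            "log_group_name": log_group,
                            # The agent substitutes the instance ID at runtime.
                            "log_stream_name": "{instance_id}",
                        }
                    ]
                }
            }
        },
    }


config_json = json.dumps(
    build_agent_config("/var/log/messages", "my-app-system-logs"), indent=2
)
```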
CloudWatch Alarms
- Alarms are used to trigger notifications for any metric
- Various options (sampling, %, max, min, etc)
- Alarm States:
- OK
- INSUFFICIENT_DATA
- ALARM
- Period:
- Length of time in seconds to evaluate the metric (similar to the evaluation window in Datadog)
- High-resolution custom metrics allow shorter periods (10 sec, 30 sec, or any multiple of 60 sec)
CloudWatch Alarm Targets
- Stop, Terminate, Reboot, or Recover an EC2 instance
- Trigger Auto Scaling Action
- Send notification to SNS (from which you can do pretty much anything)
CloudWatch Alarms — Composite Alarms
- A standard CloudWatch Alarm is on a single metric
- Composite Alarms monitor the states of multiple other alarms
- AND and OR conditions
- Helpful to reduce “alarm noise” by creating complex composite alarms
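A composite alarm is defined by a rule expression over other alarms' states. The sketch below builds such a rule string for the AND / OR cases; the helper name and the child alarm names `cpu-high` and `iops-high` are hypothetical.

```python
def alarm_rule(alarm_names, operator="AND"):
    """Build a composite AlarmRule that combines child alarms in ALARM state."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be AND or OR")
    return f" {operator} ".join(f'ALARM("{name}")' for name in alarm_names)


rule = alarm_rule(["cpu-high", "iops-high"])
# boto3.client("cloudwatch").put_composite_alarm(
#     AlarmName="cpu-and-iops-high", AlarmRule=rule)
```

With AND, the composite alarm only fires when both children are in ALARM, which is how it cuts down on noise from a single flapping metric.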
EC2 Instance Recovery
- Status Check:
- Instance status = check the EC2 VM
- System status = check the underlying hardware
- Attached EBS status = check attached EBS volumes
- You can create CloudWatch Alarms based on these checks (e.g. StatusCheckFailed_System) and do EC2 Instance Recovery when triggered
- Recovery moves the instance to healthy hardware and keeps the same instance ID, private / public / Elastic IP, metadata, placement group
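As a sketch of how the status check and the recover action fit together, the function below builds `put_metric_alarm` arguments for an alarm on `StatusCheckFailed_System` whose action is the EC2 recover automation ARN. The instance ID, region, period, and evaluation settings are placeholder choices, not recommendations.

```python
def recovery_alarm_params(instance_id, region):
    """Alarm that recovers an instance when the system status check fails (sketch)."""
    return {
        "AlarmName": f"recover-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 2,
        # The metric is 1 when the check fails and 0 when it passes.
        "Threshold": 1,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


params = recovery_alarm_params("i-0123456789abcdef0", "us-east-1")
# boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**params)
```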
CloudWatch Alarm — good to know
- Alarms can be created based on CloudWatch Logs Metric Filters
- To test alarms and notifications, set the alarm state to ALARM using the CLI
aws cloudwatch set-alarm-state \
--alarm-name "<alarm-name>" \
--state-value ALARM \
--state-reason "<message>"
CloudWatch Synthetics Canary
- Configurable scripts that monitor your APIs, URLs, Websites, …
- Reproduce what your customers do programmatically to find issues before customers are impacted
- Checks the availability and latency of your endpoints and can store load time data and screenshots of the UI
- Integration with CloudWatch Alarms
- Scripts written in Node.js or Python
- Programmatic access to a headless Google Chrome browser
- Can run once or on a regular schedule
CloudWatch Synthetics Canary Blueprints (templates)
- Heartbeat Monitor — load URL, store screenshot and an HTTP archive file
- API Canary — test basic read and write functions of REST APIs
- Broken Link Checker — check all links inside the URL that you are testing
- Visual Monitoring — compare a screenshot taken during a canary run with a baseline screenshot
- Canary Recorder — used with CloudWatch Synthetics Recorder (records your actions on a website and automatically generates a script for them)
- GUI Workflow Builder — verifies that actions can be taken on your webpage (e.g. test a webpage with a login form)
Amazon Athena
- Serverless query service to analyze data stored in Amazon S3
- Uses standard SQL to query the files (built on the Presto SQL engine)
- Supports CSV, JSON, ORC, Avro, and Parquet
- Pricing: $5.00 per TB of data scanned
- Commonly used with Amazon QuickSight for reporting / dashboards
- Use cases: business intelligence / analytics / reporting, analyze & query VPC Flow Logs, ELB Logs, CloudTrail trails, …
- Exam Tip: analyze data in S3 using serverless SQL, use Athena
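Since pricing is per TB scanned, the cost of a query is simple arithmetic on how much data it reads. A tiny sketch, using the $5.00 figure from the notes (actual pricing can differ by region) and a made-up case where only 20% of the bytes are read:

```python
PRICE_PER_TB_USD = 5.00  # figure quoted in the notes; check current regional pricing


def estimate_query_cost(tb_scanned, price_per_tb=PRICE_PER_TB_USD):
    """Estimated Athena cost in USD for the given amount of data scanned."""
    return round(tb_scanned * price_per_tb, 2)


full_scan = estimate_query_cost(1.0)    # query reads the whole 1 TB dataset
pruned_scan = estimate_query_cost(0.2)  # hypothetical: only 20% of bytes are read
savings = round(full_scan - pruned_scan, 2)
```

This is why anything that reduces bytes scanned (columnar formats, compression, partitioning) translates directly into a lower bill.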
Amazon Athena — Performance Improvement
Use columnar data for cost savings (less data scanned)
- Apache Parquet or ORC is recommended
- Huge performance improvement
- Use Glue to convert your data to Parquet or ORC
- Row-based format example:
user_id, user_name, country, purchase_amount, created_at
1, John, US, 53, 2024-01-01
2, Jane, KR, 20, 2024-01-02
...
- Columnar-based format example:
user_id:
1, 2, 3, ...
country:
US, KR, JP, ...
purchase_amount:
53, 20, 10, ...
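The two layouts above can be mimicked in plain Python to show why columnar storage scans less: once the data is pivoted, an aggregate over one column touches only that column's values. This is only a toy model of the idea, not how Parquet or ORC are actually encoded.

```python
rows = [
    {"user_id": 1, "user_name": "John", "country": "US", "purchase_amount": 53},
    {"user_id": 2, "user_name": "Jane", "country": "KR", "purchase_amount": 20},
]


def to_columnar(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


columns = to_columnar(rows)
# Summing purchases reads one list instead of every field of every row.
total_purchases = sum(columns["purchase_amount"])
```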
Additional considerations:
- Compress data for smaller retrievals (bzip2, gzip, lz4, snappy, zlib, zstd, …)
- Partition datasets in S3 for easy querying on virtual columns
- example: s3://athena-examples/flight/parquet/year=1991/month=1/day=1/
- Use larger files (> 128 MB) to minimize overhead
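The partitioned layout in the example path follows a `key=value` folder convention. A small sketch of building such prefixes when writing data, with a hypothetical helper name:

```python
def partition_prefix(base, **partitions):
    """Build a key=value style S3 prefix in the order the partitions are given."""
    parts = "/".join(f"{key}={value}" for key, value in partitions.items())
    return f"{base.rstrip('/')}/{parts}/"


prefix = partition_prefix(
    "s3://athena-examples/flight/parquet", year=1991, month=1, day=1
)
```

A query that filters on `year`, `month`, or `day` can then skip every prefix that does not match, instead of scanning the whole dataset.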
Amazon Athena — Federated Query
- Allows you to run SQL queries across data stored in relational, non-relational, object, and custom data sources (AWS or on-premises)
- Uses Data Source Connectors that run on AWS Lambda to run Federated Queries (e.g. CloudWatch Logs, DynamoDB, RDS, …)
- Store the results back in Amazon S3