CloudWatch — Concept
What it is
Amazon CloudWatch = AWS's native observability service: metrics, logs, alarms, dashboards, events (now mostly EventBridge), and synthetic monitoring.
Why it exists
Every production system needs centralized monitoring. CloudWatch is the default collection point for AWS resources and apps, and the trigger source for many automated responses.
Components
| Component | What it does |
|---|---|
| Metrics | Time-series data identified by namespace, metric name, and dimensions. Most services publish standard metrics every 1 or 5 min; EC2 basic monitoring is 5 min, and detailed monitoring (paid) is 1 min. |
| Custom metrics | Push from apps via PutMetricData or the CloudWatch agent. High-resolution = 1-second granularity. |
| Logs | Log groups → log streams → log events. Subscription filters stream events to Lambda, Firehose, or Kinesis Data Streams. Retention is configurable per log group. |
| Logs Insights | Query language for ad-hoc log analysis. |
| Alarms | Fire on a static metric threshold, an anomaly detection band, or a composite rule over other alarms. Actions: SNS, Auto Scaling, EC2 actions, SSM OpsItem. |
| Dashboards | Custom panels of metrics/logs. |
| Synthetics | Canary scripts hit URLs to detect outages. |
| RUM (Real User Monitoring) | Capture browser/JS perf data. |
| ServiceLens / Application Insights | Cross-service troubleshooting. |
| Container Insights | Metrics/logs for ECS / EKS. |
| Contributor Insights | Rank top-N contributors (e.g. "noisy neighbors") from log data. |
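As a rough sketch of the Custom metrics row above, the snippet below builds the request for a single high-resolution data point sent with PutMetricData. The namespace `MyApp`, the metric `QueueDepth`, and the dimension `Service=checkout` are hypothetical names chosen for illustration; `StorageResolution` set to 1 is what marks the data point as high-resolution. The boto3 call sits in its own function so the payload can be inspected without AWS credentials.

```python
from datetime import datetime, timezone


def build_metric_datum(value: float, high_resolution: bool = True) -> dict:
    """Build keyword arguments for CloudWatch PutMetricData (names are illustrative)."""
    return {
        "Namespace": "MyApp",  # hypothetical custom namespace
        "MetricData": [
            {
                "MetricName": "QueueDepth",  # hypothetical metric name
                "Dimensions": [{"Name": "Service", "Value": "checkout"}],
                "Timestamp": datetime.now(timezone.utc),
                "Value": value,
                "Unit": "Count",
                # 1 = high-resolution (1-second), 60 = standard resolution
                "StorageResolution": 1 if high_resolution else 60,
            }
        ],
    }


def publish(params: dict) -> None:
    """Send the data point; needs boto3 installed and AWS credentials configured."""
    import boto3  # imported lazily so the builder above works offline

    boto3.client("cloudwatch").put_metric_data(**params)


params = build_metric_datum(42.0)
print(params["MetricData"][0]["StorageResolution"])  # 1
```

Calling `publish(params)` from an application would push the value; the CloudWatch agent is the alternative when the metric comes from the OS rather than from application code.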
CloudWatch Agent
- Push OS-level metrics (memory, disk usage — not in default EC2 metrics) and logs to CloudWatch.
- Configure via SSM Parameter Store or local JSON.
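A minimal sketch of what such an agent configuration might contain, generated from Python so it can be validated as JSON. It collects memory and root-disk usage plus one log file; the file path `/var/log/myapp/app.log` and the log group `/myapp/app` are hypothetical, and a real configuration will usually carry more sections.

```python
import json

# Minimal CloudWatch agent configuration (a sketch, not a complete config).
agent_config = {
    "agent": {"metrics_collection_interval": 60},
    "metrics": {
        "namespace": "CWAgent",  # the agent's default namespace
        "metrics_collected": {
            "mem": {"measurement": ["mem_used_percent"]},
            "disk": {"measurement": ["used_percent"], "resources": ["/"]},
        },
    },
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": [
                    {
                        "file_path": "/var/log/myapp/app.log",  # hypothetical path
                        "log_group_name": "/myapp/app",  # hypothetical log group
                        "log_stream_name": "{instance_id}",
                    }
                ]
            }
        }
    },
}

config_json = json.dumps(agent_config, indent=2)
print(config_json)
```

The resulting JSON could be saved as a local file on the instance or stored as an SSM Parameter Store value and fetched by the agent at start-up.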
EC2 metrics by default vs need-agent
| Default (per-minute or 5-min) | Needs Agent |
|---|---|
| CPUUtilization | Memory |
| NetworkIn/Out | Disk usage (used %) |
| DiskReadOps/DiskWriteOps, DiskReadBytes/DiskWriteBytes (instance store volumes only) | Custom app metrics |
| StatusCheckFailed (system / instance) | Application logs |
Alarms
- 3 states: OK, ALARM, INSUFFICIENT_DATA.
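- Period: sub-minute (e.g. 10 or 30 s) for alarms on high-resolution metrics, otherwise a multiple of 60 s. The 1 s figure is the metric's granularity, not an alarm period.
- Evaluation periods (N) and datapoints to alarm (M): the alarm fires when M of the last N datapoints breach the threshold.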
- Targets: SNS, Auto Scaling action, EC2 stop/terminate/reboot, SSM OpsItem.
- Composite alarms combine sub-alarms with AND/OR.
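The "M out of N" rule can be illustrated with a toy evaluator. This is a simplified model under stated assumptions, not how the service is implemented: it uses a greater-than comparison only, looks at exactly the last N periods, and ignores missing datapoints unless all of them are missing. The real service applies more nuanced missing-data handling.

```python
from typing import Optional, Sequence


def alarm_state(
    datapoints: Sequence[Optional[float]],
    threshold: float,
    evaluation_periods: int,
    datapoints_to_alarm: int,
) -> str:
    """Toy M-out-of-N evaluation with a greater-than-threshold comparison.

    `datapoints` holds one value per period, oldest first; None marks a
    period with no data.
    """
    window = list(datapoints)[-evaluation_periods:]  # last N periods
    present = [v for v in window if v is not None]
    if not present:
        return "INSUFFICIENT_DATA"
    breaching = sum(1 for v in present if v > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


# CPU above 80% in 3 of the last 5 periods
print(alarm_state([50, 85, 90, 70, 95], 80, 5, 3))  # ALARM
print(alarm_state([50, 85, 60, 70, 95], 80, 5, 3))  # OK
print(alarm_state([None, None, None], 80, 3, 2))    # INSUFFICIENT_DATA
```

Setting M lower than N makes the alarm tolerant of brief dips back under the threshold, while M equal to N requires a sustained breach.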
Logs
- Push from: SDK, CW Agent, Lambda extension, FireLens (ECS), Fluent Bit, third-party.
- Subscription filters → Kinesis / Firehose / Lambda for real-time processing.
- Metric filters convert log patterns into custom metrics (e.g. count ERROR lines).
- Encryption (KMS), retention 1 day–10 years (or never expire).
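To make the metric filter and retention bullets concrete, here is a hedged sketch of the two requests involved. The log group `/myapp/app`, the filter name `error-count`, and the metric `ErrorCount` in namespace `MyApp/Logs` are hypothetical; the 30-day retention is just an example value.

```python
def build_error_metric_filter(log_group: str) -> dict:
    """Keyword arguments for PutMetricFilter: emit 1 for each event containing ERROR."""
    return {
        "logGroupName": log_group,
        "filterName": "error-count",  # hypothetical filter name
        "filterPattern": "ERROR",  # matches events containing the term ERROR
        "metricTransformations": [
            {
                "metricName": "ErrorCount",  # hypothetical metric
                "metricNamespace": "MyApp/Logs",  # hypothetical namespace
                "metricValue": "1",  # value recorded per matching event
                "defaultValue": 0.0,  # value reported when nothing matches
            }
        ],
    }


def build_retention(log_group: str, days: int = 30) -> dict:
    """Keyword arguments for PutRetentionPolicy."""
    return {"logGroupName": log_group, "retentionInDays": days}


def apply(log_group: str) -> None:
    """Create the filter and set retention; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builders work offline

    logs = boto3.client("logs")
    logs.put_metric_filter(**build_error_metric_filter(log_group))
    logs.put_retention_policy(**build_retention(log_group))


filter_params = build_error_metric_filter("/myapp/app")
print(filter_params["metricTransformations"][0]["metricName"])  # ErrorCount
```

An alarm on the resulting `ErrorCount` metric then behaves like an alarm on any other custom metric.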
Logs Insights example
fields @timestamp, @message
| filter @message like /ERROR/
| stats count() by bin(5m)
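The same query can also be run programmatically. The sketch below assumes boto3 and a hypothetical log group `/myapp/app`; it builds the StartQuery request for the last hour and polls GetQueryResults until the query leaves the Scheduled/Running states.

```python
import time

QUERY = """fields @timestamp, @message
| filter @message like /ERROR/
| stats count() by bin(5m)"""


def build_query_params(log_group: str, lookback_seconds: int = 3600) -> dict:
    """Keyword arguments for StartQuery covering the last `lookback_seconds`."""
    now = int(time.time())
    return {
        "logGroupName": log_group,
        "startTime": now - lookback_seconds,  # epoch seconds
        "endTime": now,
        "queryString": QUERY,
    }


def run_query(params: dict, poll_seconds: float = 1.0) -> list:
    """Start the query and poll for results; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works offline

    logs = boto3.client("logs")
    query_id = logs.start_query(**params)["queryId"]
    while True:
        response = logs.get_query_results(queryId=query_id)
        if response["status"] not in ("Scheduled", "Running"):
            return response["results"]
        time.sleep(poll_seconds)


query_params = build_query_params("/myapp/app")  # hypothetical log group
print(query_params["endTime"] - query_params["startTime"])  # 3600
```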
EventBridge (formerly CloudWatch Events)
- See EventBridge/ notes. CloudWatch Events still appears in older exam phrasing; treat it as EventBridge.
When to use vs alternatives
| Need | Use |
|---|---|
| OS-level + custom metrics from EC2 | CloudWatch Agent |
| Query logs ad-hoc | CloudWatch Logs Insights |
| Centralize logs from many accounts | Subscription filter → Kinesis Firehose → S3 (or cross-account log destination) |
| Auto-scale on custom metric | CloudWatch Alarm → ASG policy |
| Synthetic uptime checks | CloudWatch Synthetics |
| Detailed app tracing | AWS X-Ray (separate) |
| AWS API audit | CloudTrail (separate) |
Common exam scenarios
- "Monitor EC2 memory and disk usage" → install CloudWatch Agent (not in default metrics).
- "Trigger Lambda on a log pattern (ERROR)" → Metric filter + alarm + SNS → Lambda, or a subscription filter directly to Lambda.
- "Auto-scale on application queue depth" → custom metric or the SQS ApproximateNumberOfMessagesVisible metric → alarm → ASG step policy.
- "Restart unhealthy EC2 automatically" → StatusCheckFailed_Instance alarm → EC2 reboot action, or StatusCheckFailed_System alarm → EC2 recover action.
- "Quickly detect site outage from outside" → Synthetics canaries.
- "Cross-account central logging" → log destination in security account; cross-account subscription filters.
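As an illustration of the "restart unhealthy EC2" scenario, the sketch below builds a PutMetricAlarm request that recovers an instance after two consecutive failed system status checks. The instance ID and region are placeholders, and the period and evaluation settings are example values rather than recommendations.

```python
def build_recover_alarm(instance_id: str, region: str) -> dict:
    """Keyword arguments for PutMetricAlarm with an EC2 recover action."""
    return {
        "AlarmName": f"recover-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,  # seconds
        "EvaluationPeriods": 2,
        "DatapointsToAlarm": 2,
        "Threshold": 0.0,  # the metric is 0 (passed) or 1 (failed)
        "ComparisonOperator": "GreaterThanThreshold",
        # Built-in EC2 action; no Lambda glue required
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


def create_alarm(params: dict) -> None:
    """Create the alarm; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works offline

    boto3.client("cloudwatch").put_metric_alarm(**params)


# Placeholder instance ID and region, for illustration only
alarm = build_recover_alarm("i-0123456789abcdef0", "us-east-1")
print(alarm["AlarmActions"][0])  # arn:aws:automate:us-east-1:ec2:recover
```

Swapping the metric to `StatusCheckFailed_Instance` and the action suffix to `reboot` gives the reboot variant of the same pattern.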
Exam tip
- EC2 default metrics don't include memory / disk-used % — agent required.
- Alarms can act directly on EC2 / ASG without Lambda glue.
- CloudWatch Logs ≠ CloudTrail — Logs is app/infra logs; CloudTrail is API audit.