CloudWatch — Concept

What it is

Amazon CloudWatch = AWS's native observability service: metrics, logs, alarms, dashboards, events (now mostly EventBridge), and synthetic monitoring.

Why it exists

Every production system needs centralized monitoring. CloudWatch is the default collection point for AWS resources and apps, and the trigger source for many automated responses.

Components

Component	What it does
Metrics	Time-series data per namespace/dimension. Most services publish standard metrics every 1 or 5 min; EC2 basic monitoring is 5 min, detailed monitoring (paid) is 1 min.
Custom metrics	Push from apps via `PutMetricData` or the CloudWatch agent. High-resolution = 1-second granularity.
Logs	Log groups → log streams → events. Subscribe to Lambda/Firehose/Kinesis. Retention configurable per group.
Logs Insights	Query language for ad-hoc log analysis.
Alarms	Trigger on a static metric threshold, an anomaly detection band, or a composite of other alarms. Actions: SNS, Auto Scaling, EC2 actions, SSM OpsItem.
Dashboards	Custom panels of metrics/logs.
Synthetics	Canary scripts hit URLs to detect outages.
RUM (Real User Monitoring)	Capture browser/JS perf data.
ServiceLens / Application Insights	Cross-service troubleshooting.
Container Insights	Metrics/logs for ECS / EKS.
Contributor Insights	Rank top-N contributors (e.g. "noisy neighbors", top talkers) from log data.
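
The custom metrics row can be illustrated with a minimal sketch of a `PutMetricData` request. The namespace `MyApp/Orders`, the metric name `QueueDepth`, and the `Service` dimension are hypothetical placeholders, and the helper names are illustrative only. The builder is plain Python; the actual call is isolated in `publish`, which assumes boto3 and AWS credentials are available.

```python
from datetime import datetime, timezone


def build_put_metric_data(namespace, name, value, dimensions, high_resolution=False):
    """Build keyword arguments for the CloudWatch PutMetricData call."""
    return {
        "Namespace": namespace,
        "MetricData": [
            {
                "MetricName": name,
                "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
                "Timestamp": datetime.now(timezone.utc),
                "Value": float(value),
                "Unit": "Count",
                # 1 = high-resolution (1-second), 60 = standard resolution
                "StorageResolution": 1 if high_resolution else 60,
            }
        ],
    }


def publish(params):
    """Send the datapoint. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without the SDK

    boto3.client("cloudwatch").put_metric_data(**params)


# Hypothetical namespace, metric name, and dimension, for illustration only.
params = build_put_metric_data(
    "MyApp/Orders", "QueueDepth", 42, {"Service": "checkout"}, high_resolution=True
)
```

Calling `publish(params)` from an instance or function with the `cloudwatch:PutMetricData` permission would send the datapoint.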

CloudWatch Agent

Push OS-level metrics (memory, disk usage — not in default EC2 metrics) and logs to CloudWatch.
Configure via SSM Parameter Store or local JSON.
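
As a rough sketch of what such a configuration might contain, the snippet below builds an agent config that collects memory and root-disk usage plus one log file, then serializes it to JSON. The log file path `/var/log/myapp/app.log` and the log group `/myapp/app` are hypothetical; the collection interval and measurement choices are assumptions, not recommendations.

```python
import json

# Minimal CloudWatch agent configuration: memory, disk-used %, and one log file.
agent_config = {
    "agent": {"metrics_collection_interval": 60},
    "metrics": {
        "namespace": "CWAgent",
        "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
        "metrics_collected": {
            "mem": {"measurement": ["mem_used_percent"]},
            "disk": {"measurement": ["used_percent"], "resources": ["/"]},
        },
    },
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": [
                    {
                        "file_path": "/var/log/myapp/app.log",  # hypothetical path
                        "log_group_name": "/myapp/app",  # hypothetical log group
                        "log_stream_name": "{instance_id}",
                    }
                ]
            }
        }
    },
}

# The JSON text is what would be saved locally or stored in Parameter Store.
config_json = json.dumps(agent_config, indent=2)
```

The resulting JSON can be stored as a local file or as a Parameter Store value, and the agent is then pointed at it when it starts.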

EC2 metrics by default vs need-agent

Default (5-min basic, 1-min detailed)	Needs Agent
CPUUtilization	Memory
NetworkIn/Out	Disk usage (used %)
DiskRead/WriteOps/Bytes (instance store volumes only)	Custom app metrics
StatusCheckFailed (system / instance)	Application logs

Alarms

3 states: OK, ALARM, INSUFFICIENT_DATA.
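Period: 10 s or 30 s (high-resolution metrics only), or any multiple of 60 s. 1 s is the finest metric resolution, not an alarm period.
Evaluation: "M out of N", i.e. the alarm fires when DatapointsToAlarm of the last EvaluationPeriods datapoints breach the threshold.
Targets: SNS, Auto Scaling action, EC2 stop/terminate/reboot/recover, SSM OpsItem.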
Composite alarms combine sub-alarms with AND/OR.
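
The evaluation logic can be sketched as a small simulation. This is a simplified model, not CloudWatch's actual algorithm: it assumes a "greater than threshold" comparison and simply ignores missing datapoints, whereas the real service applies the alarm's missing-data setting. The alarm names in the composite rule are hypothetical.

```python
def alarm_state(datapoints, threshold, evaluation_periods, datapoints_to_alarm):
    """Simplified M-out-of-N evaluation for a greater-than-threshold alarm.

    datapoints: values ordered oldest to newest; None marks a missing period.
    """
    window = datapoints[-evaluation_periods:]
    present = [d for d in window if d is not None]
    if not present:
        return "INSUFFICIENT_DATA"
    breaching = sum(1 for d in present if d > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


# 3 of the last 5 datapoints above 90 puts the alarm in ALARM.
state_a = alarm_state([50, 91, 95, 60, 92], 90, 5, 3)
# Only 2 of the last 5 breach, so the alarm stays OK.
state_b = alarm_state([50, 91, 40, 60, 92], 90, 5, 3)
# No data at all in the window.
state_c = alarm_state([None, None], 90, 2, 1)

# Composite alarm rule expression combining two hypothetical sub-alarms.
COMPOSITE_RULE = 'ALARM("high-cpu") AND ALARM("high-latency")'
```

Setting M lower than N is a common way to tolerate a single noisy datapoint without delaying the alarm for sustained breaches.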

Logs

Push from: SDK, CW Agent, Lambda extension, FireLens (ECS), Fluent Bit, third-party.
Subscription filters → Kinesis / Firehose / Lambda for real-time processing.
Metric filters convert log patterns into custom metrics (e.g. count ERROR lines).
Encryption (KMS), retention 1 day–10 years (or never expire).
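
For the subscription-filter path, a Lambda function receives the log batch base64-encoded and gzip-compressed under `awslogs.data`. The sketch below decodes that payload and counts ERROR lines. The handler name, the sample log group `/myapp/app`, and the returned summary shape are assumptions made for illustration; the sample event is built by hand so the round trip can run locally.

```python
import base64
import gzip
import json


def decode_subscription_event(event):
    """Decode the payload CloudWatch Logs delivers to a subscribed Lambda."""
    raw = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(raw))


def handler(event, context):
    """Count ERROR lines in the delivered batch (illustrative logic only)."""
    payload = decode_subscription_event(event)
    errors = [e["message"] for e in payload["logEvents"] if "ERROR" in e["message"]]
    return {"logGroup": payload["logGroup"], "errorCount": len(errors)}


# Hand-made payload mimicking the delivered structure, for a local round trip.
sample = {
    "messageType": "DATA_MESSAGE",
    "logGroup": "/myapp/app",  # hypothetical log group
    "logStream": "i-0123456789abcdef0",
    "logEvents": [
        {"id": "1", "timestamp": 1700000000000, "message": "ERROR payment failed"},
        {"id": "2", "timestamp": 1700000001000, "message": "INFO ok"},
    ],
}
encoded = base64.b64encode(gzip.compress(json.dumps(sample).encode())).decode()
result = handler({"awslogs": {"data": encoded}}, None)
```

If only a count is needed rather than the log content itself, a metric filter plus alarm avoids invoking code for every batch.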

Logs Insights example

fields @timestamp, @message
| filter @message like /ERROR/
| stats count() by bin(5m)
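
The same query can also be run programmatically. The sketch below builds the request for the Logs `StartQuery` API and polls `GetQueryResults` until the query finishes. The log group `/myapp/app`, the one-hour window, and the helper names are assumptions; `run_query` needs boto3 and AWS credentials and is not invoked here.

```python
import time

QUERY = """fields @timestamp, @message
| filter @message like /ERROR/
| stats count() by bin(5m)"""


def build_start_query(log_group, minutes_back, now=None):
    """Build StartQuery arguments; times are epoch seconds."""
    end = int(now if now is not None else time.time())
    return {
        "logGroupName": log_group,
        "startTime": end - minutes_back * 60,
        "endTime": end,
        "queryString": QUERY,
    }


def run_query(params, poll_seconds=1.0):
    """Start the query and poll until it leaves the Scheduled/Running states."""
    import boto3  # imported lazily so the builder works without the SDK

    logs = boto3.client("logs")
    query_id = logs.start_query(**params)["queryId"]
    while True:
        response = logs.get_query_results(queryId=query_id)
        if response["status"] not in ("Scheduled", "Running"):
            return response
        time.sleep(poll_seconds)


# Hypothetical log group; fixed "now" keeps the example deterministic.
params = build_start_query("/myapp/app", 60, now=1_700_000_000)
```

Queries are asynchronous, which is why the sketch polls instead of expecting results from the start call.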

EventBridge (formerly CloudWatch Events)

See EventBridge/ notes. CloudWatch Events still appears in older exam phrasing — treat as EventBridge.

When to use vs alternatives

Need	Use
OS-level + custom metrics from EC2	CloudWatch Agent
Query logs ad-hoc	CloudWatch Logs Insights
Centralize logs from many accounts	Subscription filter → Kinesis Firehose → S3 (or cross-account log destination)
Auto-scale on custom metric	CloudWatch Alarm → ASG policy
Synthetic uptime checks	CloudWatch Synthetics
Detailed app tracing	AWS X-Ray (separate)
AWS API audit	CloudTrail (separate)

Common exam scenarios

"Monitor EC2 memory and disk usage" → install CloudWatch Agent (not in default metrics).
"Trigger Lambda on a log pattern (ERROR)" → Metric filter + alarm + SNS → Lambda, or Subscription filter directly to Lambda.
"Auto-scale on application queue depth" → custom metric or SQS ApproximateNumberOfMessagesVisible → alarm → ASG step policy.
"Restart unhealthy EC2 automatically" → StatusCheckFailed_System alarm → EC2 recover action; StatusCheckFailed_Instance alarm → EC2 reboot action.
"Quickly detect site outage from outside" → Synthetics canaries.
"Cross-account central logging" → log destination in security account; cross-account subscription filters.

Exam tip

EC2 default metrics don't include memory / disk-used % — agent required.
Alarms can act directly on EC2 / ASG without Lambda glue.
CloudWatch Logs ≠ CloudTrail — Logs is app/infra logs; CloudTrail is API audit.
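
To make the "no Lambda glue" point concrete, here is a minimal sketch of the arguments for a `PutMetricAlarm` call that recovers an instance when the system status check fails. The instance ID, region, alarm name pattern, and the 2-of-2 one-minute evaluation are assumptions chosen for illustration; the alarm action is the built-in EC2 recover action ARN rather than any custom code.

```python
def build_recover_alarm(instance_id, region):
    """Build PutMetricAlarm arguments for an EC2 auto-recover alarm."""
    return {
        "AlarmName": f"recover-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 2,
        "DatapointsToAlarm": 2,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Built-in EC2 action; no SNS topic or Lambda function involved.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


def create_alarm(params):
    """Create the alarm. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without the SDK

    boto3.client("cloudwatch").put_metric_alarm(**params)


# Hypothetical instance ID and region.
alarm = build_recover_alarm("i-0123456789abcdef0", "us-east-1")
```

Swapping the metric to `StatusCheckFailed_Instance` and the action suffix to `reboot` would give the reboot variant.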

References

https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/