Auto Scaling — Concept
What it is
Amazon EC2 Auto Scaling automatically launches and terminates EC2 instances to match demand while keeping a desired count of healthy instances, typically spread across multiple AZs.
(Other services such as ECS, DynamoDB, and Aurora replicas scale through Application Auto Scaling, which the broader AWS Auto Scaling console builds on, typically with target tracking.)
Why it exists
Manual sizing wastes money and risks outages. Auto Scaling ensures HA, elasticity, and cost optimization — pay only for what you need now.
Key building blocks
- Auto Scaling Group (ASG) — defines min/max/desired counts and the AZs/subnets to run in.
- Launch Template (preferred) or Launch Configuration (legacy) — describes the AMI, instance type, key, SGs, user-data, storage.
- Target Group (when behind an LB) — ASG registers/deregisters instances with the TG.
- Health checks — EC2 status (default) and/or ELB health checks; unhealthy → terminate + replace.
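To make the building blocks concrete, here is a minimal sketch of how they map onto the keyword arguments of boto3's `create_auto_scaling_group`. The helper name `build_asg_request`, the example values, and the two-subnet guard are assumptions for illustration; the helper only assembles the request and does not call AWS.

```python
from typing import Any, Dict, List, Optional


def build_asg_request(
    name: str,
    launch_template_name: str,
    subnet_ids: List[str],
    min_size: int,
    max_size: int,
    desired: int,
    target_group_arns: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for boto3's create_auto_scaling_group."""
    if not (0 <= min_size <= desired <= max_size):
        raise ValueError("expected 0 <= min_size <= desired <= max_size")
    if len(subnet_ids) < 2:
        # Not an API rule: a guard for the multi-AZ goal in these notes.
        # The subnets should also sit in different AZs, which is not checked here.
        raise ValueError("pass subnets from at least two AZs")

    request: Dict[str, Any] = {
        "AutoScalingGroupName": name,
        "LaunchTemplate": {
            "LaunchTemplateName": launch_template_name,
            "Version": "$Latest",
        },
        "MinSize": min_size,
        "MaxSize": max_size,
        "DesiredCapacity": desired,
        # Subnets are passed as one comma-separated string.
        "VPCZoneIdentifier": ",".join(subnet_ids),
        "HealthCheckType": "EC2",  # the default health check source
    }
    if target_group_arns:
        # Behind a load balancer: register with the target groups and
        # let ELB health checks count as well.
        request["TargetGroupARNs"] = target_group_arns
        request["HealthCheckType"] = "ELB"
        request["HealthCheckGracePeriod"] = 300  # seconds, illustrative value
    return request


# With boto3 installed and credentials configured, the call would look like:
#   boto3.client("autoscaling").create_auto_scaling_group(**request)
```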
Scaling policies
| Policy | How |
|---|---|
| Manual | Adjust desired count by hand |
| Scheduled | Up/down at fixed times (e.g. business hours) |
| Dynamic — Target Tracking | "Keep CPU at 50 %" — AWS calculates the deltas |
| Dynamic — Step Scaling | Add N instances when alarm at X, more if X+Y |
| Dynamic — Simple Scaling | Single adjustment per alarm; cooldown |
| Predictive Scaling | ML predicts daily pattern, pre-scales |
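The arithmetic behind the two main dynamic policies can be sketched as follows. This is a simplified model, not AWS's actual algorithm: `target_tracking_desired` assumes load spreads evenly across instances, and `step_adjustment` takes hypothetical `(lower, upper, adjustment)` tuples that mirror the lower bound, upper bound, and scaling adjustment of a step scaling policy, expressed as offsets from the alarm threshold.

```python
import math
from typing import List, Optional, Tuple

Step = Tuple[float, Optional[float], int]


def target_tracking_desired(
    current: int, metric: float, target: float, min_size: int, max_size: int
) -> int:
    """Capacity that would bring the average metric back to the target,
    clamped to the group's min/max."""
    if target <= 0:
        raise ValueError("target must be positive")
    raw = math.ceil(current * metric / target)
    return max(min_size, min(max_size, raw))


def step_adjustment(metric: float, threshold: float, steps: List[Step]) -> int:
    """Instances to add for the step whose range contains the breach.

    Each step covers lower <= (metric - threshold) < upper;
    upper=None means "no upper limit".
    """
    breach = metric - threshold
    for lower, upper, adjustment in steps:
        if breach >= lower and (upper is None or breach < upper):
            return adjustment
    return 0  # metric is below the threshold: no step applies


# Four instances at 75 % CPU with a 50 % target need six instances.
print(target_tracking_desired(4, 75.0, 50.0, min_size=2, max_size=10))

# Alarm at 70 %: add 1 instance for 70-80 %, add 3 above 80 %.
steps = [(0, 10, 1), (10, None, 3)]
print(step_adjustment(85.0, 70.0, steps))
```

Target tracking needs only the target value, while step scaling makes you choose the thresholds and increments yourself.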
Lifecycle hooks
- Pause an instance in Pending:Wait or Terminating:Wait to run custom logic (warm up, drain cache, save logs).
- Hook can call Lambda / SQS / SNS or be polled.
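A worker that consumes lifecycle notifications (for example from SQS) might look roughly like the sketch below. The function name `handle_lifecycle_message` and the `task` callback are hypothetical. The message fields and the returned keys follow the lifecycle notification format and boto3's `complete_lifecycle_action`, but the AWS call itself is left as a comment so the sketch runs without credentials.

```python
import json
from typing import Any, Callable, Dict


def handle_lifecycle_message(body: str, task: Callable[[str], bool]) -> Dict[str, Any]:
    """Run `task` for the instance named in a lifecycle notification and
    return keyword arguments for complete_lifecycle_action."""
    msg = json.loads(body)
    instance_id = msg["EC2InstanceId"]
    try:
        ok = task(instance_id)  # e.g. warm a cache or upload logs
    except Exception:
        ok = False  # treat a crash in the custom logic as a failure
    return {
        "AutoScalingGroupName": msg["AutoScalingGroupName"],
        "LifecycleHookName": msg["LifecycleHookName"],
        "LifecycleActionToken": msg["LifecycleActionToken"],
        "InstanceId": instance_id,
        # CONTINUE lets the transition proceed; ABANDON gives up on it.
        "LifecycleActionResult": "CONTINUE" if ok else "ABANDON",
    }


# With boto3 available, the result would be passed on as:
#   boto3.client("autoscaling").complete_lifecycle_action(**kwargs)
```

On launch, ABANDON causes the instance to be terminated and replaced. On terminate, the instance is terminated either way, so the result mainly controls whether any remaining hooks still run.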
Termination policies & instance refresh
- Default: balance across AZs first (pick from the AZ with the most instances), then oldest LC/LT, then closest to next billing hour, then random.
- Instance Refresh rolls a new launch template through the ASG (controlled %, warm-up).
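The default termination order can be modelled as a chain of filters. The sketch below is a simplification: the `Instance` record and its fields are hypothetical, template age is reduced to a version number, and the first step reflects the AZ rebalancing that AWS applies before the other criteria.

```python
import random
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Instance:
    instance_id: str
    az: str
    template_version: int         # lower number = older launch template
    minutes_to_billing_hour: int


def pick_instance_to_terminate(
    instances: List[Instance], rng: Optional[random.Random] = None
) -> Instance:
    """Narrow the candidates step by step, then break ties at random."""
    if not instances:
        raise ValueError("no instances to choose from")
    rng = rng or random.Random()

    # 1. Keep the group balanced: look only at the most crowded AZ(s).
    az_counts = Counter(i.az for i in instances)
    busiest = max(az_counts.values())
    candidates = [i for i in instances if az_counts[i.az] == busiest]

    # 2. Prefer instances on the oldest launch template.
    oldest = min(i.template_version for i in candidates)
    candidates = [i for i in candidates if i.template_version == oldest]

    # 3. Prefer the instance closest to its next billing hour.
    soonest = min(i.minutes_to_billing_hour for i in candidates)
    candidates = [i for i in candidates if i.minutes_to_billing_hour == soonest]

    # 4. Anything still tied is chosen at random.
    return rng.choice(candidates)
```

For example, with two instances in one AZ and one in another, the choice is made between the two instances in the crowded AZ even if the lone instance is older.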
Mixed instances policy
- Mix On-Demand + Spot in one ASG.
- Allocate across multiple instance types and purchase options.
- Maximize availability of Spot capacity.
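How a mixed instances policy divides capacity between purchase options can be illustrated with a little arithmetic. The two inputs correspond to the `OnDemandBaseCapacity` and `OnDemandPercentageAboveBaseCapacity` settings of the policy's instances distribution. The function name and the choice to round the On-Demand share up are assumptions; AWS's exact rounding may differ.

```python
from typing import Tuple


def split_capacity(
    desired: int, on_demand_base: int, on_demand_pct_above_base: int
) -> Tuple[int, int]:
    """Return (on_demand, spot) counts for a desired capacity.

    The base is filled with On-Demand first. Capacity above the base is
    split by percentage, with the On-Demand share rounded up (assumption).
    """
    if desired < 0 or on_demand_base < 0:
        raise ValueError("capacities must be non-negative")
    if not 0 <= on_demand_pct_above_base <= 100:
        raise ValueError("percentage must be between 0 and 100")

    base = min(desired, on_demand_base)
    above = desired - base
    # Integer ceiling of above * pct / 100, avoiding float rounding.
    on_demand_above = (above * on_demand_pct_above_base + 99) // 100
    on_demand = base + on_demand_above
    return on_demand, desired - on_demand


# 10 instances, 2 always On-Demand, then a 50/50 split of the remaining 8.
print(split_capacity(10, 2, 50))
```

A base of a few On-Demand instances keeps a minimum of capacity safe from Spot interruptions, while the percentage controls how aggressively the rest chases Spot savings.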
Warm pools
- Pre-launched stopped instances ready to start quickly — useful when app boot time is long.
ELB / ALB integration
- ASG associates with one or more target groups.
- ELB health checks can be the source of truth: they test whether the application actually responds, whereas EC2 status checks only cover the instance and its underlying host.
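The effect of the health check type on replacement decisions can be sketched as a small predicate. This is a simplified model with hypothetical parameter names: it treats the grace period as a flat window during which failures are ignored, and it assumes the EC2 status check always applies while the ELB check only counts when the group's health check type is ELB.

```python
def is_unhealthy(
    ec2_status_ok: bool,
    elb_healthy: bool,
    health_check_type: str,
    seconds_in_service: int,
    grace_period: int,
) -> bool:
    """Decide whether the group should replace an instance."""
    if health_check_type not in ("EC2", "ELB"):
        raise ValueError("health_check_type must be 'EC2' or 'ELB'")
    if seconds_in_service < grace_period:
        return False  # still booting: failures are not acted on yet
    if not ec2_status_ok:
        return True   # EC2 status checks always count
    # The load balancer's verdict only matters with type ELB.
    return health_check_type == "ELB" and not elb_healthy


# The app has crashed but the VM is fine: only type ELB replaces it.
print(is_unhealthy(True, False, "EC2", 600, 300))
print(is_unhealthy(True, False, "ELB", 600, 300))
```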
When to use vs alternatives
| Use ... | Instead of ... | When ... |
|---|---|---|
| ASG + ALB | Single EC2 | Always for production web tier |
| ASG with Spot | All On-Demand | Workload is fault-tolerant and stateless |
| Predictive scaling | Reactive | Predictable daily pattern |
| Lifecycle hooks | None | Need warm-up scripts or graceful drain |
| Warm pool | Cold launches | Boot time > a couple of minutes |
Common exam scenarios
- "Web tier must survive AZ failure and scale to load" → ASG across 2+ AZs behind ALB.
- "Save cost by mixing On-Demand + Spot in ASG" → mixed instances policy.
- "Stop scaling thrash on metric spikes" → use target tracking or step scaling with a reasonable instance warm-up, or simple scaling with a cooldown.
- "Drain in-flight requests before terminating" → deregistration delay on TG + lifecycle hook on terminate.
- "Predictable lunchtime spike each day" → scheduled or predictive scaling.
- "App takes 5 min to boot" → warm pool to reduce start latency.
- "Roll a new AMI through ASG safely" → Instance Refresh with desired-healthy-percentage.
Exam tip
"Highly available + scalable" almost always = ASG across multi-AZ behind ELB. If a question shows a single EC2 with elastic IP — that's a wrong design unless the question is testing limits.