SQS — Concept
What it is
Amazon Simple Queue Service (SQS) = fully managed, distributed message queue. Producers send messages; consumers poll and process them. Decouples producers from consumers and provides durability and elasticity.
Why it exists
Direct producer→consumer coupling causes outages and slowness when consumers can't keep up. A queue absorbs bursts, smooths load, retries on failure, and lets you scale tiers independently.
Two queue types
| | Standard | FIFO |
|---|---|---|
| Ordering | Best-effort | Strict ordering within a Message Group ID |
| Delivery | At-least-once (rare duplicates) | Exactly-once processing (duplicates dropped within the 5-minute dedup window) |
| Throughput | Nearly unlimited | 300 msg/s (3,000 with batching), or high-throughput FIFO mode (~70 k/s; quota varies by Region) |
| Use | High-throughput, ordering not critical | Workflow steps that must happen in order |
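The FIFO dedup window in the table can be sketched as a toy model. This is only an illustration of the rule, not how SQS works internally; `FifoDedupSketch` and `accept` are made-up names. It assumes content-based dedup, where the dedup ID is a SHA-256 hash of the message body.

```python
import hashlib

DEDUP_WINDOW_SECONDS = 5 * 60  # FIFO deduplication interval


class FifoDedupSketch:
    """Toy model of FIFO deduplication (hypothetical, for illustration only)."""

    def __init__(self):
        self._seen = {}  # dedup_id -> time the ID was first accepted

    def accept(self, body, now, dedup_id=None):
        """Return True if the message would be delivered, False if treated as a duplicate."""
        if dedup_id is None:
            # Content-based dedup: hash the body (message attributes are ignored).
            dedup_id = hashlib.sha256(body.encode("utf-8")).hexdigest()
        first_seen = self._seen.get(dedup_id)
        if first_seen is not None and now - first_seen < DEDUP_WINDOW_SECONDS:
            return False
        self._seen[dedup_id] = now
        return True
```

A retry of the same body inside the window is dropped; the same body sent after the window counts as a new message.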
Message lifecycle
- Producer sends message (up to 256 KB; larger via SQS Extended Client + S3 pointer, max 2 GB).
- Message sits in queue (retention default 4 days, max 14 days).
- Consumer polls (ReceiveMessage, up to 10 at a time).
- Message becomes invisible for a visibility timeout (default 30 s, up to 12 h).
- Consumer processes and DeleteMessage.
- If not deleted in time → message reappears for retry.
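The lifecycle above can be walked through with a small in-memory model. This is a sketch under simplifying assumptions: `ToyQueue` is a made-up class, the clock is passed in by the caller, and deletion uses the message ID (real SQS uses a receipt handle returned by each receive).

```python
import itertools


class ToyQueue:
    """In-memory model of the receive/delete cycle (hypothetical, not the SQS API)."""

    def __init__(self, visibility_timeout=30):
        self.visibility_timeout = visibility_timeout
        self._messages = {}  # message_id -> [body, visible_at, receive_count]
        self._ids = itertools.count(1)

    def send(self, body, now):
        message_id = next(self._ids)
        self._messages[message_id] = [body, now, 0]
        return message_id

    def receive(self, now, max_messages=10):
        """Return up to max_messages visible messages and hide them for the timeout."""
        batch = []
        for message_id, entry in self._messages.items():
            if len(batch) == max_messages:
                break
            if entry[1] <= now:
                entry[1] = now + self.visibility_timeout  # invisible until then
                entry[2] += 1
                batch.append((message_id, entry[0]))
        return batch

    def delete(self, message_id):
        self._messages.pop(message_id, None)
```

A message received at t=1 with a 30 s timeout is hidden until t=31; if the consumer never deletes it, the next receive at or after t=31 returns it again.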
Polling
- Short polling = returns immediately even if empty (more API calls).
- Long polling (1–20 s WaitTimeSeconds) = reduces empty responses, lower cost.
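A minimal long-polling consumer might look like the sketch below. `drain_once` and `handler` are hypothetical names; `sqs` is assumed to behave like `boto3.client("sqs")`, whose `receive_message` and `delete_message` calls take the keyword arguments shown.

```python
def drain_once(sqs, queue_url, handler, wait_seconds=20):
    """Long-poll once, process each message, delete the ones that succeed.

    Returns the number of messages deleted.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,        # API maximum per call
        WaitTimeSeconds=wait_seconds,  # > 0 turns on long polling for this call
    )
    deleted = 0
    # "Messages" is absent from the response when the queue stayed empty.
    for message in response.get("Messages", []):
        try:
            handler(message["Body"])
        except Exception:
            # Not deleted: the message reappears after the visibility timeout.
            continue
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
        deleted += 1
    return deleted
```

Deleting only after the handler succeeds is what turns the visibility timeout into an automatic retry.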
DLQ (Dead-Letter Queue)
- After maxReceiveCount failures, move message to DLQ for inspection.
- DLQ is just another queue; create a redrive policy on the source.
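The redrive policy is a JSON string set as the RedrivePolicy attribute of the source queue. A sketch of building it follows; `redrive_attributes` is a hypothetical helper and the default of 5 receives is an arbitrary choice.

```python
import json


def redrive_attributes(dlq_arn, max_receive_count=5):
    """Build the Attributes dict that attaches a DLQ to a source queue."""
    if max_receive_count < 1:
        raise ValueError("max_receive_count must be at least 1")
    policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": max_receive_count,
    }
    # Queue attribute values are strings, so the policy is JSON-encoded.
    return {"RedrivePolicy": json.dumps(policy)}


# With boto3 this would be applied roughly as:
#   boto3.client("sqs").set_queue_attributes(QueueUrl=source_url, Attributes=attrs)
```

The policy lives on the source queue, not on the DLQ, which is why the DLQ itself needs no special configuration.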
Producers / consumers
- Any AWS SDK, Lambda (via event source mapping), ECS, EC2.
- Lambda + SQS: on a standard queue Lambda scales consumers up gradually (default 60 additional functions/min); on a FIFO queue it respects ordering per message group ID.
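A Lambda consumer receives a batch of records per invocation. The sketch below assumes the event source mapping has ReportBatchItemFailures enabled, so only failed messages return to the queue. `process` and the `order_id` field are hypothetical stand-ins for real business logic.

```python
import json


def process(order):
    """Hypothetical business logic; raises on bad input."""
    if "order_id" not in order:
        raise ValueError("missing order_id")


def handler(event, context):
    """Lambda entry point for an SQS event source mapping."""
    failures = []
    for record in event["Records"]:
        try:
            process(json.loads(record["body"]))
        except Exception:
            # Report only this message as failed; the rest of the batch is deleted.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Without partial batch responses, one bad message causes the whole batch to be retried.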
Security
- Resource-based queue policy (cross-account / SNS subscription).
- KMS encryption at rest.
- VPC endpoint (interface).
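For the SNS subscription case, the queue policy typically allows the SNS service to send messages, restricted to one topic. A sketch of such a policy follows; `sns_to_sqs_policy` is a hypothetical helper and the ARNs are supplied by the caller.

```python
import json


def sns_to_sqs_policy(queue_arn, topic_arn):
    """Build the Attributes dict that lets one SNS topic send to one queue."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "sns.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            # Only deliveries originating from this topic are allowed.
            "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
        }],
    }
    # Like other queue attributes, the policy is passed as a JSON string.
    return {"Policy": json.dumps(policy)}
```

The condition on aws:SourceArn keeps other topics from writing to the queue.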
Common patterns
- Decouple producer/consumer for elasticity.
- Buffer between tiers (web → workers).
- Fanout with SNS → multiple SQS subscribers.
- Order processing with FIFO + group ID.
- Failure handling with DLQ + CloudWatch alarm.
When to use vs alternatives
| Use ... | Instead of ... | When ... |
|---|---|---|
| SQS Standard | Kinesis | Simple work queue, no ordering, many consumers |
| SQS FIFO | Standard | Strict order or exactly-once needed |
| Kinesis Data Streams | SQS | Real-time analytics, multiple consumers reading same stream |
| SNS + SQS fanout | Multiple direct SNS subscribers | Reliable fanout, replay safety |
| EventBridge | SQS | Event routing & filtering across many AWS services |
Common exam scenarios
- "Decouple a web tier from worker tier" → Web pushes to SQS, ASG of workers polls.
- "Messages must be processed in strict order per customer" → FIFO with MessageGroupId = customerId.
- "Avoid duplicates" → FIFO with deduplication ID or content-based dedup.
- "Capture failed messages for analysis" → DLQ with maxReceiveCount.
- "Fanout one event to 3 different services" → SNS topic with 3 SQS subscribers.
- "Slow consumer drops messages" → increase visibility timeout, increase retention, or add more workers.
- "Reduce empty receive cost" → enable long polling.
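The "strict order per customer" and "avoid duplicates" scenarios combine in the parameters of a single FIFO send. A sketch follows; `fifo_send_params` is a hypothetical helper, and using customer ID plus order ID as the dedup ID is an assumption about what makes a message unique.

```python
def fifo_send_params(queue_url, customer_id, order_id, body):
    """Build keyword arguments for an SQS send_message call on a FIFO queue."""
    if not queue_url.endswith(".fifo"):
        raise ValueError("FIFO queue names end in .fifo")
    return {
        "QueueUrl": queue_url,
        "MessageBody": body,
        # Messages sharing a group ID are delivered in order, one group per customer.
        "MessageGroupId": str(customer_id),
        # A retried send with the same ID inside the dedup window is not delivered twice.
        "MessageDeduplicationId": f"{customer_id}-{order_id}",
    }


# With boto3: boto3.client("sqs").send_message(**fifo_send_params(...))
```

Different customers land in different groups, so they can be processed in parallel while each customer's messages stay ordered.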
Exam tip
- "Decouple" + "queue" → SQS.
- "Order" / "exactly once" → FIFO.
- "Stream" / "replay" / "multiple consumers reading same data" → Kinesis, not SQS.
- "Fanout to many subscribers" → SNS (often → SQS).