Kinesis — Concept
What it is
Amazon Kinesis = AWS's streaming-data family. Four services:
- Kinesis Data Streams (KDS) — durable, replayable stream of records.
- Kinesis Data Firehose (now Amazon Data Firehose) — managed delivery of streaming data to S3 / Redshift / OpenSearch / HTTP endpoints.
- Kinesis Data Analytics (now Managed Service for Apache Flink) — SQL / Flink streaming analytics.
- Kinesis Video Streams — video ingest from devices/cameras.
Why it exists
SQS is a queue: once a consumer processes and deletes a message, it is gone, and no other consumer can read it again. Streaming use cases need:
- Ordered, replayable records.
- Multiple independent consumers reading the same data.
- Real-time analytics, anomaly detection, dashboards.
- Sub-second to seconds latency.
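To make the queue-versus-stream difference concrete, here is a toy in-memory model. `ToyQueue` and `ToyStream` are hypothetical illustration classes, not AWS APIs, and the stream ignores shards and retention. It only shows that a queue hands each message to one consumer, while a stream is a log that every consumer reads at its own position and can rewind.

```python
from collections import deque


class ToyQueue:
    """SQS-like toy: a received-and-deleted message is gone for everyone."""

    def __init__(self):
        self._messages = deque()

    def send(self, msg):
        self._messages.append(msg)

    def receive_and_delete(self):
        # Returns None when empty; otherwise removes the oldest message
        return self._messages.popleft() if self._messages else None


class ToyStream:
    """KDS-like toy: an append-only log; each consumer tracks its own position."""

    def __init__(self):
        self._log = []
        self._positions = {}  # consumer name -> index of next record to read

    def put(self, record):
        self._log.append(record)

    def read(self, consumer, max_records=10):
        start = self._positions.get(consumer, 0)
        batch = self._log[start:start + max_records]
        self._positions[consumer] = start + len(batch)
        return batch

    def rewind(self, consumer, index=0):
        # Replay: move one consumer back; other consumers are unaffected
        self._positions[consumer] = index
```

Two consumers ("analytics" and "archiver", say) can each call `read` on the same `ToyStream` and both get every record, whereas two callers of `receive_and_delete` split the queue's messages between them.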
Kinesis Data Streams (KDS)
- Shards = scale unit. Provisioned mode: each shard accepts 1 MB/s or 1,000 records/s in and serves 2 MB/s out (5 GetRecords calls/s, shared). On-demand mode: AWS manages the shard count and scales capacity with traffic.
- Records: up to 1 MB each.
- Retention: 24 h default, up to 365 days.
- Consumer read modes:
  - Classic (shared throughput): all consumers share the shard's 2 MB/s.
  - Enhanced Fan-Out (EFO): each registered consumer gets a dedicated 2 MB/s per shard, pushed over HTTP/2 (SubscribeToShard), with sub-second latency.
- Producers: AWS SDK, Kinesis Producer Library (KPL), Kinesis Agent.
- Consumers: Kinesis Client Library (KCL), Lambda, Firehose, Flink.
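The per-shard limits above lead to a simple sizing rule for provisioned mode. This is a rough sketch, not an AWS formula: it assumes evenly spread partition keys (no hot shards), that every classic consumer reads the whole stream, and it ignores the 5 GetRecords calls/s limit. `estimate_shards` is a hypothetical helper name.

```python
import math

# Provisioned-mode per-shard limits, taken from the notes above
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
SHARED_READ_MB_PER_SHARD = 2.0


def estimate_shards(in_mb_per_s, records_per_s, classic_consumers):
    """Rough minimum shard count for a provisioned stream.

    EFO consumers get their own 2 MB/s per shard, so only classic
    (shared-throughput) consumers count against the read budget.
    """
    by_write_bytes = math.ceil(in_mb_per_s / WRITE_MB_PER_SHARD)
    by_write_records = math.ceil(records_per_s / WRITE_RECORDS_PER_SHARD)
    # Each classic consumer re-reads everything written, sharing 2 MB/s per shard
    shared_read_mb = in_mb_per_s * classic_consumers
    by_read = math.ceil(shared_read_mb / SHARED_READ_MB_PER_SHARD)
    return max(1, by_write_bytes, by_write_records, by_read)
```

For example, 10 MB/s of writes at 5,000 records/s with three classic consumers comes out at 15 shards, because the shared read budget binds first; moving those consumers to EFO drops the estimate to 10, set by the write limit.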
Kinesis Data Firehose
- Managed delivery to S3 / Redshift / OpenSearch / Splunk / HTTP / Datadog / others.
- Near-real-time: delivers when either buffer hint is reached, whichever comes first (buffer interval 0–900 s; buffer size 1–128 MB).
- Can transform data with Lambda before delivery.
- Can compress (GZIP / SNAPPY / ZSTD) and convert JSON to Parquet/ORC on the fly (schema comes from an AWS Glue Data Catalog table).
- No shards; auto-scales.
- Charged per GB ingested.
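A minimal sketch of the Lambda transform step, assuming clickstream records are JSON objects with hypothetical `event_type`, `user_id` and `page` fields. Firehose invokes the function with a batch under `records` (each with a `recordId` and base64 `data`) and expects every `recordId` back with a `result` of `Ok`, `Dropped` or `ProcessingFailed`, plus base64 `data`.

```python
import base64
import json


def lambda_handler(event, context):
    """Firehose transform: keep click events (slimmed down), drop everything else."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
        except (ValueError, UnicodeDecodeError):
            # Not base64/JSON: report a failure so Firehose handles it as a failed record
            output.append({"recordId": record["recordId"],
                           "result": "ProcessingFailed",
                           "data": record["data"]})
            continue
        if not isinstance(payload, dict) or payload.get("event_type") != "click":
            # Valid input but not a click (hypothetical filter rule): drop it
            output.append({"recordId": record["recordId"],
                           "result": "Dropped",
                           "data": record["data"]})
            continue
        slim = {"user_id": payload.get("user_id"), "page": payload.get("page")}
        # Newline-delimited JSON keeps records separable once concatenated in S3
        line = json.dumps(slim) + "\n"
        output.append({"recordId": record["recordId"],
                       "result": "Ok",
                       "data": base64.b64encode(line.encode("utf-8")).decode("ascii")})
    return {"records": output}
```

Returning every `recordId` matters: Firehose matches the response to the input batch by ID, so a filtered record is marked `Dropped` rather than omitted.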
Managed Service for Apache Flink (Kinesis Data Analytics)
- Stream processing with SQL or Apache Flink code.
- Reads from KDS / MSK; writes to KDS / Firehose / S3 / Lambda.
- Windowed aggregations, joins, anomaly detection, ML inference.
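To show what a windowed aggregation computes, here is a plain-Python tumbling-window count with a naive burst check. This is not the Flink API; it is a batch sketch of the idea behind Flink's tumbling windows, and `tumbling_window_counts` / `flag_bursts` are hypothetical names. A real stream processor must also decide when a window is complete (Flink uses watermarks for that).

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key) in fixed, non-overlapping windows.

    `events` is an iterable of (event_time_seconds, key) pairs.
    """
    counts = defaultdict(int)
    for ts, key in events:
        window_start = ts - (ts % window_seconds)  # align to the window boundary
        counts[(window_start, key)] += 1
    return dict(counts)


def flag_bursts(counts, threshold):
    """Naive anomaly rule: (window, key) pairs whose count exceeds a threshold."""
    return sorted(k for k, n in counts.items() if n > threshold)
```

With 60-second windows, two transactions on the same card at t=0 s and t=30 s fall in one window, while one at t=61 s starts the next; a threshold of 1 flags only the first window for that card.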
Kinesis Video Streams
- Ingest video / audio / time-series from devices.
- Integrates with Rekognition Video for real-time analytics.
Streams vs SQS vs MSK
| | KDS | SQS | MSK (Kafka) |
|---|---|---|---|
| Replay | ✓ | ✗ | ✓ |
| Ordering | per shard (same partition key, same shard) | FIFO: per message group; standard: best-effort | per partition |
| Multi-consumer same data | ✓ | ✗ (fan out via SNS instead) | ✓ |
| Retention | 24 h default, up to 365 d | 1 min to 14 d (default 4 d) | configurable |
| Throughput unit | shard / on-demand | nearly unlimited (standard queues) | partitions |
| Use | analytics, IoT, fan-out | decoupling, queues | open-source Kafka API |
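KDS orders records per shard, and the partition key decides the shard: KDS hashes the key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains it. The sketch below models that routing. It assumes UTF-8 encoded keys and an even split of the hash space across shards (as for a freshly created stream; resharding changes the ranges). The helper names are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # KDS hash keys are 128-bit unsigned integers


def even_shard_ranges(shard_count):
    """Split the hash key space into equal, contiguous (start, end) ranges."""
    step = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * step
        # Last shard absorbs any remainder so the whole space is covered
        end = HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key, ranges):
    """Index of the shard whose range contains MD5(partition_key)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    for index, (start, end) in enumerate(ranges):
        if start <= hash_key <= end:
            return index
    raise ValueError("hash key not covered by any shard range")
```

Because the mapping is deterministic, every record for one key (for example one user ID) lands on the same shard and keeps its order; records for different keys on different shards have no ordering guarantee relative to each other.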
When to use vs alternatives
| Use ... | Instead of ... | When ... |
|---|---|---|
| KDS | SQS | Real-time, multi-consumer, ordered, replayable |
| Firehose | KDS | You just want to load to S3 / Redshift / OpenSearch, no custom code |
| Managed Flink | Lambda processing | Continuous SQL/Flink stream analytics |
| MSK | KDS | You want native Kafka API / open source ecosystem |
| EventBridge | KDS | Event routing & filtering, not high-volume analytics |
Common exam scenarios
- "Ingest clickstream and write to S3 in Parquet" → Firehose with Lambda transform / format conversion.
- "Real-time anomaly detection on transaction stream" → KDS → Managed Flink → DynamoDB / SNS.
- "Multiple analytics teams read the same stream" → KDS (with Enhanced Fan-Out per consumer).
- "Decouple producer + consumer where order isn't critical" → SQS, not Kinesis.
- "Stream video from cameras for ML" → Kinesis Video Streams + Rekognition.
Exam tip
- "Streaming analytics / multi-consumer / replay" → KDS.
- "Just deliver to S3 / Redshift / OpenSearch" → Firehose.
- "Decouple tiers, no replay needed" → SQS.
- "Real-time SQL on the stream" → Managed Flink.