SAA-C03

Kinesis

Kinesis — Concept

What it is

Amazon Kinesis = AWS's streaming-data family. Four services:

  • Kinesis Data Streams (KDS): durable, replayable stream of records.
  • Kinesis Data Firehose (now Amazon Data Firehose): managed delivery from a stream to S3 / Redshift / OpenSearch / HTTP endpoints.
  • Kinesis Data Analytics (now Managed Service for Apache Flink): streaming analytics with Flink (Java / Scala / Python) or Flink SQL.
  • Kinesis Video Streams: video ingest from devices/cameras.

Why it exists

SQS is a queue: each message is processed by one consumer and then deleted, so it cannot be re-read. Streaming use cases need:

  • Ordered, replayable records.
  • Multiple independent consumers reading the same data.
  • Real-time analytics, anomaly detection, dashboards.
  • Sub-second to seconds latency.
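A toy in-memory model of the queue-vs-stream difference, as a sketch only (the class names are hypothetical and this is not how SQS or KDS are implemented): the queue deletes what it hands out, while the stream keeps an append-only log and tracks a separate read position per consumer, which is what makes multi-consumer reads and replay possible.

```python
from collections import deque


class ToyQueue:
    """SQS-like: a received message is deleted, so only one consumer ever sees it."""

    def __init__(self):
        self._messages = deque()

    def send(self, body):
        self._messages.append(body)

    def receive_and_delete(self):
        # Returns None when the queue is empty.
        return self._messages.popleft() if self._messages else None


class ToyStream:
    """KDS-like: records stay in an append-only log; each consumer keeps its own position."""

    def __init__(self):
        self._log = []
        self._positions = {}  # consumer name -> index of the next record to read

    def put(self, data):
        self._log.append(data)
        return len(self._log) - 1  # toy stand-in for a sequence number

    def read(self, consumer, max_records=10):
        start = self._positions.get(consumer, 0)
        batch = self._log[start:start + max_records]
        self._positions[consumer] = start + len(batch)
        return batch

    def rewind(self, consumer, position=0):
        # Replay: move only this consumer back; other consumers are unaffected.
        self._positions[consumer] = position


if __name__ == "__main__":
    stream = ToyStream()
    for event in ["click", "view", "buy"]:
        stream.put(event)
    print(stream.read("analytics"))  # both consumers see every record
    print(stream.read("archiver"))
    stream.rewind("analytics", 1)
    print(stream.read("analytics"))  # replays from position 1
```

Real KDS retention (24 h to 365 days) bounds how far back a rewind can go; the toy log never expires.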

Kinesis Data Streams (KDS)

  • Shards = scale unit in provisioned mode. Each shard takes 1 MB/s or 1,000 records/s in and serves 2 MB/s out (5 GetRecords calls/s). On-demand mode instead auto-scales capacity with traffic (up to GB/s) with no shard planning.
  • Records: up to 1 MB each.
  • Retention: 24 h default, up to 365 days.
  • Consumers:
    • Classic (shared throughput): all consumers of a shard split its 2 MB/s and 5 reads/s.
    • Enhanced Fan-Out (EFO): each registered consumer gets a dedicated 2 MB/s per shard, pushed via SubscribeToShard with roughly 70 ms latency.
  • Producers: AWS SDK, Kinesis Producer Library (KPL), Kinesis Agent.
  • Consumers: Kinesis Client Library (KCL), Lambda, Firehose, Flink.
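The per-shard limits above turn into simple sizing arithmetic for a provisioned stream. A rough sketch, assuming evenly spread partition keys and ignoring the 5 GetRecords calls/s limit and spike headroom; `shards_needed` is a hypothetical helper, not an AWS API.

```python
import math

# Provisioned-mode per-shard limits, as quoted in the notes above.
SHARD_WRITE_MB_S = 1.0
SHARD_WRITE_RECORDS_S = 1000
SHARD_READ_MB_S = 2.0  # shared by all classic consumers; per consumer with EFO


def shards_needed(write_mb_s, records_s, consumers, enhanced_fan_out=False):
    """Rough lower bound on shard count for a provisioned stream.

    Assumes keys spread evenly (a single hot key is still capped at one shard).
    """
    by_write_bytes = math.ceil(write_mb_s / SHARD_WRITE_MB_S)
    by_write_records = math.ceil(records_s / SHARD_WRITE_RECORDS_S)
    if enhanced_fan_out:
        # Each EFO consumer has its own 2 MB/s per shard, and a shard never takes in
        # more than 1 MB/s, so adding consumers does not force extra shards.
        by_read = 0
    else:
        # Classic consumers each read the full stream through one shared 2 MB/s pipe.
        by_read = math.ceil(write_mb_s * consumers / SHARD_READ_MB_S)
    return max(1, by_write_bytes, by_write_records, by_read)


if __name__ == "__main__":
    # 3 MB/s in, 2,000 records/s, 4 teams reading the same data.
    print(shards_needed(3, 2000, 4))                         # classic consumers
    print(shards_needed(3, 2000, 4, enhanced_fan_out=True))  # EFO consumers
```

With 4 classic consumers the read side dominates (12 MB/s of reads over 2 MB/s per shard gives 6 shards); with EFO the write side sets the count (3 shards). This is the arithmetic behind exam scenario 3.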

Kinesis Data Firehose

  • Managed delivery to S3 / Redshift / OpenSearch / Splunk / HTTP / Datadog / others.
  • Near-real-time (buffer interval 0–900 s; buffer size 1–128 MB).
  • Can transform data with Lambda before delivery.
  • Can compress (GZIP / SNAPPY / ZSTD) and convert JSON records to Parquet/ORC on the fly (schema taken from the AWS Glue Data Catalog).
  • No shards; auto-scales.
  • Charged per GB ingested.
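Firehose delivers a batch when whichever buffering hint (size or interval) is reached first. The sketch below is a simplified model of that rule, not the service's exact behavior: it assumes the interval clock starts at the first record of a batch, and `firehose_flushes` is a hypothetical name.

```python
def firehose_flushes(events, buffer_mb=5, interval_s=300):
    """Return the times (seconds) at which a batch would be delivered.

    events: list of (arrival_time_s, size_mb), sorted by arrival time.
    Simplified model: a batch flushes as soon as it holds buffer_mb,
    or once it is interval_s old, whichever comes first.
    """
    flushes = []
    batch_start = None
    batch_mb = 0.0
    for t, size in events:
        # The interval expired before this record arrived: flush at the deadline.
        if batch_start is not None and t - batch_start >= interval_s:
            flushes.append(batch_start + interval_s)
            batch_start, batch_mb = None, 0.0
        if batch_start is None:
            batch_start = t
        batch_mb += size
        # The size hint was reached: flush immediately.
        if batch_mb >= buffer_mb:
            flushes.append(t)
            batch_start, batch_mb = None, 0.0
    # Whatever is left is delivered when its interval runs out.
    if batch_start is not None:
        flushes.append(batch_start + interval_s)
    return flushes


if __name__ == "__main__":
    # A burst fills the 5 MB buffer at t=20; a lone record waits for the 300 s interval.
    print(firehose_flushes([(0, 2), (10, 2), (20, 2), (400, 1)]))
```

High-volume streams therefore flush on size, while trickles flush on time; that is why Firehose is "near-real-time" rather than real-time unless the interval is set to 0.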

Managed Service for Apache Flink (Kinesis Data Analytics)

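  • Stream processing with Apache Flink applications (Java / Scala / Python) or Flink SQL via Studio notebooks; the original SQL-only Kinesis Data Analytics applications are legacy.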
  • Reads from KDS / MSK; writes to KDS / Firehose / S3 / Lambda.
  • Windowed aggregations, joins, anomaly detection, ML inference.
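To make "windowed aggregations" and "anomaly detection" concrete, here is a plain-Python illustration of a tumbling (fixed, non-overlapping) window sum per key plus a threshold check. It is only the idea: it does not use the Flink API, and it ignores late or out-of-order events, which a real Flink job handles with watermarks. Function names and the threshold rule are hypothetical.

```python
from collections import defaultdict


def tumbling_window_sums(events, window_s):
    """Sum values per (window_start, key) over fixed, non-overlapping windows.

    events: iterable of (event_time_s, key, value), times as integers.
    """
    sums = defaultdict(float)
    for t, key, value in events:
        window_start = (t // window_s) * window_s  # e.g. t=59 -> 0, t=60 -> 60 for 60 s windows
        sums[(window_start, key)] += value
    return dict(sums)


def flag_anomalies(window_sums, threshold):
    """Toy anomaly rule: any (window, key) whose total exceeds the threshold."""
    return sorted(k for k, total in window_sums.items() if total > threshold)


if __name__ == "__main__":
    txns = [(0, "card-1", 10), (59, "card-1", 5), (60, "card-1", 7), (30, "card-2", 1)]
    per_minute = tumbling_window_sums(txns, window_s=60)
    print(per_minute)
    print(flag_anomalies(per_minute, threshold=10))
```

In the exam pipeline "KDS → Managed Flink → DynamoDB/SNS", the flagged keys would be what Flink writes out or alerts on.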

Kinesis Video Streams

  • Ingest video / audio / time-series from devices.
  • Integrates with Rekognition Video for real-time analytics.

Streams vs SQS vs MSK

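| | KDS | SQS | MSK (Kafka) |
|---|---|---|---|
| Replay | Yes | No | Yes |
| Ordering | per shard | per message group (FIFO) | per partition |
| Multiple consumers of the same data | Yes | No | Yes |
| Retention | up to 365 d | up to 14 d | configurable |
| Throughput unit | shard / on-demand | nearly unlimited (Standard) | partitions |
| Use | analytics, IoT, fan-out | decoupling, queues | open-source Kafka API |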
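"Ordering per shard" follows from how KDS routes records: it MD5-hashes the partition key into a 128-bit integer and sends the record to the shard whose hash-key range contains it. The sketch assumes the key space is split into equal contiguous ranges (real ranges come from each shard's HashKeyRange); `shard_for` is a hypothetical helper.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_for(partition_key, shard_count):
    """Index of the shard whose (assumed equal-width) hash-key range contains MD5(key)."""
    hash_key = int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")
    return hash_key * shard_count // HASH_SPACE


if __name__ == "__main__":
    events = [("user-42", "cart"), ("user-7", "login"), ("user-42", "checkout")]
    for key, event in events:
        # Same key -> same shard, so "cart" is always read before "checkout".
        print(key, event, "-> shard", shard_for(key, 4))
```

The flip side: there is no ordering across shards, and a single very hot key cannot use more than one shard's 1 MB/s.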

When to use vs alternatives

| Use ... | Instead of ... | When ... |
|---|---|---|
| KDS | SQS | Real-time, multi-consumer, ordered, replayable |
| Firehose | KDS | You just want to load to S3 / Redshift / OpenSearch, no custom code |
| Managed Flink | Lambda processing | Continuous SQL/Flink stream analytics |
| MSK | KDS | You want the native Kafka API / open-source ecosystem |
| EventBridge | KDS | Event routing & filtering, not high-volume analytics |

Common exam scenarios

  1. "Ingest clickstream and write to S3 in Parquet" → Firehose with Lambda transform / format conversion.
  2. "Real-time anomaly detection on transaction stream" → KDS → Managed Flink → DynamoDB/SNS.
  3. "Multiple analytics teams read the same stream" → KDS (with Enhanced Fan-Out per consumer).
  4. "Decouple producer + consumer where order isn't critical" → SQS, not Kinesis.
  5. "Stream video from cameras for ML" → Kinesis Video Streams + Rekognition.

Exam tip

  • "Streaming analytics / multi-consumer / replay" → KDS.
  • "Just deliver to S3 / Redshift / OpenSearch" → Firehose.
  • "Decouple tiers, no replay needed" → SQS.
  • "Real-time SQL on the stream" → Managed Flink.

References