High-Performing Data Ingestion and Transformation
Included in this chapter:
- Kinesis Data Streams: shard math, enhanced fan-out, retention tiers
- Kinesis Firehose: buffer / transformation / format conversion patterns
- Glue jobs: Spark vs Python, bookmarks, partition projection
- Athena cost optimization: partitioning + columnar formats + workgroups
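The "shard math" named in the first item can be sketched roughly as follows. This assumes a provisioned-mode stream with the standard per-shard limits (1 MB/s or 1,000 records/s on write, 2 MB/s on read shared by all standard consumers, and a dedicated 2 MB/s per consumer with enhanced fan-out). The function name and parameters are hypothetical, chosen only for illustration.

```python
import math

# Per-shard limits for a provisioned-mode Kinesis data stream.
WRITE_MB_PER_SEC = 1.0        # ingest bandwidth per shard
WRITE_RECORDS_PER_SEC = 1000  # ingest record rate per shard
READ_MB_PER_SEC = 2.0         # egress per shard, shared by standard consumers


def shards_needed(records_per_sec, avg_record_kb, consumers=1,
                  enhanced_fan_out=False):
    """Estimate the shard count for a workload (hypothetical helper).

    The answer is the largest of three constraints: write bandwidth,
    write record rate, and read bandwidth.
    """
    ingress_mb = records_per_sec * avg_record_kb / 1024
    by_bytes = math.ceil(ingress_mb / WRITE_MB_PER_SEC)
    by_records = math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC)
    # With enhanced fan-out each consumer gets its own 2 MB/s per shard,
    # so extra consumers no longer compete for read bandwidth.
    sharing = 1 if enhanced_fan_out else consumers
    by_read = math.ceil(ingress_mb * sharing / READ_MB_PER_SEC)
    return max(1, by_bytes, by_records, by_read)


print(shards_needed(5000, 2))                                       # 10
print(shards_needed(5000, 2, consumers=3))                          # 15
print(shards_needed(5000, 2, consumers=3, enhanced_fan_out=True))   # 10
```

In this sketch, 5,000 records/s at 2 KB each is write-bound at 10 shards. Three standard consumers push the estimate to 15 because they share the 2 MB/s read limit, while enhanced fan-out brings it back to 10.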
Ingestion + transformation services
| Service | Type | Latency | Best for |
|---|---|---|---|
| Kinesis Data Streams | Stream | Sub-second | Ordered events with replay (24h–365d); custom consumers |
| Kinesis Firehose | Stream → store | 60+ s buffer | Fire-and-forget delivery to S3 / Redshift / OpenSearch / Splunk |
| MSK (Managed Kafka) | Stream | Sub-second | Kafka-compatible workloads; existing Kafka tooling |
| AWS Glue | Serverless ETL (Spark + Python) | Minutes | Recurring ETL; schema discovery via crawlers; Glue Data Catalog |
| EMR | Hadoop / Spark / Hive / Presto / Flink cluster | Minutes-hours | Large-scale data processing; non-Glue frameworks |
| Athena | Serverless SQL over S3 | Seconds-minutes | Ad-hoc SQL; per-TB-scanned billing |
| Redshift | Columnar MPP DW | Seconds-minutes | Petabyte analytics warehouse; BI dashboards |
| Lake Formation | Data lake governance | — | Centralized fine-grained access on S3 + Glue Catalog |
| DataSync / DMS / Snowball | Transfer | Hours-days | On-premises or other-cloud data → AWS: DataSync for online file transfer, DMS for database migration, Snowball for offline bulk transfer |
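Because Athena bills per TB scanned, the effect of partitioning and columnar formats on cost can be estimated with simple arithmetic. The sketch below is illustrative only: the $5.00 per TB price is an assumption (pricing varies by Region and over time), a TB is treated as 2^40 bytes as a simplification, and the helper names and the pruning fractions are hypothetical.

```python
PRICE_PER_TB_USD = 5.00          # assumption: check current Regional pricing
MIN_BILLED_BYTES = 10 * 1024**2  # Athena bills at least 10 MB per query
TB = 1024**4                     # simplification: 1 TB = 2**40 bytes


def query_cost(bytes_scanned, price_per_tb=PRICE_PER_TB_USD):
    """Estimated cost in USD of one query (hypothetical helper)."""
    billable = max(bytes_scanned, MIN_BILLED_BYTES)
    return billable / TB * price_per_tb


def scanned_after_pruning(total_bytes, partition_fraction, column_fraction):
    """Bytes read when only some partitions and some columns are touched."""
    return total_bytes * partition_fraction * column_fraction


full_scan = query_cost(1 * TB)
# Hypothetical query: 1 of 30 daily partitions, 3 of 20 columns.
pruned = query_cost(scanned_after_pruning(1 * TB, 1 / 30, 3 / 20))

print(f"full scan: ${full_scan:.2f}")
print(f"pruned:    ${pruned:.4f}")
```

The sketch ignores compression, which usually reduces the bytes scanned further for columnar formats such as Parquet or ORC.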