Domain 3 of 4 · Chapter 3 of 5

High-Performing Databases

Aurora vs RDS: feature-by-feature breakdown

Aurora is RDS's cloud-native cousin. MySQL- and PostgreSQL-compatible at the wire level, but the storage engine and infrastructure are entirely different. The exam tests when Aurora's features matter and when RDS suffices.

Storage architecture:

  • RDS: traditional engine with EBS storage. Multi-AZ = synchronous replication to a passive standby in another AZ (storage replication via the engine's binlog).
  • Aurora: distributed storage layer. Each write commits to 6 storage nodes across 3 AZs before acknowledging. Storage auto-scales 10 GB → 128 TiB. Replicas read from the shared storage (no log shipping).

HA and failover:

  • RDS Multi-AZ: 60-120 sec failover. Standby NOT readable.
  • RDS Multi-AZ DB cluster (newer for MySQL/PostgreSQL): 1 writer + 2 readable standbys across 3 AZs. ~35 sec failover.
  • Aurora: 1 writer + up to 15 readers. Reader can auto-promote on writer failure (~1 min). All readers are read-only but accessible.

Replication lag:

  • RDS Read Replicas: async; lag from seconds to minutes under heavy writes.
  • Aurora Replicas: shared storage means typical < 100 ms lag (often < 10 ms).

Storage features unique to Aurora:

  • Backtrack (MySQL only): rewind cluster up to 72 hours in-place (no restore). For "oops, dropped a table" scenarios.
  • Cloning: instant zero-copy clones using copy-on-write storage. Use for dev/test branches off production data.
  • Aurora Global Database: cross-region replication < 1 s RPO; managed promote ~1 min RTO.
  • Aurora Serverless v2: ACU-based autoscaling in 0.5-ACU increments (0.5 → 128 ACUs).
  • Parallel Query (MySQL): pushes WHERE/aggregation down to the storage layer for analytical queries on transactional data.
  • Aurora I/O-Optimized: switch billing modes — pay more for compute, zero per-I/O. Break-even at ~25% I/O of total Aurora cost.

Storage features unique to RDS:

  • Other database engines: Oracle, SQL Server, MariaDB — only RDS supports these. Aurora is MySQL/PostgreSQL only.
  • Stop/start instance (up to 7 days): pause RDS or Aurora — pay storage only.
  • Database-specific tuning knobs: more direct parameter-group control over engine internals.

Cost:

  • RDS: hourly instance + EBS storage + I/O charges.
  • Aurora Standard: hourly instance + storage + per-I/O charges.
  • Aurora I/O-Optimized: ~30% more compute cost; zero per-I/O.
  • Aurora Serverless v2: per-ACU-hour pricing.

Decision pattern:

  • New MySQL or PostgreSQL workload → Aurora (default).
  • Need Oracle / SQL Server / MariaDB → RDS.
  • Variable / unpredictable load → Aurora Serverless v2.
  • Need < 100 ms read replica lag → Aurora.
  • Need cross-region RPO < 1 s → Aurora Global Database.
  • Need backtrack / cloning → Aurora.
  • Tiny workload (< 1 vCPU) where Aurora minimum ($0.10/ACU-hr) is too high → RDS t3.micro etc.
  • Legacy app with specific Oracle/SQL features → RDS for those engines.

DynamoDB partition design: key shapes, GSI patterns, adaptive capacity

DynamoDB's performance depends entirely on partition key design. A poorly-distributed key creates hot partitions that throttle writes regardless of total provisioned capacity.

Partition basics:

  • Items physically distributed across partitions by hash(partition_key).
  • Each partition supports up to 3 000 RCU + 1 000 WCU (in provisioned mode).
  • A single hot key consuming > 1 000 WCU throttles even if the table has 100 000 WCU provisioned.

Adaptive capacity (since 2018):

  • DynamoDB auto-shifts capacity TO hot partitions FROM cold ones in the same table.
  • Works at the partition level, not the key level. Still bounded by per-partition limits.
  • Doesn't eliminate hot-key problems for low-cardinality partition keys.

Key cardinality matters:

  • High-cardinality partition key (UUIDs, customer IDs, order IDs): writes distribute evenly. No hot partition issues.
  • Low-cardinality key (e.g. "tenantType" with values "premium" and "basic"): all writes hit 2 partitions max. Throttled at 1 000 WCU each → 2 000 WCU table-wide ceiling.

Composite keys (the common pattern):

Use partition_key + sort_key. The partition key shards; the sort key orders within a partition.

  • Partition key = userId → all that user's items live in one partition.
  • Sort key = timestamp → query "most recent N items for this user" via Query with ScanIndexForward=false, Limit=N.

Write sharding for very hot logical keys:

If one user generates 10 000 writes/sec (above 1 000 WCU partition limit), shard the key:

  • Use userId#0..N as partition key (where N is a fixed bucket count).
  • Writes randomly pick a bucket; reads query all N partitions.
  • Trades write hot-partition for fan-out reads.

Global Secondary Indexes (GSI):

  • Index with a DIFFERENT partition key + (optional) sort key than the base table.
  • Eventually consistent reads (typically sub-second lag).
  • Has its own RCU/WCU provisioning (or on-demand) — separate from the base table.
  • Up to 20 GSIs per table.
  • Use for: query patterns that aren't supported by the base table's keys (e.g. base table is by orderId, GSI is by customerId to look up all orders for a customer).

Local Secondary Indexes (LSI):

  • Index with the SAME partition key as base table but different sort key.
  • Strongly consistent reads supported.
  • Must be created at table-creation time (can't add later).
  • Counts against the 10 GB per-partition-key item collection limit.
  • Rarely the right answer on the exam — usually GSI fits better.

DynamoDB Streams (change capture):

  • 24h retention.
  • Stream record options: KEYS_ONLY, NEW_IMAGE, OLD_IMAGE, NEW_AND_OLD_IMAGES.
  • Consumer pattern: Lambda triggered by stream → react to changes (audit, propagate to OpenSearch, invalidate cache).
  • Enables DynamoDB Global Tables under the hood (multi-region multi-active).

TTL (time-to-live):

  • Auto-delete items after a timestamp (specified in an item attribute).
  • Deletes are best-effort (typically within 48h of TTL).
  • Free deletes (don't consume WCU).
  • Use for: session tables, time-limited tokens, log cleanup.

DAX setup: VPC config, cluster sizing, write-through semantics

DAX (DynamoDB Accelerator) is a write-through cache in front of DynamoDB. Reads through DAX = microseconds (vs single-digit ms direct). Sits in your VPC.

Why DAX over ElastiCache for DynamoDB caching:

  • DynamoDB-native API: code uses the standard DynamoDB SDK with the DAX endpoint. No application changes beyond endpoint swap.
  • Write-through built-in: writes go to DAX AND DynamoDB; cache stays consistent.
  • No cache-invalidation logic: DAX handles it.
  • vs ElastiCache: ElastiCache requires custom application code to fetch from DynamoDB on miss + populate cache + invalidate on writes. More flexible but more code.

Architecture:

  • DAX cluster = 1 primary node + 0-9 read replicas across multiple AZs.
  • Primary handles writes (write-through to DynamoDB); replicas serve reads.
  • Up to 10 nodes per cluster. Sizing example: 1 primary + 2 replicas across 3 AZs.

Cache modes:

  • Item cache: caches individual items by primary key. TTL configurable (default 5 min). LRU eviction.
  • Query cache: caches Query and Scan results. TTL configurable (default 5 min).

Eventual consistency by default — DAX reads return potentially stale data (from cache). For strongly-consistent reads, configure the request to bypass DAX and go directly to DynamoDB.

Setup steps:

  1. Create DAX cluster: pick node type (e.g. dax.r5.large), node count, VPC + subnets.
  2. Create a subnet group spanning multiple AZs.
  3. Create an IAM role for DAX nodes (allows access to underlying DynamoDB tables).
  4. Set up security group: allow port 8111 (DAX cluster comm).
  5. Update application: install DAX SDK; point DynamoDB client at DAX cluster endpoint.

Latency impact:

  • Cached read: ~1 ms (within VPC).
  • Cache miss: DAX fetches from DynamoDB (single-digit ms), populates cache, returns.
  • Cached write: write-through to DynamoDB + cache update (~few ms).

Cost:

  • DAX nodes priced like EC2 instances ($/hr).
  • Per-node sizing matters: cache size = node RAM × node count × replication factor.
  • Smaller cluster (3 × r5.large) ≈ $400/month; large cluster (10 × r5.4xlarge) ≈ $4 000/month.

When DAX is the right answer:

  • Read-heavy DynamoDB workload (>80% reads).
  • Hot keys generating GetItem traffic.
  • Microsecond read latency required.
  • Workload can tolerate eventual consistency.

When NOT:

  • Write-heavy workload — DAX doesn't help; writes always go to DynamoDB.
  • Strong-consistency reads required — DAX bypasses cache for those.
  • Small workload — ElastiCache or direct DynamoDB is cheaper.
  • Workload uses Transaction APIs heavily — TransactGetItems / TransactWriteItems bypass DAX.

Compared to DynamoDB built-in caching:

  • DynamoDB has no built-in cache at the API layer (the storage engine has its own internal cache, but it's not exposed).
  • DAX is the supported caching solution. Custom ElastiCache + cache-aside is an alternative for finer control.

Common exam scenarios:

  • 'DynamoDB read latency too high' → DAX.
  • 'Microsecond reads against DynamoDB' → DAX.
  • 'Cache DynamoDB with minimal app changes' → DAX.
  • 'Cache DynamoDB with custom invalidation logic' → ElastiCache + custom code.

ElastiCache: cluster mode, shard count, replica strategy

ElastiCache offers managed Redis and Memcached. Each has very different scaling and HA models. Picking wrong means you can't shard, or you can't replicate, or you're paying for features you don't need.

Redis cluster modes:

Cluster Mode Disabled (single-shard):

  • 1 primary node + 0-5 replicas (read-only).
  • Single shard → all data on one node.
  • Failover: replica auto-promotes (~1 min).
  • Max RAM: limited by largest available node type.
  • Use for: small workloads where total dataset fits on one node + need replicas for read scaling or HA.

Cluster Mode Enabled (sharded):

  • 1-500 shards; each shard is 1 primary + 0-5 replicas.
  • Data partitioned across shards by hash slot (16 384 slots distributed across shards).
  • Total cluster capacity = shard count × per-shard RAM.
  • Failover: per-shard, automatic.
  • Use for: larger datasets, write-scaling workloads, very-high-throughput.

Redis Serverless (newer):

  • Auto-scales ECPUs and storage independently.
  • Multi-AZ HA built-in.
  • No node management.
  • Trade-off: less control over individual node placement.

Memcached:

  • No replication; no persistence.
  • Multi-threaded per node (one node uses all cores; Redis is single-threaded with side processes).
  • Auto-discovery: clients auto-discover all nodes in the cluster.
  • Use for: simple key-value caching where lost data is acceptable.
  • Wrong for: anything needing data structures (Redis lists/sets/sorted sets), persistence, or replication.

Redis vs Memcached (the canonical exam question):

Feature Redis Memcached
Data structures Strings, lists, sets, sorted sets, hashes, streams, geo, hyperloglog Strings only
Persistence Optional (snapshot + AOF) None
Replication Yes (read replicas, multi-AZ) No
Cluster mode (sharding) Yes Manual via client-side hashing
Pub/sub Yes No
Transactions Yes (MULTI/EXEC) No
Multi-threading Single-threaded (per node) Multi-threaded
Snapshots / backups Yes No

Default answer: Redis unless the question explicitly says 'simple multi-threaded caching'.

Caching strategies (applies to both):

  • Lazy loading: read miss → fetch from DB → cache. Cache only holds requested data. Stale data possible (until TTL).
  • Write-through: every write goes to cache AND DB. Always fresh; slower writes; cache may hold rarely-read data.
  • Write-around: writes go to DB only; cache populated on read miss. Reduces cache pollution from write-heavy workloads.
  • TTL-based expiration: time-bounded staleness regardless of strategy.

Common patterns:

  • Session store (Redis): cluster mode disabled with replication for HA; persistence on (RDB + AOF).
  • Leaderboard (Redis): cluster mode enabled with sharding for write scaling; uses sorted sets.
  • Database read cache (Redis or Memcached): cluster mode for scale; lazy-loading; short TTL.
  • Geo-spatial queries (Redis only): GEOADD + GEORADIUS commands.

Reserved Nodes:

  • 1- or 3-year commitment for ~30-55% discount (similar to EC2 RIs).
  • All-Upfront / Partial-Upfront / No-Upfront payment options.
  • Used after running on-demand for ~30 days to verify the right node type.

Database services by access pattern

ServiceData modelLatencyScaleBest for
Aurora (MySQL / PG)RelationalSingle-digit msUp to 256 TiB, 15 read replicasMost new OLTP workloads
RDS (engines)RelationalSingle-digit msUp to 64 TiB primary (SQL Server 16 TiB on gp3; up to 256 TiB total on Oracle/SQL Server with additional volumes); 15 read replicas (async; Oracle recommends ≤5)Oracle / SQL Server / MariaDB engines
DynamoDBKey-value / documentSingle-digit ms (μs with DAX)Effectively unlimitedKnown access patterns, single-key reads / writes
ElastiCache RedisKey-value in-memory + structuresSub-millisecondTB-scale clustersHot data caching, session store, leaderboards
ElastiCache MemcachedSimple key-value in-memorySub-millisecondMulti-node shardedSimple object caching (no persistence, no advanced types)
RedshiftColumnar MPPSeconds-minutesPetabyteData warehouse + BI
OpenSearchInverted index + JSONSingle-digit ms searchTB-scaleSearch + log analytics + observability
NeptuneGraph (Gremlin + SPARQL)Single-digit ms traversalBillions of edgesRecommendation, fraud, social graph
DocumentDBMongoDB-compatible documentSingle-digit msUp to 128 TiB cluster, 15 read replicasMongoDB workloads needing managed service
TimestreamTime-seriesMillisecond writesTrillions of pointsIoT, observability, monitoring data

Sharp facts the exam loves — give these one last read before exam day.

Cheat sheet

Sharp facts the exam loves — scan these before test day.

Match the database to the access pattern, not the data model

Relational (RDS, Aurora) for transactions + joins + ad-hoc queries. Key-value (DynamoDB) for predictable single-item ops at any scale. Document (DocumentDB) for JSON-shaped data. Search (OpenSearch) for full-text. Time-series (Timestream). Graph (Neptune).

Scale reads with replicas; writes with sharding

RDS Read Replicas: async, up to 15 per source. Aurora Replicas: <100 ms typical lag, up to 15. For write scale, DynamoDB partition design + adaptive capacity, or shard across multiple Aurora clusters by tenant/key.

2 questions test this
Aurora is the default RDS-compatible choice

MySQL/PostgreSQL wire-compatible + distributed storage (6 copies across 3 AZs) + auto-scaling storage (up to 256 TiB) + faster failover (<30 s) + Aurora Serverless v2. Pick Aurora unless you specifically need stock RDS Oracle/SQL Server. Aurora storage grows automatically[11] up to 256 TiB on current Aurora engine versions (128 TiB on older versions). No manual resize, no downtime, no provisioning. Charged for what you use — drop a table, storage shrinks.

DAX = microsecond reads in front of DynamoDB

DAX[15] is a write-through, write-around, and read-through cache for DynamoDB. Reads through DAX = microseconds (vs single-digit ms direct). Writes go through DAX and to DynamoDB. Eventual consistency by default. Sits in your VPC; uses DynamoDB API.

8 questions test this
Aurora Read Replicas: < 100 ms typical lag (often < 10 ms)

Shared storage layer[11] (not log shipping) means much lower replica lag than RDS async replicas. 15 read replicas max. Reader endpoint load-balances across replicas. Failover promotes a replica → typically ~1 minute.

4 questions test this
DynamoDB hot partitions: BASE key matters most

Adaptive capacity[16] helps, but a low-cardinality partition key still hurts. Use high-cardinality keys (UUIDs, hashes) or composite keys (tenant#item). GSI helps query different attributes but doesn't fix hot-key writes on the base table.

ElastiCache Redis vs Memcached: pick by feature set

Redis[17]: data structures (lists, sets, sorted sets, streams, geo, hyperloglog), pub/sub, persistence, replication, cluster mode for sharding, transactions. Memcached: simple key-value, no persistence, multi-threaded per node, auto-discovery. Use Redis unless you specifically need simple multi-threaded caching.

13 questions test this
DynamoDB Streams: every change item, 24h retention

Capture every insert/update/delete[18] as a stream record (4 view types: KEYS_ONLY, NEW_IMAGE, OLD_IMAGE, NEW_AND_OLD_IMAGES). 24h retention. Common consumer: Lambda for change-data-capture (CDC) → other services. Enable for Global Tables under the hood.

Redshift: KEY distribution co-locates large join partners; ALL replicates small dimension tables

For large fact-to-large-dimension joins, set DISTSTYLE KEY on the same join column in both tables so matching rows land on the same slice, eliminating redistribution. For small, slowly changing dimension tables (typically under a few million rows), use DISTSTYLE ALL to place a full copy on every node, making any join column work without data movement. EVEN distribution is the default but rarely optimal once join patterns are known.

5 questions test this
Aurora reader endpoint automatically includes Auto Scaling-created replicas

Applications must connect to the Aurora reader endpoint (not individual instance endpoints) to benefit from Aurora Auto Scaling. The reader endpoint uses DNS round-robin and automatically adds newly provisioned replicas once they pass health checks, distributing connections across all available replicas. Using instance-specific endpoints causes new Auto Scaling replicas to receive no traffic.

11 questions test this
Aurora Backtrack: rewind to a previous time in seconds

Aurora MySQL supports rewinding the cluster[12] up to 72 hours back without a restore (no downtime; cluster reverts in-place). For "oops, dropped a table" recovery. Different from PITR (which restores to a new cluster).

Also tested in

References

  1. Amazon RDS User Guide
  2. What is Amazon Aurora? (chapter index)
  3. Amazon DynamoDB overview
  4. What is Amazon ElastiCache
  5. Amazon Redshift overview
  6. Amazon OpenSearch Service
  7. Amazon Neptune
  8. Amazon DocumentDB
  9. Amazon Timestream
  10. Amazon Aurora overview
  11. Aurora storage and reliability (6 copies, 3 AZs)
  12. Aurora Backtrack (Aurora MySQL)
  13. Aurora Serverless v2
  14. Aurora Global Database
  15. DynamoDB Accelerator (DAX)
  16. DynamoDB partition keys + adaptive capacity
  17. ElastiCache Redis vs Memcached
  18. DynamoDB Streams