Study Guide · SAA-C03

SAA-C03 Cheat Sheet

212 entries · 14 chapters · 4 domains

Design Secure Architectures

Secure Access to AWS Resources

Cheat sheet

Sharp facts the exam loves — scan these before test day.

Workload identity: roles, never static keys

EC2 → instance profile, Lambda → execution role, ECS → task role, EKS → IRSA, GitHub Actions → OIDC AssumeRoleWithWebIdentity. Static access keys are a last resort only (e.g., legacy SaaS without OIDC support). Roles deliver short-lived, automatically rotated credentials. Access keys are long-lived secrets: once leaked, they stay valid until someone rotates or deletes them.

4 questions test this
Workforce identity: federate via IAM Identity Center

For 5+ humans, centralize identities in your IdP (AD, Okta, Entra ID, Google) and grant access in Identity Center by assigning (group × account × permission set). Local IAM users are a 2-3-person startup pattern at most; Cognito User Pools are for end customers, not workforce. AWS renamed AWS SSO to IAM Identity Center[7] in 2022, so on the exam treat 'AWS SSO' and 'IAM Identity Center' as the same service. In any new design, Identity Center replaces hand-built per-account IAM federation for workforce access.

Permission boundaries cap, they don't grant

A permission boundary sets the ceiling on what an IAM principal can do; it never grants anything. Effective permissions = identity Allow ∩ boundary Allow ∩ SCP Allow, with no explicit Deny in any layer. Use boundaries to let developers create roles without handing them iam:* escalation.

6 questions test this
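
A minimal sketch of the delegation pattern, written as a Python dict that serializes to IAM policy JSON. The account ID 111122223333 and the policy name DeveloperBoundary are placeholders, not values from this guide. Developers may create roles only if the new role carries the boundary, and they cannot strip the boundary or rewrite the boundary policy afterwards. A real setup also needs permissions such as iam:PassRole and iam:AttachRolePolicy, omitted here.

```python
import json

# Placeholder values: substitute your own account ID and boundary policy name.
ACCOUNT_ID = "111122223333"
BOUNDARY_ARN = f"arn:aws:iam::{ACCOUNT_ID}:policy/DeveloperBoundary"

# Identity policy attached to developers.
delegation_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Role creation is allowed only when the boundary is attached.
            "Sid": "CreateRolesOnlyWithBoundary",
            "Effect": "Allow",
            "Action": ["iam:CreateRole", "iam:PutRolePermissionsBoundary"],
            "Resource": f"arn:aws:iam::{ACCOUNT_ID}:role/*",
            "Condition": {"StringEquals": {"iam:PermissionsBoundary": BOUNDARY_ARN}},
        },
        {
            # Developers may not remove the boundary from a role...
            "Sid": "NoBoundaryRemoval",
            "Effect": "Deny",
            "Action": "iam:DeleteRolePermissionsBoundary",
            "Resource": "*",
        },
        {
            # ...or publish a new version of the boundary policy to raise the ceiling.
            "Sid": "NoBoundaryEdits",
            "Effect": "Deny",
            "Action": [
                "iam:CreatePolicyVersion",
                "iam:DeletePolicyVersion",
                "iam:SetDefaultPolicyVersion",
            ],
            "Resource": BOUNDARY_ARN,
        },
    ],
}

print(json.dumps(delegation_policy, indent=2))
```
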
Cross-account = trust policy + AssumeRole

Never share access keys across accounts. Account B creates a role whose trust policy permits principals in account A; a principal in A whose own IAM policy allows sts:AssumeRole on that role ARN then calls sts:AssumeRole. For third-party SaaS, add an sts:ExternalId condition to defeat the confused deputy; for human-assumed roles, add aws:MultiFactorAuthPresent.

10 questions test this
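
The two halves of the pattern, sketched as Python dicts that serialize to policy JSON. Account IDs and the role name CrossAccountOps are placeholders. Trusting account A's root ARN delegates the decision to A: IAM policies in A decide which of its principals may actually assume the role.

```python
import json

# Placeholder account IDs and role name, for illustration only.
ACCOUNT_A = "111122223333"  # account whose principals assume the role
ACCOUNT_B = "444455556666"  # account that owns the role
ROLE_ARN = f"arn:aws:iam::{ACCOUNT_B}:role/CrossAccountOps"

# Trust policy on the role in account B: principals from account A may assume
# it, but only from a session that was authenticated with MFA.
trust_policy_in_b = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{ACCOUNT_A}:root"},
        "Action": "sts:AssumeRole",
        "Condition": {"Bool": {"aws:MultiFactorAuthPresent": "true"}},
    }],
}

# Identity policy in account A: the caller must also be allowed to call
# sts:AssumeRole on that specific role ARN.
caller_policy_in_a = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "sts:AssumeRole",
        "Resource": ROLE_ARN,
    }],
}

print(json.dumps(trust_policy_in_b, indent=2))
print(json.dumps(caller_policy_in_a, indent=2))
```
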
Cross-account confused-deputy needs sts:ExternalId

When a third-party SaaS assumes a role in your account, pinning only the vendor's principal ARN in the trust policy is unsafe: another customer of the same vendor could hand the vendor your role ARN and trick it into acting on your account. Also pin sts:ExternalId[12] in the trust policy's Condition. The vendor supplies a unique ID per customer, and STS checks it on every AssumeRole call.

5 questions test this
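
A sketch of the ExternalId trust policy plus a deliberately tiny stand-in for the STS check, to show why the confused deputy fails. The vendor account, the external ID, and the helper names are hypothetical; toy_trust_check is not how IAM really evaluates policies (no wildcards, multiple statements, or other conditions).

```python
# Placeholder vendor account and external ID; the vendor issues a unique ID per customer.
VENDOR_ACCOUNT = "999988887777"
EXTERNAL_ID = "example-customer-7f3a9c"

def vendor_trust_policy(vendor_account: str, external_id: str) -> dict:
    """Trust policy that accepts the vendor only when it presents our external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{vendor_account}:root"},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
        }],
    }

def toy_trust_check(policy: dict, caller_account: str, supplied_external_id: str) -> bool:
    """Toy stand-in for STS: both the principal and the external ID must match."""
    stmt = policy["Statement"][0]
    principal_ok = stmt["Principal"]["AWS"] == f"arn:aws:iam::{caller_account}:root"
    external_id_ok = stmt["Condition"]["StringEquals"]["sts:ExternalId"] == supplied_external_id
    return principal_ok and external_id_ok

policy = vendor_trust_policy(VENDOR_ACCOUNT, EXTERNAL_ID)
# The vendor acting for us passes our ID: allowed.
print(toy_trust_check(policy, VENDOR_ACCOUNT, EXTERNAL_ID))         # True
# The vendor, tricked by another customer, passes that customer's ID: denied.
print(toy_trust_check(policy, VENDOR_ACCOUNT, "other-customer-id"))  # False
```
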
SCPs are guardrails, not grants

Service Control Policies[10] attached at the AWS Organizations root, OU, or account level can only RESTRICT what member accounts can do; they never grant permissions. Common trap: 'How do I let account B access a bucket in account A?' An SCP is wrong. The answer is a bucket policy in A that grants B's principal (plus an IAM policy in B), or a role in A that a principal in B assumes.

1 question tests this
Service-linked roles are AWS-managed, hands-off

When you enable Auto Scaling, Organizations, ECS, Lambda@Edge, etc., AWS auto-creates a service-linked role[11] in your account. You can't author or scope it. Exam tip: if the question asks 'how do I grant Service X permission to act on my behalf', the answer is usually 'service-linked role already exists' — not a new role you create.

Enforce IMDSv2 on every EC2 instance

Instance Metadata Service v1 is SSRF-vulnerable — a flaw in a webapp can trick the server into requesting credentials from 169.254.169.254 and leaking them. IMDSv2[2] requires a session token (PUT) before any GET, breaking SSRF chains. Enforce via launch template HttpTokens=required, or org-wide via SCP denying instance launches without it.
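
Both enforcement points, sketched as Python dicts that serialize to JSON. The SCP statement uses the ec2:MetadataHttpTokens condition key; the launch-template fragment shows only the MetadataOptions part of the launch template data, not a complete template.

```python
import json

# SCP: deny any RunInstances call whose instance does not require IMDSv2 tokens.
require_imdsv2_scp = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "RequireImdsV2",
        "Effect": "Deny",
        "Action": "ec2:RunInstances",
        "Resource": "arn:aws:ec2:*:*:instance/*",
        "Condition": {"StringNotEquals": {"ec2:MetadataHttpTokens": "required"}},
    }],
}

# Launch template data fragment: only the metadata options are shown.
launch_template_metadata = {
    "MetadataOptions": {
        "HttpEndpoint": "enabled",  # keep IMDS reachable...
        "HttpTokens": "required",   # ...but only with a PUT-issued session token (IMDSv2)
    }
}

print(json.dumps(require_imdsv2_scp, indent=2))
print(json.dumps(launch_template_metadata, indent=2))
```
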

Policy evaluation: explicit Deny always wins

Across all four layers (SCP, permission boundary, identity policy, resource policy), an explicit Deny anywhere short-circuits the entire evaluation[13]. No Allow can override it. Common debug pattern: 'Admin user can't access bucket' → check resource policy for an explicit Deny on their principal.

1 question tests this
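
A toy model of the evaluation order, useful for reasoning through exam scenarios. It is a simplification, not IAM's real evaluator: actions are exact strings, conditions and session policies are ignored, and the resource policy contributes only Denies (its grants, and the cross-account case, are not modeled). PolicyLayer and is_allowed are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class PolicyLayer:
    """Simplified policy layer: the exact action names it allows and denies."""
    allows: set = field(default_factory=set)
    denies: set = field(default_factory=set)

def is_allowed(action, scp, boundary, identity, resource=None):
    """Toy same-account evaluation: explicit Deny first, then every guardrail must Allow."""
    layers = [scp, boundary, identity] + ([resource] if resource is not None else [])
    # 1. An explicit Deny in any layer ends evaluation: nothing can override it.
    if any(action in layer.denies for layer in layers):
        return False
    # 2. Otherwise the SCP, the boundary, and the identity policy must each Allow
    #    (anything not allowed is an implicit deny).
    return all(action in layer.allows for layer in (scp, boundary, identity))

scp = PolicyLayer(allows={"s3:GetObject", "s3:PutObject"})
boundary = PolicyLayer(allows={"s3:GetObject"})
admin = PolicyLayer(allows={"s3:GetObject", "s3:PutObject"})
bucket_policy = PolicyLayer(denies={"s3:GetObject"})

print(is_allowed("s3:GetObject", scp, boundary, admin))                 # True
print(is_allowed("s3:PutObject", scp, boundary, admin))                 # False: boundary caps it
print(is_allowed("s3:GetObject", scp, boundary, admin, bucket_policy))  # False: explicit Deny wins
```
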
STS session duration: 1h default, 12h max via MaxSessionDuration

Roles default to a 1-hour session; raise via the role's MaxSessionDuration[14] (1-12 hours). IAM Identity Center permission sets have their own session duration (1-12h). Federation sessions are capped by BOTH the IdP token life AND the role's MaxSessionDuration — the shorter wins.
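
A small sketch of these duration rules in seconds. The 900-second floor for DurationSeconds is an STS limit not stated above. Role chaining (capped at one hour) and the exact SAML SessionDuration mechanics are not modeled, and the function names are hypothetical. STS rejects a request above MaxSessionDuration instead of truncating it, so the sketch raises rather than clamps.

```python
MIN_DURATION_S = 900                 # shortest session STS will issue (15 min)
DEFAULT_DURATION_S = 3600            # default when DurationSeconds is omitted (1 h)
MAX_SESSION_RANGE_S = (3600, 43200)  # allowed MaxSessionDuration values (1-12 h)

def assume_role_duration(role_max_s: int, requested_s: int = DEFAULT_DURATION_S) -> int:
    """Validate an AssumeRole DurationSeconds request against the role's cap."""
    lo, hi = MAX_SESSION_RANGE_S
    if not lo <= role_max_s <= hi:
        raise ValueError("MaxSessionDuration must be between 1 and 12 hours")
    if not MIN_DURATION_S <= requested_s <= role_max_s:
        raise ValueError("DurationSeconds must be between 900 s and MaxSessionDuration")
    return requested_s

def federated_session_s(idp_session_s: int, role_max_s: int) -> int:
    """A federated session is bounded by both the IdP-side limit and the role cap."""
    return min(idp_session_s, role_max_s)  # the shorter wins

print(assume_role_duration(role_max_s=43200, requested_s=4 * 3600))      # 14400
print(federated_session_s(idp_session_s=8 * 3600, role_max_s=4 * 3600))  # 14400
```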

AWS Organizations: OrganizationAccountAccessRole is the auto-trust role

When you create a new account through AWS Organizations[15], an OrganizationAccountAccessRole is provisioned automatically with full admin in the new account, trusted by the management account. Accounts invited to join an existing organization don't get it automatically; you create it yourself. Pattern: administrators in the management account assume this role to manage member accounts, with no per-account setup.

Access Analyzer surfaces unintended cross-account / public exposure

IAM Access Analyzer[16] scans resource policies (S3 buckets, IAM roles, KMS keys, Lambda functions, SQS queues, Secrets Manager) and flags grants to principals outside your account / org. Free to enable per region; finds the "this bucket policy accidentally allows the internet" problem.

aws:PrincipalOrgID restricts access to AWS Organization members

The aws:PrincipalOrgID global condition key in bucket policies (or other resource policies) ensures that only principals whose accounts are members of the specified AWS Organization can access the resource. It is the recommended alternative to listing every account ID in the Principal element: new member accounts are covered automatically, so the policy needs no updates as the organization grows.

5 questions test this
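
A sketch of an org-scoped bucket policy as a Python dict. The organization ID and bucket name are placeholders. Because callers in other member accounts are making cross-account requests, their own identity policies must still allow s3:GetObject on the bucket.

```python
import json

ORG_ID = "o-exampleorgid"            # placeholder organization ID
BUCKET = "example-shared-artifacts"  # placeholder bucket name

org_read_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowReadFromOrgMembers",
        "Effect": "Allow",
        "Principal": "*",  # narrowed to org members by the condition below
        "Action": "s3:GetObject",
        "Resource": f"arn:aws:s3:::{BUCKET}/*",
        "Condition": {"StringEquals": {"aws:PrincipalOrgID": ORG_ID}},
    }],
}

print(json.dumps(org_read_policy, indent=2))
```
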
Cross-account S3 access requires both a bucket policy and an IAM policy

When the requesting identity and the S3 bucket are in different AWS accounts, access requires permissions in both accounts: a bucket policy in the resource account granting the specific action to the cross-account principal, and an IAM identity-based policy in the requesting account granting access to the bucket ARN. Neither policy alone is sufficient for cross-account access.

9 questions test this
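
The two required halves, sketched as Python dicts. Account IDs, the bucket name, and the role name ReportReader are placeholders. The bucket policy (owned by the resource account) names the external role; the identity policy (owned by the requester account) lets that role touch the same object ARNs.

```python
import json

REQUESTER_ACCOUNT = "444455556666"  # owns the calling role (placeholder)
BUCKET = "example-reports"          # bucket in the resource account (placeholder)
ROLE_ARN = f"arn:aws:iam::{REQUESTER_ACCOUNT}:role/ReportReader"
OBJECTS_ARN = f"arn:aws:s3:::{BUCKET}/*"

# 1. Bucket policy in the resource account: grants the external principal.
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": ROLE_ARN},
        "Action": "s3:GetObject",
        "Resource": OBJECTS_ARN,
    }],
}

# 2. Identity policy in the requester account: lets the role use that bucket.
identity_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "s3:GetObject",
        "Resource": OBJECTS_ARN,
    }],
}

print(json.dumps(bucket_policy, indent=2))
print(json.dumps(identity_policy, indent=2))
```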

Secure Workloads and Applications

  • Defense in depth — never a single fence
  • VPC endpoints + PrivateLink keep AWS traffic off the internet
  • Security Groups stateful, NACLs stateless
  • Secrets Manager + rotation beats static credentials
  • WAF attaches to CloudFront, ALB, API Gateway, AppSync, Cognito user pools, App Runner, Verified Access (not NLB)
  • Shield Advanced: $3 000/month per payer account
  • Secrets Manager rotation: configurable schedule (commonly 30 / 60 / 90 days)
  • GuardDuty + Security Hub for org-wide detection
  • Parameter Store SecureString vs Secrets Manager
  • GuardDuty findings flow → EventBridge → Lambda for auto-response
  • Inspector scans EC2 + ECR + Lambda automatically
  • Network Firewall: stateful + managed rule groups for VPC-level inspection
  • Alternating-users rotation strategy eliminates credential downtime
  • AWS Managed Rules provide OWASP Top 10 coverage with zero maintenance
  • WAF rate-based rules automatically block IPs exceeding a request threshold
  • Reference a security group ID as source to avoid IP management
  • WAF evaluates rules lowest-priority-number first; Allow and Block are terminating
  • WAF scope-down statements exclude specific traffic from a managed rule group
  • Custom NACLs deny all traffic by default; rule order determines precedence
  • Use aws:SourceVpce or aws:SourceVpc to restrict S3 access to a specific VPC endpoint or VPC
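
A sketch of the aws:SourceVpce pattern from the last bullet, as a bucket policy dict. The endpoint ID and bucket name are placeholders. A Deny with StringNotEquals rejects every request that did not arrive through the endpoint, including console and admin access from outside the VPC, so test it before applying. Swapping aws:SourceVpce for aws:SourceVpc scopes access to a whole VPC instead.

```python
import json

VPCE_ID = "vpce-1a2b3c4d"         # placeholder Gateway endpoint ID
BUCKET = "example-internal-data"  # placeholder bucket name

vpce_only_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyAccessOutsideEndpoint",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
        # Any request that did not come through this endpoint is denied.
        "Condition": {"StringNotEquals": {"aws:SourceVpce": VPCE_ID}},
    }],
}

print(json.dumps(vpce_only_policy, indent=2))
```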

Data Security Controls

Encrypt by default — at rest AND in transit

Compliance (HIPAA, PCI, GDPR) requires both: SSE or KMS at rest, TLS in transit. S3 default encryption has been SSE-S3 since Jan 2023; to enforce TLS for clients, add a bucket policy that denies any request where aws:SecureTransport is false.

2 questions test this
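
A sketch of the TLS-enforcing bucket policy as a Python dict; the bucket name is a placeholder. The design choice is a Deny on "false" rather than an Allow on "true": an Allow would not stop plain-HTTP requests that some other policy already permits, while an explicit Deny overrides every Allow.

```python
import json

BUCKET = "example-phi-records"  # placeholder bucket name

require_tls_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyInsecureTransport",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
        # aws:SecureTransport evaluates to "false" for plain-HTTP requests.
        "Condition": {"Bool": {"aws:SecureTransport": "false"}},
    }],
}

print(json.dumps(require_tls_policy, indent=2))
```
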
KMS four-tier hierarchy: AWS-owned → AWS-managed → customer-managed → CloudHSM

AWS-owned: free, no visibility. AWS-managed: free, audit trail. Customer-managed: ~$1/key/month + per-request, full control + cross-account share + rotation. CloudHSM: dedicated FIPS 140-2 Level 3, highest cost.

4 questions test this
Bucket-level controls before object-level

Block Public Access at account + bucket level kills ACL footguns. Default encryption applies to all objects. Object-level ACLs and per-object policies should be the exception, not the pattern — they're hard to audit at scale.

3 questions test this
Macie discovers + classifies S3 PII / PHI

Macie scans S3 buckets for PII, PHI, financial data, and credentials using managed identifiers + custom regex. Findings route via EventBridge for auto-remediation. Cost is ~$1/GB scanned — scope by bucket selectively, not org-wide blind.

10 questions test this
S3 default encryption is SSE-S3, on by default since Jan 2023

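Every new S3 bucket gets SSE-S3 enabled automatically[1], and since January 2023 S3 also applies SSE-S3 to new uploads in older buckets that had no default encryption. You can upgrade to SSE-KMS at any time. Objects uploaded before 2023 may still be unencrypted; audit bucket-level settings with the AWS Config rule s3-bucket-server-side-encryption-enabled.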

KMS automatic rotation: yearly by default

Customer managed KMS keys support automatic rotation[18], opt-in via the EnableKeyRotation API; the default period is one year, configurable from 90 to 2560 days. AWS managed keys rotate every year automatically with no opt-in. Imported key material doesn't auto-rotate; you must re-import.

2 questions test this
KMS deletion has a 7–30 day pending window

ScheduleKeyDeletion[19] requires a 7–30 day waiting period (default 30); there's no instant delete. Within that window you can call CancelKeyDeletion, which protects against accidental deletion. CloudHSM, by contrast, allows immediate key destruction.

Macie cost: ~$1/GB scanned + per-object analysis

Initial discovery scan can be expensive on petabyte-scale buckets. Common pattern: full scan once after migration, then scheduled monthly scans on critical buckets only. Use excludes[15] filters to skip large non-sensitive prefixes.

2 questions test this
KMS grants vs key policies — use grants for ephemeral access

Key policies are the standing permissions on a key: edit one and you change access for every principal it covers. Grants[20] are programmatic permissions created via kms:CreateGrant and retired or revoked when the work is done, which makes them a good fit for short-lived workloads that need temporary access to a key. AWS services (e.g. RDS encryption) create grants automatically.

4 questions test this
KMS ViaService condition restricts a key to one service

Add Condition: { StringEquals: { kms:ViaService: "s3.us-east-1.amazonaws.com" } }[21] to a key policy statement → that statement only permits use of the key through S3 in us-east-1. If the principal's credentials leak, they can't be used to call KMS directly or through another service.

6 questions test this
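
A sketch of one scoped key policy statement as a Python dict; the role ARN is a placeholder. A complete key policy also needs the usual statement giving the account's administrators control of the key; only the ViaService-scoped statement is shown.

```python
import json

APP_ROLE_ARN = "arn:aws:iam::111122223333:role/AppRole"  # placeholder principal

# Key policy statement: AppRole may use this key only when S3 in us-east-1
# calls KMS on its behalf, never by calling KMS directly.
via_s3_statement = {
    "Sid": "AllowUseOnlyThroughS3UsEast1",
    "Effect": "Allow",
    "Principal": {"AWS": APP_ROLE_ARN},
    "Action": ["kms:Encrypt", "kms:Decrypt", "kms:GenerateDataKey"],
    "Resource": "*",  # in a key policy, "*" refers to this key
    "Condition": {"StringEquals": {"kms:ViaService": "s3.us-east-1.amazonaws.com"}},
}

print(json.dumps(via_s3_statement, indent=2))
```
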
S3 Object Lock: write-once compliance retention

Two modes: Governance (admin can override) vs Compliance (no one, not even root, can shorten retention or delete during the window). Required for SEC 17a-4(f) / FINRA / CFTC compliance[22]. Set per-object or via default retention policy on the bucket. Versioning must be on.

ACM certificate for CloudFront must be in us-east-1

AWS Certificate Manager certificates used with Amazon CloudFront must be requested or imported in the US East (N. Virginia) Region (us-east-1), regardless of where the origin or end users are located. CloudFront is a global service, and certificates in us-east-1 are automatically distributed to all edge locations configured for the distribution. A certificate in any other region will not appear in the CloudFront console.

11 questions test this
S3 Object Ownership — Bucket owner enforced disables ACLs

Setting S3 Object Ownership to Bucket owner enforced disables all ACLs on the bucket and makes the bucket owner the automatic owner of every object, including objects uploaded by other AWS accounts. Access is then controlled exclusively through bucket policies and IAM policies. This is the AWS-recommended default for new buckets and resolves cross-account upload scenarios where the uploader would otherwise retain object ownership.

11 questions test this
S3 Block Public Access at org level overrides account and bucket settings

S3 Block Public Access can be enforced at the AWS Organizations level by attaching the policy at the root or OU. This setting propagates to all member accounts, including newly joined accounts, and overrides account-level and bucket-level Block Public Access settings. Individual account administrators cannot remove it. To allow a specific account to host a public bucket, the org administrator must exclude that account from the policy.

6 questions test this
S3 Bucket Keys reduce SSE-KMS request costs by up to 99%

When S3 Bucket Keys are enabled on a bucket configured with SSE-KMS, AWS KMS generates a bucket-level key that S3 uses to create data keys for individual objects, dramatically reducing the number of direct calls to AWS KMS. This can cut KMS request costs by up to 99% on high-traffic buckets while keeping all objects encrypted with the customer managed key.

6 questions test this
Cross-account KMS access requires both a key policy and an IAM policy

Granting cross-account access to a customer managed KMS key requires two configurations: the key policy in the key-owning account must grant permissions to the external account (or specific principal), and an IAM policy in the external account must explicitly allow the principals to use that specific key ARN. The key policy determines who can have access; the IAM policy determines who does have access. Neither alone is sufficient.

8 questions test this
S3 Batch Operations re-encrypts existing objects in place

Changing a bucket's default encryption configuration only affects newly uploaded objects; existing objects retain their original encryption. To re-encrypt billions of existing objects with a new SSE-KMS customer managed key, use S3 Batch Operations with the Copy operation, which copies objects back to the same bucket while applying the new encryption settings.

4 questions test this
KMS keys are Region-specific — cross-Region replicas need a key in the target Region

AWS KMS keys are Regional resources; a single-Region key cannot be used outside its Region (multi-Region keys are the exception: related keys in different Regions that share key material). When creating an encrypted cross-Region RDS read replica, you must specify a customer managed key (or AWS managed key) that exists in the destination Region. Similarly, an S3 bucket can only use a KMS key from the same Region for SSE-KMS encryption.

5 questions test this
Macie delegated admin manages org-wide discovery without using the management account

In an AWS Organizations environment, the management account designates a dedicated security account as the Macie delegated administrator. The delegated administrator can enable Macie in member accounts, run sensitive data discovery jobs across the organization, and aggregate findings centrally. AWS best practice is to use a separate security account rather than the management account for day-to-day Macie operations. Automated sensitive data discovery uses sampling to assign sensitivity scores to every S3 bucket, providing broad visibility cost-efficiently.

6 questions test this
CloudFront SNI-only SSL is cost-free and supports all modern browsers

Server Name Indication (SNI) allows CloudFront to serve HTTPS requests with custom SSL certificates without requiring a dedicated IP address per certificate, at no additional monthly charge. Virtually all browsers and clients released after 2010 support SNI. The alternative, Dedicated IP SSL, incurs an additional monthly fee per distribution and is only needed for legacy clients that do not support SNI.

6 questions test this
Set Origin Protocol Policy to HTTPS Only and minimum TLS 1.2 for encrypted CloudFront-to-origin traffic

To enforce HTTPS between CloudFront and a custom origin, set the Origin Protocol Policy to HTTPS Only and configure the minimum origin SSL protocol to TLSv1.2. CloudFront returns HTTP 502 (Bad Gateway) if the origin presents a self-signed certificate or an untrusted certificate chain when HTTPS Only is active — the origin certificate must be signed by a trusted CA.

3 questions test this
SSE-KMS upload requires kms:GenerateDataKey; download requires kms:Decrypt

When uploading objects to an S3 bucket configured with SSE-KMS, Amazon S3 calls AWS KMS to generate a data key, requiring kms:GenerateDataKey on the key. When downloading, S3 must decrypt the data key, requiring kms:Decrypt. Both permissions are needed for an application that both uploads and downloads. An IAM policy missing either permission will cause the corresponding S3 operation to fail with an AccessDenied error.

6 questions test this
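
A sketch of an application's identity policy covering both directions, as a Python dict. The bucket name and key ARN are placeholders. kms:Decrypt also matters on the upload side for multipart uploads, which is another reason to grant both actions to an app that writes large objects.

```python
import json

BUCKET = "example-uploads"  # placeholder bucket name
KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"  # placeholder

app_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ObjectReadWrite",
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        },
        {
            # GenerateDataKey for uploads, Decrypt for downloads
            # (multipart uploads need Decrypt as well).
            "Sid": "UseSseKmsKey",
            "Effect": "Allow",
            "Action": ["kms:GenerateDataKey", "kms:Decrypt"],
            "Resource": KEY_ARN,
        },
    ],
}

print(json.dumps(app_policy, indent=2))
```
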
Cross-account Secrets Manager access requires a resource policy on the secret AND KMS key policy

Accessing a Secrets Manager secret from a different AWS account requires both a resource-based policy on the secret (granting secretsmanager:GetSecretValue) and a key policy on the encrypting KMS key (granting kms:Decrypt). The AWS managed key aws/secretsmanager cannot be used for cross-account access because its key policy cannot be modified; a customer managed key is required.

8 questions test this
Use kms:EncryptionContext:SecretARN to scope Lambda rotation function's KMS decrypt to one secret

When a Secrets Manager secret is encrypted with a customer managed KMS key, the Lambda rotation function's execution role needs kms:Decrypt on that key. Adding a condition using the kms:EncryptionContext:SecretARN key restricts the function to decrypt only the specific secret it is authorized to rotate, following least-privilege. Without this condition, a single KMS permission would allow decryption of any secret encrypted with the same key.

5 questions test this
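
A sketch of the rotation role's KMS statement as a Python dict; both ARNs are placeholders. Including kms:GenerateDataKey next to kms:Decrypt is an assumption: the rotation function also writes the new secret version, so check the Secrets Manager rotation permission docs against your setup.

```python
import json

# Placeholder ARNs for illustration.
KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"
SECRET_ARN = "arn:aws:secretsmanager:us-east-1:111122223333:secret:prod/orders-db-AbCdEf"

rotation_kms_statement = {
    "Sid": "UseKeyOnlyForTheSecretBeingRotated",
    "Effect": "Allow",
    "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
    "Resource": KEY_ARN,
    # Secrets Manager passes the secret's ARN in the encryption context,
    # so this statement matches only KMS calls made for SECRET_ARN.
    "Condition": {"StringEquals": {"kms:EncryptionContext:SecretARN": SECRET_ARN}},
}

print(json.dumps(rotation_kms_statement, indent=2))
```
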
VPC endpoints can use endpoint policies to scope allowed actions

An Interface or Gateway endpoint can carry its own endpoint policy[23] restricting which API actions and resources are allowed through it. Common pattern: S3 Gateway endpoint allows s3:GetObject on your specific buckets only — anything else through the endpoint is denied.
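
A sketch of that S3 Gateway endpoint policy as a Python dict; the bucket names are placeholders. An endpoint policy never grants access on its own: a request must also be allowed by the caller's IAM policy and any bucket policy, so the endpoint policy acts as one more filter.

```python
import json

# Placeholder bucket names reachable through this endpoint.
ALLOWED_BUCKETS = ["example-app-config", "example-app-assets"]

gateway_endpoint_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "ReadOnlyOurBuckets",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        # Anything not listed here is implicitly denied through the endpoint.
        "Resource": [f"arn:aws:s3:::{b}/*" for b in ALLOWED_BUCKETS],
    }],
}

print(json.dumps(gateway_endpoint_policy, indent=2))
```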

Design Resilient Architectures

Scalable and Loosely Coupled Architectures

  • Pick integration primitive by consumer model
  • Always pair queues with DLQs + retry policies
  • SQS visibility timeout ≥ 6× Lambda timeout
  • SQS FIFO throughput: 300 / 3 000 / high-throughput mode
  • SNS doesn't retain — SQS does
  • EventBridge archive + replay
  • Step Functions Standard vs Express
  • EventBridge Pipes: SQS / Kinesis / DDB stream → target (no Lambda glue)
  • SQS long polling reduces empty receives + cost
  • API Gateway throttling: account, stage, method levels
  • Target tracking with ALBRequestCountPerTarget scales by request rate
  • Step Functions Distributed Map for large-scale S3 parallel processing
  • Step Functions .waitForTaskToken pauses workflow for external callbacks
  • Step Functions Retry with exponential backoff + Catch for fallback
  • Parallel state Catch intercepts any branch failure
  • SQS + Lambda partial batch response via ReportBatchItemFailures
  • SQS DLQ retention period must exceed source queue retention
  • EventBridge global endpoints for automatic multi-region failover
  • EventBridge DLQ per rule target captures undeliverable events
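
Two of the rules in this list (visibility timeout ≥ 6× the Lambda timeout, and DLQ retention longer than the source queue's) lend themselves to a quick config check. A sketch with hypothetical names (QueueSettings, check_sqs_lambda), working in seconds:

```python
from dataclasses import dataclass

@dataclass
class QueueSettings:
    visibility_timeout_s: int
    retention_s: int

def check_sqs_lambda(source: QueueSettings, dlq: QueueSettings, lambda_timeout_s: int) -> list:
    """Return a list of rule violations (an empty list means the config passes)."""
    problems = []
    # Rule 1: visibility timeout at least 6x the function timeout, so Lambda has
    # time to retry a throttled batch before messages become visible again.
    if source.visibility_timeout_s < 6 * lambda_timeout_s:
        problems.append("visibility timeout < 6x Lambda timeout")
    # Rule 2: for standard queues a message keeps its original enqueue timestamp
    # when moved to the DLQ, so the DLQ must retain messages longer.
    if dlq.retention_s <= source.retention_s:
        problems.append("DLQ retention must exceed source retention")
    return problems

DAY = 86400
source = QueueSettings(visibility_timeout_s=180, retention_s=4 * DAY)
dlq = QueueSettings(visibility_timeout_s=30, retention_s=14 * DAY)
print(check_sqs_lambda(source, dlq, lambda_timeout_s=30))  # []
print(check_sqs_lambda(source, dlq, lambda_timeout_s=60))  # ['visibility timeout < 6x Lambda timeout']
```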

Highly Available and Fault-Tolerant Architectures

Multi-AZ for in-region HA; Multi-Region for DR

Spread stateless tiers across ≥2 AZs behind a load balancer. RDS Multi-AZ: synchronous standby, 60-120 s failover. Multi-Region only when single-region failure is in your threat model — adds latency, cost, replication complexity.

3 questions test this
Pick DR strategy by RTO and RPO

Four canonical strategies in increasing cost / decreasing RTO/RPO: backup-and-restore (hours), pilot-light (minutes-hours), warm-standby (minutes), multi-site active-active (near-zero). The exam picks the cheapest one that meets the stated RTO/RPO.
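
The "cheapest strategy that meets RTO/RPO" rule can be sketched as a linear scan over strategies ordered by cost. The per-strategy minutes below are hypothetical placeholders loosely matching the ranges above, not AWS figures; real values depend on data volume and how well recovery is automated and rehearsed.

```python
# Hypothetical best-case (RTO, RPO) per strategy, in minutes, ordered cheapest first.
STRATEGIES = [
    ("backup-and-restore", 480, 240),
    ("pilot-light", 60, 10),
    ("warm-standby", 15, 1),
    ("multi-site active-active", 0, 0),
]

def cheapest_strategy(required_rto_min: float, required_rpo_min: float):
    """Return the cheapest strategy whose RTO and RPO both fit, or None."""
    for name, rto, rpo in STRATEGIES:
        if rto <= required_rto_min and rpo <= required_rpo_min:
            return name
    return None

print(cheapest_strategy(required_rto_min=12 * 60, required_rpo_min=24 * 60))  # backup-and-restore
print(cheapest_strategy(required_rto_min=30, required_rpo_min=5))              # warm-standby
```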

ASG + ELB + health checks is the default stateless tier

Auto Scaling Group across ≥2 AZs + ALB/NLB with health checks at the target group. Unhealthy instances drain + replace automatically. ASG min/max/desired control capacity; target-tracking or step scaling policies handle demand changes.

10 questions test this
Route 53 has 8 routing policies, each for a specific intent

Simple (one answer), Weighted (split traffic), Latency (route to the lowest-latency region), Failover (active/passive via health check), Geolocation (by user continent, country, or US state), Geoproximity (by lat/lon + bias), IP-based (by client CIDR block), Multivalue answer (up to 8 healthy records; DNS-level load balancing).

3 questions test this
RDS Multi-AZ failover: 60-120 seconds typical

Automatic failover[1] repoints the DNS endpoint at the standby; clients with cached DNS will see ~60-120 s of errors, so the app must reconnect on connection failure. In a Multi-AZ instance deployment the standby is NOT readable; for read scaling, use Read Replicas[13] (a separate feature). Multi-AZ DB clusters, which have two readable standbys, are the exception.

2 questions test this
Aurora Global Database: typically sub-second cross-region RPO

Replicates Aurora across regions[2] over dedicated infrastructure. RPO is typically <1 s; failover RTO ~1 min (managed promotion). Secondary regions serve read-only traffic and can be promoted to writer, a big advantage over manual cross-region replicas.

1 question tests this
Route 53 health checks: 30 s default interval, 3 failures = unhealthy

Default check interval is 30 s[14] with a failure threshold of 3; that single threshold (1-10) governs both the healthy→unhealthy and unhealthy→healthy transitions. Fast failover requires faster checks (10 s interval supported, paid). Use 'calculated' health checks to combine multiple endpoint checks with AND/OR logic.

5 questions test this
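
Back-of-envelope failover timing: consecutive failures × interval to flip the status, plus one DNS TTL before cached answers expire. This is a simplification (Route 53 checkers run from several locations and resolvers may cache differently), and the 60 s TTL in the example is an assumed value; the function names are hypothetical.

```python
VALID_INTERVALS_S = (10, 30)     # standard 30 s, fast 10 s (extra cost)
VALID_THRESHOLDS = range(1, 11)  # failure threshold 1-10

def detection_time_s(interval_s: int = 30, failure_threshold: int = 3) -> int:
    """Rough time for consecutive failed checks to mark an endpoint unhealthy."""
    if interval_s not in VALID_INTERVALS_S or failure_threshold not in VALID_THRESHOLDS:
        raise ValueError("unsupported health check settings")
    return interval_s * failure_threshold

def failover_estimate_s(interval_s: int, failure_threshold: int, record_ttl_s: int) -> int:
    """Detection time plus one TTL for resolvers to drop the cached answer."""
    return detection_time_s(interval_s, failure_threshold) + record_ttl_s

print(detection_time_s())              # 90
print(detection_time_s(10, 3))         # 30
print(failover_estimate_s(10, 3, 60))  # 90
```
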
ASG health check type: EC2 vs ELB

ASG HealthCheckType=EC2[15] replaces an instance only when its EC2 status checks fail or it is no longer running (e.g., hardware failure). HealthCheckType=ELB also replaces instances that fail the load balancer's health check, so it catches app-layer failures too. Use ELB for production.

9 questions test this
S3 Cross-Region Replication: async, prefix/tag-filtered

CRR replicates new objects[4] from source to destination bucket asynchronously (typically seconds). Existing objects need a one-time batch operation. Can filter by prefix or tag. Versioning must be on for both buckets.

8 questions test this
AWS Backup centralizes backups across services + accounts

Backup plans + selections[8] cover RDS, DynamoDB, EFS, EBS, FSx, Storage Gateway, etc. Cross-region and cross-account copies are supported. Use AWS Backup Audit Manager + Backup Vault Lock for compliance scenarios.

Route 53 failover record requires a health check on PRIMARY

Active/passive failover routing[16] needs the primary record to have an associated health check. If the check fails, the secondary record is served. Without the health check, Route 53 always serves the primary.

8 questions test this
ASG health check grace period protects initializing instances

The health check grace period tells Auto Scaling how long to wait before evaluating the health of a newly launched instance after it enters InService state. Set it to at least as long as the application startup time; otherwise ELB health check failures during initialization cause continuous termination and replacement loops.

15 questions test this
ALB deregistration delay (connection draining) for long requests

When a target is removed from an ALB target group, the load balancer stops sending new requests but waits for the deregistration delay (default 300 s, range 0-3600 s) before completing deregistration. Set this value to at least the maximum expected request processing time to prevent HTTP 5xx errors during scale-in events.

8 questions test this
ALB slow start mode ramps traffic to new targets

Slow start mode causes the ALB to linearly increase the share of requests sent to a newly registered target over a configurable duration of 30–900 seconds. Use it when instances need a warm-up period (e.g., JIT cache warming, dataset loading) before they can handle their full share of traffic.

7 questions test this
ALB cross-zone load balancing always on at LB level, configurable per target group

For Application Load Balancers, cross-zone load balancing is always enabled at the load balancer level and cannot be turned off. However, it can be explicitly disabled at the target group level, overriding the load balancer default. When enabled, each LB node distributes traffic evenly across all registered targets in all enabled Availability Zones.

4 questions test this
NLB provides static IP per AZ; assign Elastic IPs for fixed addresses

Network Load Balancers automatically provide one static IP address per enabled Availability Zone. For internet-facing NLBs you can also assign your own Elastic IP per AZ, giving external clients fixed addresses to allowlist in firewalls. NLB operates at Layer 4, supports ultra-low latency, and preserves the client source IP address by default for instance targets (for IP targets using TCP/TLS, client IP preservation is off by default).

7 questions test this
Route 53 latency routing + Evaluate Target Health = active-active multi-region failover

Latency-based routing records with Evaluate Target Health set to Yes implement active-active failover: all healthy regions serve traffic based on lowest latency, and Route 53 automatically stops routing to a region when its resources become unhealthy. For hierarchical configurations (latency over weighted), enabling Evaluate Target Health on the top-level alias causes Route 53 to traverse the tree and consider the region unhealthy only when all underlying weighted records fail.

12 questions test this
Route 53 calculated health checks aggregate child health check results

A calculated health check monitors other health checks (child health checks) and reports healthy when the number of healthy children meets a configurable threshold. This lets you trigger DNS failover only when a minimum number of endpoints are down (e.g., healthy if at least 2 of 6 servers are up), rather than reacting to individual endpoint failures.

8 questions test this
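
The calculated-check rule reduces to a count against a threshold. A sketch with a hypothetical function name: a threshold equal to the number of children behaves like AND, and a threshold of 1 behaves like OR.

```python
def calculated_health(child_results: list, healthy_threshold: int) -> bool:
    """Healthy when at least `healthy_threshold` child checks report healthy."""
    return sum(1 for ok in child_results if ok) >= healthy_threshold

# Six web servers; fail over only when fewer than 2 remain healthy.
servers = [True, True, False, False, False, False]
print(calculated_health(servers, healthy_threshold=2))  # True: 2 of 6 up
servers[1] = False
print(calculated_health(servers, healthy_threshold=2))  # False: 1 of 6 up
```
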
Route 53 weighted records + health checks implement active-active failover

Any routing policy other than Failover combined with health checks creates an active-active configuration. With weighted records, Route 53 distributes traffic according to weights while all records are healthy; when a record's health check fails, Route 53 excludes it from responses and redistributes remaining traffic to healthy records. A zero-weight record acts as a standby, receiving traffic only when all nonzero-weight records are unhealthy.

7 questions test this
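
A sketch of which weighted records stay in rotation, with a hypothetical function name. Healthy non-zero-weight records share traffic by weight; zero-weight records are used only when no non-zero-weight record is healthy. Route 53's separate fail-open behavior when every record is unhealthy is not modeled.

```python
def eligible_weighted_records(records):
    """records: list of (name, weight, healthy) tuples.

    Returns the (name, weight) pairs Route 53 would choose among.
    """
    healthy = [(name, weight) for name, weight, ok in records if ok]
    active = [(name, weight) for name, weight in healthy if weight > 0]
    if active:
        return active
    # No non-zero-weight record is healthy, so only zero-weight standbys remain.
    return healthy

records = [("us-east-1", 70, True), ("us-west-2", 30, True), ("standby", 0, True)]
print(eligible_weighted_records(records))  # [('us-east-1', 70), ('us-west-2', 30)]
records = [("us-east-1", 70, False), ("us-west-2", 30, False), ("standby", 0, True)]
print(eligible_weighted_records(records))  # [('standby', 0)]
```
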
Route 53 hierarchical routing: latency alias over per-region weighted records

A common multi-tier DNS pattern uses latency alias records at the top level (for region selection) pointing to weighted records within each region (for intra-region distribution). Enabling Evaluate Target Health on the latency alias causes Route 53 to consider a region healthy only if at least one of its weighted child records is healthy, enabling cascading health propagation.

4 questions test this
Aurora replica failover priority tiers 0 (highest) to 15 (lowest)

Each Aurora Replica can be assigned a promotion priority tier from 0 (promoted first) to 15 (promoted last). When the primary instance fails, Aurora promotes the replica with the lowest tier number. Assign tier 0 to the preferred standby (e.g., same instance class as the primary) and higher tiers to replicas used for analytics or reporting.

4 questions test this
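
A sketch of the promotion choice. Beyond the tier rule above, it assumes the documented tie-break that the largest replica wins among equal tiers; the size field is a hypothetical numeric rank (e.g., vCPU count), and when tier and size both tie Aurora picks arbitrarily, whereas min() here simply takes the first.

```python
from dataclasses import dataclass

@dataclass
class AuroraReplica:
    name: str
    tier: int  # promotion tier: 0 (promoted first) .. 15 (promoted last)
    size: int  # hypothetical size rank, e.g. vCPU count of the instance class

    def __post_init__(self):
        if not 0 <= self.tier <= 15:
            raise ValueError("tier must be 0-15")

def promotion_target(replicas):
    """Lowest tier wins; among equal tiers, the largest replica wins."""
    return min(replicas, key=lambda r: (r.tier, -r.size))

replicas = [
    AuroraReplica("reporting", tier=15, size=4),
    AuroraReplica("standby-a", tier=0, size=8),
    AuroraReplica("standby-b", tier=0, size=16),
]
print(promotion_target(replicas).name)  # standby-b
```
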
RDS Snapshot copies cross-region + cross-account for DR

RDS automated snapshots can't be shared directly; copying one produces a manual snapshot[17], which can be copied cross-region (encrypted with a KMS key in the target region) or shared cross-account with the target account. Used in pilot light / warm standby DR strategies.

Design High-Performing Architectures

High-Performing and Scalable Storage

  • Three storage shapes: object, file, block — pick by access pattern
  • EBS volume type by IOPS + throughput
  • EFS performance + throughput modes are independent
  • gp3 is the new default — migrate gp2 to save ~20%
  • io2 Block Express: up to 256 000 IOPS, sub-millisecond p99
  • Instance store is FREE — but dies on stop/terminate/hibernate
  • FSx for Lustre: HPC scratch at TB/s-class aggregate throughput
  • EFS Bursting Throughput credits accumulate while idle
  • EBS Multi-Attach: only io1/io2; clustered-only filesystems
  • FSx for Windows / NetApp: Active Directory integrated SMB
  • S3 Intelligent-Tiering auto-moves objects across tiers — no retrieval fee
  • S3 Glacier Instant Retrieval: archive cost, millisecond access
  • S3 Transfer Acceleration requires period-free bucket names

High-Performing and Elastic Compute

  • Match the instance family to the workload bottleneck
  • Compute spectrum: EC2 → ECS/EKS → Fargate → Lambda
  • Lambda max: 15 min, 10 GB memory, 10 GB ephemeral storage
  • Lambda provisioned concurrency eliminates cold starts (at a cost)
  • Placement groups: Cluster, Spread, Partition — each for a different goal
  • EFA (Elastic Fabric Adapter) bypasses the kernel — only on HPC + ML
  • Burstable T-family CPU credits — accumulated + spent per second
  • Lambda SnapStart cuts cold starts ~10× for supported runtimes
  • Spot interruption notice = 2 minutes; Rebalance recommendation earlier
  • ECS Fargate capacity provider: base guarantees minimum; weights split the rest
  • ECS Service Auto Scaling target tracking: asymmetric cooldowns for fast scale-out, slow scale-in
  • EKS Cluster Autoscaler priority expander controls which node group scales first
  • ASG lifecycle hooks pause instance launch / terminate for setup / cleanup

High-Performing Databases

Match the database to the access pattern, not the data model

Relational (RDS, Aurora) for transactions + joins + ad-hoc queries. Key-value (DynamoDB) for predictable single-item ops at any scale. Document (DocumentDB) for JSON-shaped data. Search (OpenSearch) for full-text. Time-series (Timestream). Graph (Neptune).

Scale reads with replicas; writes with sharding

RDS Read Replicas: async, up to 15 per source. Aurora Replicas: <100 ms typical lag, up to 15. For write scale, DynamoDB partition design + adaptive capacity, or shard across multiple Aurora clusters by tenant/key.

2 questions test this
Aurora is the default RDS-compatible choice

MySQL/PostgreSQL wire-compatible + distributed storage (6 copies across 3 AZs) + faster failover (often <30 s) + Aurora Serverless v2. Pick Aurora unless you specifically need stock RDS Oracle/SQL Server. Storage grows automatically[11] up to 256 TiB on current Aurora engine versions (128 TiB on older versions): no manual resize, no downtime, no provisioning. You pay for what you use; drop a table and storage shrinks.

DAX = microsecond reads in front of DynamoDB

DAX[15] is an in-memory cache for DynamoDB that works read-through and write-through (applications can also write around it, straight to DynamoDB). Reads served from DAX take microseconds (vs single-digit ms direct). DAX serves eventually consistent reads; strongly consistent reads pass through to DynamoDB. Sits in your VPC; uses the DynamoDB API.

8 questions test this
Aurora Read Replicas: < 100 ms typical lag (often < 10 ms)

Shared storage layer[11] (not log shipping) means much lower replica lag than RDS async replicas. Up to 15 read replicas. The reader endpoint load-balances connections across replicas. Failover promotes a replica, typically in under a minute.

4 questions test this
DynamoDB hot partitions: partition key design matters most

Adaptive capacity[16] helps, but a low-cardinality partition key still hurts. Use high-cardinality keys (UUIDs, hashes) or composite keys (tenant#item). A GSI helps query different attributes but doesn't fix hot-key writes on the base table.
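
When the natural key is unavoidably hot (say, every order for one date), a common remedy is write sharding: append a calculated suffix so writes spread over several partition key values. This is a sketch of that technique; the shard count of 10 and the function names are hypothetical. Readers who know the item ID recompute the key; a full read queries every suffix and merges the results.

```python
import hashlib

SHARD_COUNT = 10  # hypothetical; size it to the write rate you need to spread

def sharded_partition_key(base_key: str, item_id: str, shards: int = SHARD_COUNT) -> str:
    """Append a suffix calculated from item_id, so the same item always maps to
    the same shard and a reader who knows item_id can recompute the key."""
    digest = hashlib.sha256(item_id.encode("utf-8")).hexdigest()
    return f"{base_key}#{int(digest, 16) % shards}"

def all_shard_keys(base_key: str, shards: int = SHARD_COUNT) -> list:
    """Keys to query (and merge) when reading everything for base_key."""
    return [f"{base_key}#{i}" for i in range(shards)]

# Writes for one hot date fan out across up to 10 partition key values.
keys = {sharded_partition_key("2024-06-01", f"order-{n}") for n in range(100)}
print(sorted(keys))
```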

ElastiCache Redis vs Memcached: pick by feature set

Redis[17]: data structures (lists, sets, sorted sets, streams, geo, hyperloglog), pub/sub, persistence, replication, cluster mode for sharding, transactions. Memcached: simple key-value, no persistence, multi-threaded per node, auto-discovery. Use Redis unless you specifically need simple multi-threaded caching.

13 questions test this
DynamoDB Streams: every item change, 24h retention

Capture every insert/update/delete[18] as a stream record with one of 4 view types: KEYS_ONLY, NEW_IMAGE, OLD_IMAGE, NEW_AND_OLD_IMAGES. Records are retained for 24h. Common consumer: Lambda for change-data-capture (CDC) into other services. Global Tables use Streams under the hood.
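
A minimal sketch of a Lambda CDC consumer for a stream. The `Records`, `eventName`, and `dynamodb` (`Keys`, `NewImage`, `OldImage`) fields follow the DynamoDB Streams event shape. The shape of the returned change dicts and the idea of forwarding them elsewhere are hypothetical. Attribute values stay in DynamoDB's typed JSON (for example `{"S": "u1"}`).

```python
from typing import Any, Dict, List


def handler(event: Dict[str, Any], context: Any = None) -> List[Dict[str, Any]]:
    """Turn DynamoDB stream records into simple change events."""
    changes = []
    for record in event.get("Records", []):
        ddb = record["dynamodb"]
        changes.append(
            {
                "op": record["eventName"],   # INSERT | MODIFY | REMOVE
                "keys": ddb["Keys"],         # present for every view type
                "new": ddb.get("NewImage"),  # NEW_IMAGE / NEW_AND_OLD_IMAGES; absent on REMOVE
                "old": ddb.get("OldImage"),  # OLD_IMAGE / NEW_AND_OLD_IMAGES; absent on INSERT
            }
        )
    # A real consumer would forward `changes` to another service here.
    return changes
```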

Redshift: KEY distribution co-locates large join partners; ALL replicates small dimension tables

For large fact-to-large-dimension joins, set DISTSTYLE KEY on the same join column in both tables so matching rows land on the same slice, eliminating redistribution. For small, slowly changing dimension tables (typically under a few million rows), use DISTSTYLE ALL to place a full copy on every node, so joins on any column need no data movement. EVEN spreads rows round-robin and suits tables with no dominant join pattern. If you omit DISTSTYLE, Redshift uses AUTO: small tables start as ALL and may switch to KEY or EVEN as they grow.

5 questions test this
Aurora reader endpoint automatically includes Auto Scaling-created replicas

Applications must connect to the Aurora reader endpoint (not individual instance endpoints) to benefit from Aurora Auto Scaling. The reader endpoint uses DNS round-robin and automatically adds newly provisioned replicas once they pass health checks, distributing connections across all available replicas. Using instance-specific endpoints causes new Auto Scaling replicas to receive no traffic.
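
A hedged sketch of routing by endpoint. The hostnames follow Aurora's `<cluster>.cluster-<id>` (writer) and `<cluster>.cluster-ro-<id>` (reader) pattern, but the cluster name, ID suffix, and Region are hypothetical placeholders. The SQL classifier is deliberately naive.

```python
# Hypothetical cluster name, cluster-ID suffix, and Region; substitute your own.
WRITER_ENDPOINT = "mycluster.cluster-abc123example.us-east-1.rds.amazonaws.com"
READER_ENDPOINT = "mycluster.cluster-ro-abc123example.us-east-1.rds.amazonaws.com"

# Naive classification: anything not clearly read-only goes to the writer.
READ_ONLY_VERBS = {"SELECT", "SHOW", "DESCRIBE", "EXPLAIN"}


def endpoint_for(sql: str) -> str:
    """Pick the Aurora endpoint for one statement.

    Reads go to the reader endpoint, whose DNS Aurora updates as replicas
    (including Auto Scaling-created ones) come and go. Hard-coding instance
    endpoints instead would leave later replicas with no traffic.
    """
    words = sql.split()
    first = words[0].upper() if words else ""
    normalized = " ".join(words).upper()
    if first in READ_ONLY_VERBS and "FOR UPDATE" not in normalized:
        return READER_ENDPOINT
    return WRITER_ENDPOINT
```

In practice most applications keep two connection pools, one per endpoint, rather than classifying SQL strings. The sketch only shows which endpoint each kind of traffic belongs on.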

11 questions test this
Aurora Backtrack: rewind the cluster in place, no restore

Aurora MySQL can rewind the cluster[12] in place to a point up to 72 hours back, without restoring from a backup. The cluster is briefly disrupted while it rewinds, so pause applications first. Use it for "oops, dropped a table" recovery. Different from PITR, which restores to a new cluster.

High-Performing and Scalable Networks

  • CloudFront for HTTP global; Global Accelerator for non-HTTP / static anycast
  • Many-VPC: Transit Gateway > peering > Direct Connect Gateway
  • Load balancer choice: L4 vs L7 vs static IP
  • CloudFront origins: not just S3 + EC2
  • Transit Gateway: per-hour-per-attachment + per-GB processed
  • ALB target groups can be IPs, Lambdas, or instances
  • Route 53 latency-based routing uses measured latency between user networks and AWS Regions
  • CloudFront Functions vs Lambda@Edge — pick by complexity
  • Global Accelerator: traffic dials + endpoint weights
  • API Gateway: REST vs HTTP vs WebSocket
  • CloudFront Origin Shield adds a centralized cache layer to consolidate origin requests
  • Route 53 multivalue answer returns up to eight healthy IPs from associated health checks
  • Route 53 Evaluate Target Health propagates ELB health into DNS failover
  • S3 Transfer Acceleration is for UPLOADS, not cost

Unlock with Premium — includes all practice exams and the complete study guide.

High-Performing Data Ingestion and Transformation

  • Streaming vs batch — pick by latency requirement
  • Pick transformation tool by data shape + size
  • Kinesis Data Firehose buffers at time OR size threshold
  • Kinesis Data Streams: 1 shard = 1 MB/s in, 2 MB/s out
  • Glue Data Catalog is THE metastore for the lake
  • Lake Formation = fine-grained access control over the lake
  • Glue worker types: G.1X / G.2X / G.4X / G.8X / G.025X
  • DMS: replicate from source to target with optional CDC
  • Snowball Edge for offline bulk transfer; Snowmobile retired
  • IoT Core error action fires when the primary rule action fails — preserves messages
  • IoT Core Basic Ingest bypasses the message broker — no per-message cost
  • Firehose Lambda transform: return recordId + result status + base64 data
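
A minimal sketch of the transform contract in the last bullet. The input `records[].recordId` / `records[].data` fields and the output `recordId` + `result` (`Ok`, `Dropped`, `ProcessingFailed`) + base64 `data` fields follow Firehose's data-transformation interface. The DEBUG filter rule and the newline-delimited JSON output are hypothetical choices.

```python
import base64
import json
from typing import Any, Dict


def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
    """Firehose transformation Lambda: exactly one output entry per input recordId."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
        except ValueError:  # bad base64 padding, non-UTF-8 bytes, or invalid JSON
            payload = None
        if not isinstance(payload, dict):
            # Firehose treats these as processing failures.
            result, data = "ProcessingFailed", record["data"]
        elif payload.get("level") == "DEBUG":  # hypothetical filter rule
            result, data = "Dropped", record["data"]
        else:
            line = json.dumps(payload) + "\n"  # newline-delimited JSON for downstream S3
            result, data = "Ok", base64.b64encode(line.encode("utf-8")).decode("ascii")
        output.append({"recordId": record["recordId"], "result": result, "data": data})
    return {"records": output}
```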

Design Cost-Optimized Architectures

Cost-Optimized Compute

Commitment discounts for predictable usage

Steady-state EC2/Fargate/Lambda usage running >70% of any 1- or 3-year window fits a commitment. Reserved Instances are tied to specific instance attributes and, of these three services, cover only EC2. Savings Plans commit to a $/hour spend across instance families, and Compute Savings Plans also cover Fargate and Lambda. RIs apply BEFORE Savings Plans, so buy order matters.
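
A small worked sketch of where a threshold like >70% comes from. It assumes a flat, illustrative discount and ignores upfront-payment timing and flexibility. A commitment bills every hour at (1 − discount) × the On-Demand rate whether or not you use it, so it wins once utilization exceeds 1 − discount. A ~30% discount puts break-even near 70%; deeper discounts lower it.

```python
def commitment_break_even(discount: float) -> float:
    """Utilization above which a commitment beats On-Demand.

    Commitment: pays (1 - discount) * rate for every hour in the term.
    On-Demand:  pays rate only for the hours actually used.
    The two are equal when utilization * rate == (1 - discount) * rate.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return 1.0 - discount


def cheaper_option(utilization: float, discount: float) -> str:
    """'commitment' if steady usage clears the break-even point, else 'on-demand'."""
    return "commitment" if utilization > commitment_break_even(discount) else "on-demand"
```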

15 questions test this
Spot for interruption-tolerant work

Spot Instances use spare EC2 capacity at up to 90% off On-Demand. 2-minute interruption notice; price-capacity-optimized is the AWS-recommended allocation strategy. Use for stateless workers, batch, big-data, CI fleets. Never for stateful workloads without checkpointing.
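
A hedged sketch of watching for the 2-minute notice from inside the instance using IMDSv2 and only the standard library. The token PUT, the `spot/instance-action` path, and the 404-until-scheduled behavior follow the instance metadata service. The helper names are hypothetical, and the network calls only work on an EC2 instance. An EventBridge rule on "EC2 Spot Instance Interruption Warning" is the off-instance alternative.

```python
import json
import urllib.error
import urllib.request
from typing import Dict, Optional

IMDS = "http://169.254.169.254/latest"


def parse_instance_action(body: str) -> Dict[str, str]:
    """Parse the instance-action document, e.g. {"action": "terminate", "time": "..."}."""
    doc = json.loads(body)
    if not isinstance(doc, dict) or "action" not in doc or "time" not in doc:
        raise ValueError("unexpected instance-action document")
    return {"action": doc["action"], "time": doc["time"]}


def imds_token(ttl_seconds: int = 60) -> str:
    """Fetch an IMDSv2 session token (PUT request with a TTL header)."""
    req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )
    with urllib.request.urlopen(req, timeout=2) as resp:
        return resp.read().decode("utf-8")


def spot_interruption_pending() -> Optional[Dict[str, str]]:
    """Return the scheduled action, or None while no interruption is pending."""
    req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": imds_token()},
    )
    try:
        with urllib.request.urlopen(req, timeout=2) as resp:
            return parse_instance_action(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code == 404:  # IMDS answers 404 until a notice is issued
            return None
        raise
```

A worker loop would call `spot_interruption_pending()` every few seconds. When it returns a dict, the worker checkpoints and stops taking new work.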

1 question tests this
Right-sizing for everyone (Compute Optimizer)

Compute Optimizer analyzes utilization and recommends smaller instances or different families for EC2, EBS, Lambda, ECS-on-Fargate. Lambda recommendations need ≥50 invocations in 14 days. Trusted Advisor surfaces RI / SP recommendations separately.

5 questions test this
RIs apply BEFORE Savings Plans — buying order matters

Each billing hour AWS applies discounts in a fixed order: RIs first (to matching family/region/AZ/OS usage), then Savings Plans (to any remaining eligible usage, highest-discount-first), then on-demand. If you already own RIs covering your baseline, a fresh Compute SP overlapping that usage sits idle — the RIs consume the hours first. Buy SPs to cover un-RI-covered usage, then add RIs only for instance types with reliably steady utilization.
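
A simplified one-hour model of that order, assuming made-up hourly rates and a single Savings Plan. Real billing works across many line items with published SP rates. The sketch shows why an SP bought on top of RI-covered usage goes unused.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Usage:
    name: str
    on_demand_rate: float  # $/hour at On-Demand (illustrative)
    sp_rate: float         # $/hour at the Savings Plan rate (illustrative)
    ri_covered: bool       # True if an existing RI already matches this usage


def bill_hour(usages: List[Usage], sp_commitment: float) -> Tuple[float, float, float]:
    """Return (sp_used, on_demand_cost, sp_unused) for one billing hour.

    sp_unused is still billed: the commitment is owed whether or not usage consumes it.
    """
    # 1. RIs first: RI-matched usage never reaches the Savings Plan.
    remaining = [u for u in usages if not u.ri_covered]
    # 2. Savings Plan next, applied to the highest discount percentage first.
    remaining.sort(key=lambda u: 1 - u.sp_rate / u.on_demand_rate, reverse=True)
    budget = sp_commitment
    on_demand = 0.0
    for u in remaining:
        if budget >= u.sp_rate:
            budget -= u.sp_rate                   # fully covered at the SP rate
        else:
            covered = budget / u.sp_rate          # fraction of the hour the SP still covers
            budget = 0.0
            on_demand += (1 - covered) * u.on_demand_rate  # 3. the rest is On-Demand
    return sp_commitment - budget, on_demand, budget
```

When every hour is already matched by RIs, the whole commitment lands in `sp_unused`. That is the "sits idle" case described above.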

5 questions test this
Exchangeability ladder: Standard RI < Convertible RI < Compute SP

Standard RI: family-locked, can be sold on the RI Marketplace if no longer needed. Convertible RI: can be exchanged for a different family/OS/tenancy without selling. Compute SP: covers EC2 + Fargate + Lambda across any family and any Region; the most flexible commitment, but with a slightly lower maximum discount.

11 questions test this
Spot allocation: price-capacity-optimized is the recommended strategy

lowest-price[10] minimizes hourly cost but raises interruption risk, because the cheapest pools are often the most constrained. capacity-optimized launches from the pools with the most available capacity (lowest predicted interruption), which suits long-running workloads. price-capacity-optimized (AWS's recommended strategy for EC2 Fleet, Spot Fleet, and Auto Scaling) balances both. For HPC/ML with instance-type preferences, capacity-optimized-prioritized honors your priority list on a best-effort basis.

4 questions test this
Lambda Compute Optimizer needs ≥50 invocations in 14 days

Below 50 invocations / 14 days[9], Lambda functions get NO Compute Optimizer recommendation. Exam pattern: 'Compute Optimizer cannot generate a recommendation' for a low-traffic Lambda → root cause is this threshold, not a configuration issue.

1 question tests this
Graviton is the answer when 'reduce cost' meets x86-agnostic

AWS Graviton (ARM64) instances[11] deliver up to 40% better price-performance vs comparable x86. Most managed services support Graviton (RDS, Aurora, ElastiCache, Lambda, Fargate). When the question says 'reduce cost' and doesn't restrict architecture, Graviton is usually a correct answer.

1 question tests this
Fargate Spot: deep discount vs Fargate on-demand; same 2-min notice

Fargate Spot[12] runs ECS tasks on spare Fargate capacity at up to 70% off Fargate on-demand pricing. It gives the same 2-minute interruption warning as EC2 Spot, delivered as a task state change event and a SIGTERM to the task. Good for: CI builds, fault-tolerant containerized batch, dev/test. Capacity provider strategy: FARGATE for baseline + FARGATE_SPOT for burst.
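
A rough sketch of how a FARGATE + FARGATE_SPOT strategy splits tasks. The `capacityProvider`, `base`, and `weight` fields mirror the ECS capacity provider strategy: `base` is a minimum placed first, and `weight` sets the proportion of the rest. The largest-remainder rounding is an assumption; ECS's actual placement may round differently.

```python
from typing import Dict, List


def split_tasks(desired: int, strategy: List[dict]) -> Dict[str, int]:
    """Approximate how `desired` tasks spread across a capacity provider strategy."""
    counts = {s["capacityProvider"]: 0 for s in strategy}
    remaining = desired
    # 1. `base`: minimum tasks placed on a provider before weights apply.
    for s in strategy:
        take = min(s.get("base", 0), remaining)
        counts[s["capacityProvider"]] += take
        remaining -= take
    if remaining == 0:
        return counts
    # 2. `weight`: the rest is split in proportion to relative weights.
    total_weight = sum(s.get("weight", 0) for s in strategy)
    if total_weight <= 0:
        raise ValueError("at least one capacity provider needs weight > 0")
    shares = {s["capacityProvider"]: remaining * s.get("weight", 0) / total_weight for s in strategy}
    whole = {name: int(share) for name, share in shares.items()}
    leftover = remaining - sum(whole.values())
    # Assumed largest-remainder rounding for tasks that don't divide evenly.
    for name in sorted(shares, key=lambda n: shares[n] - whole[n], reverse=True)[:leftover]:
        whole[name] += 1
    for name, n in whole.items():
        counts[name] += n
    return counts
```

With `base: 2, weight: 1` on FARGATE and `weight: 3` on FARGATE_SPOT, ten tasks come out as 4 on FARGATE and 6 on FARGATE_SPOT. The baseline stays on regular Fargate while most of the burst rides Spot.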

5 questions test this
Trusted Advisor surfaces RI / SP recommendations

Trusted Advisor[13] cost-optimization checks include "underutilized EC2 instances" (right-sizing), "RI optimization", and "Savings Plans recommendations", once you have ~30 days of usage. These checks require Business, Enterprise On-Ramp, or Enterprise Support; Basic and Developer Support get only the core security and service-quota checks.

Capacity Rebalancing: proactively replaces at-risk Spot Instances, often before the 2-minute notice

When Capacity Rebalancing is enabled on an Auto Scaling group, the group responds to EC2 instance rebalance recommendations (signals that can arrive before the 2-minute interruption notice) by launching a replacement instance proactively. Pair it with lifecycle hooks so in-flight requests drain gracefully before the old instance terminates.

4 questions test this
Compute Optimizer: paid Enhanced Infrastructure Metrics extends lookback to 93 days for cyclical workloads

By default Compute Optimizer analyzes 14 days of CloudWatch metrics, which misses monthly or quarterly utilization spikes. The paid Enhanced Infrastructure Metrics add-on extends the lookback period to up to 93 days, enabling accurate recommendations for workloads with cyclical billing or processing patterns.

4 questions test this
Compute Optimizer: org-level preferences, no Spot recommendations, RDS MySQL/PostgreSQL supported

Recommendation preferences (approved instance families, CPU headroom, lookback period) configured at the organization level from the management account automatically apply to all member accounts in an AWS Organization, minimizing per-account overhead. Compute Optimizer does NOT generate rightsizing recommendations for Spot Instances. It does support RDS MySQL and PostgreSQL (with Performance Insights) alongside EC2, Lambda, EBS, and ECS on Fargate.

4 questions test this
Zonal RIs provide a billing discount AND a capacity reservation in that specific AZ

A Regional Reserved Instance applies a billing discount across all AZs in a Region but does NOT reserve capacity. A Zonal Reserved Instance scoped to a specific Availability Zone provides both the billing discount and a guaranteed capacity reservation matching the instance attributes, ensuring instances can launch even during peak demand in that AZ.

4 questions test this
Graviton (ARM) for up to 40% better price-performance

AWS-designed ARM64 processors deliver up to 40% better price-performance vs comparable x86 instances. They require an ARM-compatible runtime: most modern languages work, but Windows workloads and x86-only legacy binaries don't. When the question says "reduce cost" and the workload is portable: Graviton.

Cost-Optimized Storage

  • Cost = access pattern × volume, not just volume
  • Lifecycle policies > manual tiering; Intelligent-Tiering > both when unknown
  • Watch the minimum-size and minimum-storage traps
  • 128 KB is the IA minimum — millions of tiny files cost MORE in IA
  • 30 / 90 / 180-day minimum storage charges
  • Glacier retrieval tiers: Expedited / Standard / Bulk
  • Intelligent-Tiering: no retrieval fee, small monitoring fee
  • EFS lifecycle: similar pattern to S3, different tier names
  • EBS Snapshot Archive: 75% cheaper, 90-day minimum
  • S3 Storage Lens dashboards quantify storage by class / bucket / prefix
  • EFS throughput modes: Bursting baseline shrinks as data moves to IA; Elastic scales automatically; Archive requires Elastic
  • Storage Gateway modes: Tape replaces tapes → Deep Archive; Volume cached minimizes on-prem; Volume stored keeps data local; S3 File = S3 objects via SMB/NFS; FSx File = FSx for Windows
  • Athena cost = per-TB scanned — partition + columnar matters
  • S3 Lifecycle transition minimums: 0 days to Glacier classes; Standard → Standard-IA / One Zone-IA requires objects ≥30 days old
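
A back-of-the-envelope sketch of the Standard-IA traps in the list above: the 128 KB minimum billable object size and the 30-day minimum storage duration. The per-GB prices are placeholders (check current pricing for your Region). The model uses 30-day months and ignores request and IA retrieval fees.

```python
KB = 1024
GB = 1024 ** 3

# Placeholder prices in $/GB-month; verify against current S3 pricing.
PRICE = {"STANDARD": 0.023, "STANDARD_IA": 0.0125}
MIN_OBJECT_BYTES = {"STANDARD": 0, "STANDARD_IA": 128 * KB}  # IA bills tiny objects as 128 KB
MIN_DAYS = {"STANDARD": 0, "STANDARD_IA": 30}                # IA bills at least 30 days


def storage_cost(storage_class: str, object_bytes: int, count: int, days_stored: int) -> float:
    """Approximate storage charge for `count` equal-sized objects."""
    billable_bytes = max(object_bytes, MIN_OBJECT_BYTES[storage_class]) * count
    billable_days = max(days_stored, MIN_DAYS[storage_class])
    return billable_bytes / GB * PRICE[storage_class] * billable_days / 30
```

With these placeholder prices, a million 10 KB objects cost more in Standard-IA than in Standard, because each one is billed as 128 KB. The same count of 1 MB objects is cheaper in IA.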

Cost-Optimized Databases

  • Predictability picks the pricing model
  • Caching reduces required database tier
  • DynamoDB on-demand vs provisioned: utilization-driven trade-off
  • Aurora Serverless v2 break-even ≈ 50% utilization
  • ElastiCache Reserved Nodes: same payment options as EC2 RIs
  • RDS instances can be stopped for up to 7 days
  • Lazy-loading vs write-through: pick by hot-key tolerance
  • Aurora I/O-Optimized: predictable cost when I/O is heavy
  • ElastiCache Serverless: auto-scales without capacity planning, pay-per-use, Valkey is 33% cheaper
  • ElastiCache data tiering (R6gd): hot data stays in memory, cold data moves to NVMe SSD; 60%+ savings when ≤20% is hot
  • Customer-managed KMS key: ~$1/key/month + per-request
  • Aurora Serverless v2: ACU autoscaling in 0.5-ACU increments
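
A minimal sketch of the two caching strategies behind "lazy-loading vs write-through". A plain dict stands in for both the database and the ElastiCache cluster, so this is not ElastiCache client code. The class names, the 300 s TTL, and the `db_reads` counter are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class LazyLoadingCache:
    """Cache-aside (lazy loading): check the cache, load from the database on a miss."""

    def __init__(self, db: Dict[str, Any], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.db = db                      # stand-in for the real database
        self.ttl = ttl_seconds            # bounds how long stale data can be served
        self.clock = clock
        self.cache: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)
        self.db_reads = 0                 # counts misses that reached the database

    def get(self, key: str) -> Optional[Any]:
        entry = self.cache.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]               # hit: no database read
        self.db_reads += 1
        value = self.db.get(key)          # miss: read the database...
        if value is not None:
            self.cache[key] = (value, self.clock() + self.ttl)  # ...then populate the cache
        return value

    def put(self, key: str, value: Any) -> None:
        self.db[key] = value
        self.cache.pop(key, None)         # invalidate; the next read reloads it


class WriteThroughCache(LazyLoadingCache):
    """Write-through: every write updates the database and the cache together.

    Reads still fall back to lazy loading on a miss (inherited get()).
    """

    def put(self, key: str, value: Any) -> None:
        self.db[key] = value
        self.cache[key] = (value, self.clock() + self.ttl)  # fresh right after this write
```

Lazy loading caches only what is actually read, but every first read after a write or expiry pays a database round trip. Write-through keeps written keys fresh at the cost of caching data that may never be read.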

Cost-Optimized Network

  • Egress dominates — keep traffic on the AWS backbone
  • Cache at the edge and minimize cross-AZ
  • Gateway Endpoints are FREE — Interface Endpoints have a per-hour charge
  • NAT Gateway costs: per-hour + per-GB processed + egress
  • Inter-AZ data transfer: charged BOTH WAYS
  • CloudFront egress is typically cheaper than EC2 egress
  • CloudFront price classes: 100 / 200 / All
  • Direct Connect Data Transfer Out (DTO) is much cheaper than internet
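
A quick arithmetic sketch for three of the bullets above: NAT Gateway hourly + per-GB charges, free gateway endpoints, and inter-AZ transfer billed in both directions. All rates are placeholder us-east-1-style figures and the 730-hour month is an approximation; check current pricing for your Region.

```python
# Placeholder rates in USD; verify against current pricing for your Region.
NAT_HOURLY = 0.045                # per NAT Gateway per hour
NAT_PER_GB = 0.045                # per GB processed by the NAT Gateway
CROSS_AZ_PER_GB_EACH_WAY = 0.01   # charged leaving one AZ AND entering the other
HOURS_PER_MONTH = 730


def nat_gateway_monthly(gb_processed: float, gateways: int = 1) -> float:
    """Hourly charge for each gateway plus per-GB processing."""
    return gateways * NAT_HOURLY * HOURS_PER_MONTH + gb_processed * NAT_PER_GB


def gateway_endpoint_monthly(gb_processed: float) -> float:
    """S3/DynamoDB gateway endpoints carry no hourly or per-GB endpoint charge."""
    return 0.0


def cross_az_monthly(gb: float) -> float:
    """Inter-AZ traffic is billed on both sides of the transfer."""
    return gb * CROSS_AZ_PER_GB_EACH_WAY * 2
```

Routing 1,000 GB/month of S3 traffic through a NAT Gateway adds roughly $78 under these placeholder rates, while a gateway endpoint adds nothing for the same traffic.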
