Highly Available and Fault-Tolerant Architectures
Auto Scaling: target tracking vs step vs scheduled — when each applies
Auto Scaling Groups (ASGs) replace failed instances and scale capacity. There are five scaling approaches, each for a different signal pattern.
Target tracking scaling (the default for new ASGs):
- Pick a metric + target value: e.g. 'average CPU = 50%'.
- ASG auto-creates two CloudWatch alarms (scale-out + scale-in) and adjusts capacity to maintain the target.
- Built-in metrics: ASGAverageCPUUtilization, ASGAverageNetworkIn/Out, ALBRequestCountPerTarget.
- Custom metrics: any CloudWatch metric (e.g. queue depth, custom application metric).
- Use for: most scaling needs. It is the simplest policy to configure, because the ASG works out the size of each adjustment itself.
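To make "adjusts capacity to maintain the target" concrete, here is a minimal Python sketch of proportional sizing. It is a simplified model for intuition only, not the algorithm AWS actually runs; the function name `desired_capacity` and the `min_size` / `max_size` defaults are assumptions made for this illustration.

```python
import math


def desired_capacity(current: int, metric: float, target: float,
                     min_size: int = 1, max_size: int = 100) -> int:
    """Simplified target-tracking model (illustrative, not AWS's algorithm).

    If the average metric is above the target, capacity grows in proportion;
    if it is below, capacity shrinks. The result is clamped to the group's
    min/max size.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    proposed = math.ceil(current * metric / target)
    return max(min_size, min(max_size, proposed))


# 10 instances at 75% average CPU with a 50% target -> scale out.
print(desired_capacity(10, 75.0, 50.0))
```

The same call with a metric below the target (for example 25% against a 50% target) proposes a smaller group, which mirrors the scale-in alarm described above.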
Step scaling:
- Define explicit steps: 'if CPU 70-80, add 1 instance; if CPU 80-90, add 2; if CPU > 90, add 4'.
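- Instance warmup prevents over-reacting: instances that are still starting are not counted toward the metric until the warmup (e.g. 60 seconds) has passed. Cooldown periods apply to simple scaling, not step scaling.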
- Use for: workloads that need precise control over scaling magnitude relative to load severity.
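The step example above can be sketched as a small lookup function. This is only an illustration of how step bounds map load severity to scaling magnitude; the function name `step_adjustment` and the choice to treat each lower bound as inclusive are assumptions, since the prose example leaves the boundaries open.

```python
def step_adjustment(cpu_percent: float) -> int:
    """Return how many instances to add for a given average CPU reading.

    Mirrors the example steps: 70-80 -> +1, 80-90 -> +2, above 90 -> +4.
    Lower bounds are treated as inclusive (an assumption of this sketch).
    """
    if cpu_percent > 90:
        return 4
    if cpu_percent >= 80:
        return 2
    if cpu_percent >= 70:
        return 1
    return 0  # below the alarm threshold: no scale-out


for reading in (50, 75, 85, 95):
    print(reading, "->", step_adjustment(reading))
```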
Simple scaling (legacy):
- One alarm → one scaling action (e.g. add 1 instance).
- Don't use for new designs — superseded by step scaling.
Scheduled scaling:
- Set capacity changes by clock (UTC): 'at 09:00 weekdays scale to 20 instances; at 18:00 scale to 5'.
- Use for: predictable load patterns (business-hours workloads, scheduled batch processing).
- Stack with target tracking: scheduled sets baseline; target tracking handles deviations.
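The business-hours schedule above can be modelled as a function of the UTC clock. This is a sketch of the schedule's logic only, not of how scheduled actions are configured; the function name `scheduled_baseline` is hypothetical and the capacities come from the example in the text.

```python
from datetime import datetime, timezone


def scheduled_baseline(now: datetime) -> int:
    """Baseline capacity for a weekday 09:00-18:00 UTC schedule.

    Expects a timezone-aware datetime. Target tracking would then adjust
    around this baseline when load deviates from the forecasted pattern.
    """
    now = now.astimezone(timezone.utc)
    is_weekday = now.weekday() < 5  # Monday=0 ... Friday=4
    if is_weekday and 9 <= now.hour < 18:
        return 20
    return 5


monday_morning = datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc)
print(scheduled_baseline(monday_morning))
```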
Predictive scaling:
- ML-based forecast of future load using up to 14 days of metric history.
- Pre-emptively scales BEFORE load hits.
- Best for cyclical loads with 24h+ patterns.
Instance refresh:
- Replaces instances in the ASG (rolling) — useful for deploying a new AMI without downtime.
- Configurable healthy percentage, instance warmup, and skip-matching to avoid replacing instances already on the new template.
ELB vs EC2 health check type:
- HealthCheckType=EC2: only EC2 instance-level health (hardware fail).
- HealthCheckType=ELB: includes load balancer health checks (app-level failures).
- ALWAYS use ELB in production: it catches app crashes that EC2 health misses.
Termination policy (which instance to terminate when scaling in):
- Default: balance across AZs, then OldestLaunchTemplate, then ClosestToNextInstanceHour.
- Customizable. The OldestInstance variant is common for canary-style rolling deploys.
Lifecycle hooks:
- Pause instance launch (autoscaling:EC2_INSTANCE_LAUNCHING) while custom setup runs. The default heartbeat timeout is 1 hour; an instance can be kept waiting for at most 48 hours or 100 times the heartbeat timeout, whichever is smaller.
- Pause instance terminate (autoscaling:EC2_INSTANCE_TERMINATING) to drain connections / snapshot logs before termination.
- Hook completes with the CompleteLifecycleAction API.
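As a rough illustration of the last step, the sketch below builds the parameters for a CompleteLifecycleAction call from the `detail` section of a lifecycle event and hands them to a client object. The helper name `complete_hook`, the `RecordingClient` stand-in, and the sample values are all hypothetical; in real use the client would be a boto3 Auto Scaling client, whose `complete_lifecycle_action` method takes these keyword arguments.

```python
def complete_hook(autoscaling_client, detail: dict, success: bool) -> dict:
    """Tell the ASG to proceed (CONTINUE) or give up (ABANDON) on an instance."""
    params = {
        "LifecycleHookName": detail["LifecycleHookName"],
        "AutoScalingGroupName": detail["AutoScalingGroupName"],
        "LifecycleActionToken": detail["LifecycleActionToken"],
        "InstanceId": detail["EC2InstanceId"],
        "LifecycleActionResult": "CONTINUE" if success else "ABANDON",
    }
    autoscaling_client.complete_lifecycle_action(**params)
    return params


class RecordingClient:
    """Stand-in for a real client so the sketch runs without AWS access."""

    def __init__(self):
        self.calls = []

    def complete_lifecycle_action(self, **kwargs):
        self.calls.append(kwargs)


# Hypothetical event detail, shaped like a lifecycle event notification.
sample_detail = {
    "LifecycleHookName": "example-launch-hook",
    "AutoScalingGroupName": "example-asg",
    "LifecycleActionToken": "00000000-0000-0000-0000-000000000000",
    "EC2InstanceId": "i-0123456789abcdef0",
}
client = RecordingClient()
result = complete_hook(client, sample_detail, success=True)
print(result["LifecycleActionResult"])
```

Passing `success=False` sends ABANDON, which on a launch hook causes the instance to be terminated rather than put into service.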
Warm pools (cost optimization):
- Pre-initialize instances in a 'warm pool' (stopped state) — launch is faster than from scratch.
- Used for apps with long warmup (large container pulls, JIT warmup, cache priming).
Route 53 routing policies catalog: 7 policies, 7 use cases
Route 53 supports seven routing policies. The exam tests pattern recognition — match the scenario to the policy.
1. Simple routing:
- One record name → one or more values.
- If multiple values, Route 53 returns all in random order; client picks one.
- Use for: static / single-target records (cname.example.com → fixed IP).
2. Weighted routing:
- Multiple records, each with a weight (0-255).
- Route 53 distributes traffic proportionally to weights.
- Use for: blue/green or canary deployment (90% to v1, 10% to v2; ramp up).
- Setting weight=0 effectively disables a record without deleting it.
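The proportional split works out as each record's weight divided by the sum of all weights. A minimal sketch of that arithmetic follows; the function name `traffic_shares` is hypothetical, and the case where every weight is 0 is deliberately left unmodelled.

```python
def traffic_shares(weights: dict[str, int]) -> dict[str, float]:
    """Expected fraction of DNS answers per record: weight / sum(weights)."""
    for name, weight in weights.items():
        if not 0 <= weight <= 255:
            raise ValueError(f"weight for {name} must be in 0-255")
    total = sum(weights.values())
    if total == 0:
        raise ValueError("this sketch does not model the all-zero case")
    return {name: weight / total for name, weight in weights.items()}


# Canary: 90% of answers to v1, 10% to v2.
print(traffic_shares({"v1": 90, "v2": 10}))
```

Because DNS answers are cached by resolvers for the record's TTL, the observed split only approximates these shares.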
3. Failover routing:
- One primary + one secondary record.
- Route 53 health check on primary; if it fails, secondary serves.
- Use for: active/passive DR. Primary in us-east-1, secondary in us-west-2.
- Critical: the health check on the primary is required — without it, Route 53 always serves the primary.
4. Latency-based routing:
- One record per region, each pointing to that region's endpoint.
- Route 53 picks the region with lowest measured latency from the user's edge location.
- Latency table is pre-measured (not real-time per-request).
- Use for: global apps minimising user-facing latency.
5. Geolocation routing:
- Route by user's country / state.
- Records specify continent / country / US state.
- Use for: compliance ('EU users must hit EU region'), content localisation, geo-blocking.
- Always include a Default record for users from regions you didn't enumerate.
6. Geoproximity routing (Traffic Flow only — uses Route 53 Traffic Flow product):
- Similar to geolocation but supports BIAS — push more traffic to a specific region/resource.
- Use for: shifting traffic during region failover, gradual migration between regions.
7. Multi-value answer routing:
- Returns up to 8 healthy records.
- Client-side load balancing (the client picks one).
- Each record can have a health check; only healthy records returned.
- Use for: poor-man's load balancing for non-HTTP traffic where you don't want an actual ELB.
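A small model of the behaviour just described: filter out unhealthy records, then return at most eight of the rest. The record shape (a dict with `value` and `healthy` keys) and the function name `multivalue_answer` are assumptions for illustration; the sketch does not model what happens when no record is healthy.

```python
import random


def multivalue_answer(records: list[dict], max_answers: int = 8, rng=random) -> list[str]:
    """Return up to max_answers values chosen at random from healthy records."""
    healthy = [r["value"] for r in records if r["healthy"]]
    return rng.sample(healthy, min(max_answers, len(healthy)))


# Ten healthy records (indexes 0-9) and two unhealthy ones (10-11).
records = [{"value": f"192.0.2.{i}", "healthy": i < 10} for i in range(12)]
answer = multivalue_answer(records)
print(len(answer))
```

The client then picks one of the returned values, which is where the "client-side load balancing" comes from.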
Combining policies (Traffic Flow):
- Route 53 Traffic Flow lets you nest policies (e.g. geolocation routing → latency routing within each geo).
- Visual editor; version-controlled traffic policies.
Health checks:
- Default check: every 30 seconds; consider healthy after 3 successes / unhealthy after 3 failures.
- Fast check: every 10 seconds (paid).
- Types: endpoint (HTTP/HTTPS/TCP), CloudWatch alarm (alarm state), calculated (AND/OR of other checks).
- Calculated health checks: 'healthy if at least 2 of 3 endpoints are healthy' — useful for multi-endpoint apps.
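Two pieces of arithmetic sit behind the bullets above: a calculated check is a threshold over its child checks, and the time to mark an endpoint unhealthy is roughly the interval multiplied by the failure threshold. The sketch below shows both; the function names are hypothetical, and the detection estimate ignores DNS TTL and propagation.

```python
def calculated_health(child_statuses: list[bool], healthy_threshold: int) -> bool:
    """Healthy when at least healthy_threshold child checks are healthy.

    threshold == len(children) behaves like AND; threshold == 1 like OR.
    """
    return sum(child_statuses) >= healthy_threshold


def detection_seconds(interval_s: int, failure_threshold: int = 3) -> int:
    """Rough time for consecutive failures to mark an endpoint unhealthy."""
    return interval_s * failure_threshold


print(calculated_health([True, True, False], healthy_threshold=2))
print(detection_seconds(30), detection_seconds(10))
```

With the defaults, a failed endpoint takes on the order of 90 seconds to be detected; the 10-second interval brings that down to about 30 seconds.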
Common SAA scenarios:
- 'DR failover Region' → Failover routing + health checks.
- 'Blue/green deployment' → Weighted routing, ramp from 0 → 100%.
- 'Global app, lowest latency' → Latency-based routing.
- 'EU compliance' → Geolocation routing.
- 'No ELB but load balance' → Multi-value answer routing.
Multi-AZ vs Read Replicas vs Aurora Replicas — the three RDS scaling primitives
RDS / Aurora offer three distinct mechanisms for HA and read scaling. They solve different problems; the exam tests which to apply.
Multi-AZ deployment (HA):
- Synchronous standby in a different AZ (same region).
- Standby is NOT readable — it exists only for failover.
- Failover triggers: planned (instance upgrade, OS patching) or unplanned (hardware fail, AZ outage).
- Failover time: typically 60-120 seconds. DNS endpoint updates to point at the standby.
- Clients with cached DNS see ~60-120 s of errors; app should reconnect on connection failure.
- Cost: ~2× single-AZ (you pay for the standby too).
Use for: any production workload with availability SLA. The default 'is this production?' answer.
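The "app should reconnect on connection failure" advice usually means retrying with backoff until the endpoint resolves to the promoted standby. A minimal sketch follows, assuming the caller supplies a `connect` callable that raises `ConnectionError` on failure; the function name, the defaults, and the stand-in `_flaky_connect` are all illustrative. With these defaults the retries span roughly 150 seconds, which is meant to outlast the 60-120 s failover window.

```python
import time


def connect_with_retry(connect, attempts: int = 10, base_delay: float = 1.0,
                       max_delay: float = 30.0, sleep=time.sleep):
    """Call connect() until it succeeds, backing off exponentially."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            if attempt < attempts - 1:
                # 1 s, 2 s, 4 s, ... capped at max_delay
                sleep(min(max_delay, base_delay * 2 ** attempt))
    raise last_error


# Demo with a stand-in connect function that fails twice, then succeeds.
_calls = {"n": 0}


def _flaky_connect():
    _calls["n"] += 1
    if _calls["n"] < 3:
        raise ConnectionError("endpoint still resolves to the failed primary")
    return "connected"


delays = []
conn = connect_with_retry(_flaky_connect, sleep=delays.append)
print(conn, delays)
```

Injecting `sleep` keeps the demo instant; production code would leave the default `time.sleep` in place and open a fresh connection on each attempt so that DNS is resolved again.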
Multi-AZ DB cluster (newer, RDS MySQL/PostgreSQL):
- One writer + two readable standbys across 3 AZs.
- Standbys ARE readable (unlike traditional Multi-AZ).
- Faster failover (~35 seconds typical).
- Better than traditional Multi-AZ for reads + faster failover.
Read Replicas (read scaling):
- Asynchronous replication from primary to N replicas (up to 15 per source for most engines).
- Replicas serve read-only traffic — offload from primary.
- Can be in SAME AZ, DIFFERENT AZ, or DIFFERENT REGION (cross-region replication).
- Cross-region replicas can be promoted to standalone read-write (used in DR).
- Replication lag varies; can be seconds to minutes under high write load.
Use for: read-heavy workloads, geographic read distribution (cross-region replica = lower latency for distant users), DR (cross-region replica = recovery target).
Cascading replicas (since 2023): a read replica can have its OWN read replicas. Useful for multi-region read distribution without burdening the primary.
Aurora Replicas (Aurora-specific):
- Up to 15 Aurora Replicas per cluster, all sharing the same storage layer.
- Sub-100ms replica lag typical (often < 10 ms) — shared storage means no log shipping.
- Auto-promotion: if writer fails, Aurora promotes a replica → ~1 minute failover.
- Reader endpoint load-balances across all Aurora Replicas.
- Cross-region: use Aurora Global Database (separate feature) for < 1 s cross-region replication.
Use for: any Aurora workload — Replicas serve reads AND act as warm failover targets. Beats RDS Multi-AZ + Read Replicas combined for most use cases.
Decision pattern:
- 'Production HA, MySQL / Oracle / SQL Server' → Multi-AZ deployment.
- 'Read-heavy workload, RDS' → Multi-AZ + Read Replicas (HA + read scale).
- 'Aurora workload, any use case' → Aurora Replicas (cover both HA and reads).
- 'Cross-region DR with < 1 s RPO' → Aurora Global Database (NOT Read Replicas — they're async).
- 'Read scaling for distant users (latency)' → Cross-region Read Replicas or Aurora Global Database.
Common SAA traps:
- 'Multi-AZ for read scaling' → NO; standby isn't readable (unless Multi-AZ DB cluster mode).
- 'Promote read replica during a regional failure' → works only with a cross-region replica; a same-region replica is lost along with the Region.
- 'Aurora Global Database is the same as Aurora Replicas' → NO; Global Database adds cross-region replication.
Aurora Global Database: setup, failover, cross-region read scale
Aurora Global Database (AGDB) replicates an Aurora cluster across regions via a dedicated network channel. Different from cross-region read replicas in three important ways.
vs cross-region Read Replicas:
- AGDB: < 1 second cross-region RPO typical; managed failover; secondary region clusters are full Aurora clusters (auto-scale storage, replicas, etc.).
- Read Replicas: log-shipping based; lag can be seconds-minutes; manual failover by promote.
Architecture:
- 1 primary region (1 writer + up to 15 readers).
- Up to 5 secondary regions, each with up to 15 readers (75 total readers possible).
- Replication is one-way: primary → secondaries. Secondaries can't accept writes (read-only).
Use cases:
- Cross-region DR (RPO < 1 s, RTO ~1 min via managed promote).
- Low-latency reads for global users — local secondary region serves reads to nearby users.
- Read scaling across regions — analytics or reporting workloads on a secondary region don't impact primary write performance.
Setup steps:
- Create an Aurora cluster in the primary region (MySQL 5.6+ / 5.7+ / 8.0 or PostgreSQL 10+).
- From the cluster's Modify menu, choose 'Add region' → pick secondary region(s).
- Aurora provisions the secondary cluster with the same engine version + parameter group + KMS key (multi-region KMS key required for cross-region encrypted replication).
- Replication begins automatically.
Managed planned failover (zero data loss):
- Use for: failover testing, primary-region maintenance, controlled migration.
- Aurora pauses writes briefly, ensures secondary has replicated all changes, promotes secondary to primary, repoints replication.
- ~ 1 minute total downtime.
Unplanned failover (cross-region):
- Manually invoke RemoveFromGlobalCluster + PromoteReadReplicaDBCluster on the secondary.
- Or use a managed playbook with Route 53 health-check failover routing.
- Data loss possible (whatever wasn't replicated when primary failed).
Write forwarding (newer feature):
- Applications in the secondary region can issue writes — Aurora forwards them transparently to the primary region.
- Useful for read-mostly apps where occasional writes don't justify a full primary topology in the secondary region.
- Adds cross-region latency on writes — not for high-write workloads.
Cost:
- Per-secondary-region pricing: each secondary region is a full Aurora cluster (compute + storage + I/O charges).
- Replication bandwidth between regions ≈ standard cross-region data transfer rates.
- Generally 2-3× a single-region cluster.
Limitations:
- Same engine version + family across regions (no major-version mixing).
- 5 secondary regions max (15 reader instances each).
- Parallel Query, Aurora Serverless v2 features have specific compatibility — check current docs.
Common exam scenarios:
- 'Cross-region DR with < 1 min failover' → Aurora Global Database.
- 'Global read scaling, no manual promote needed' → Aurora Global Database (reads served by secondary regions).
- 'Need writes in multiple regions simultaneously' → NOT AGDB; consider DynamoDB Global Tables (multi-active writes) or accept eventual consistency.
S3 Cross-Region Replication: filters, versioning, RTC, batch backfill
S3 Cross-Region Replication (CRR) asynchronously copies new objects from a source bucket to a destination bucket in a different region. Same-Region Replication (SRR) does the same within a region.
Prerequisites:
- Versioning must be enabled on BOTH source and destination buckets.
- IAM role with permissions to read from source + write to destination + (if encrypted) use KMS keys in both regions.
- Source and destination can be in same or different accounts.
Replication rules (define what to copy):
- Filter by prefix: prefix: "customer-data/" → only objects with that prefix replicate.
- Filter by tag: tag: { team=production } → only tagged objects replicate.
- Filter combined (prefix AND tag).
- Multiple rules per bucket; priority order resolves conflicts.
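The filter semantics above (prefix, tag, or both combined with AND) can be sketched as a predicate over an object's key and tags. The rule shape used here is a simplified, hypothetical dict rather than the S3 replication configuration format, and `rule_matches` is an illustrative name.

```python
def rule_matches(rule: dict, key: str, tags: dict[str, str]) -> bool:
    """True when the object satisfies the rule's prefix AND all of its tags.

    A missing prefix matches every key; a missing tag filter matches every object.
    """
    prefix = rule.get("prefix", "")
    required_tags = rule.get("tags", {})
    has_prefix = key.startswith(prefix)
    has_tags = all(tags.get(k) == v for k, v in required_tags.items())
    return has_prefix and has_tags


rule = {"prefix": "customer-data/", "tags": {"team": "production"}}
print(rule_matches(rule, "customer-data/2024/a.csv", {"team": "production"}))
print(rule_matches(rule, "logs/2024/a.log", {"team": "production"}))
```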
What replicates:
- New objects after rule creation. Existing objects do NOT replicate by default.
- Object metadata, tags, ACLs.
- Delete markers (configurable; default off).
- Replica modification sync (replicas back to source) — opt-in feature for two-way scenarios.
What does NOT replicate:
- Objects encrypted with SSE-C (customer-supplied keys).
- Objects created before replication was enabled (use S3 Batch Replication for backfill).
- Permanent deletes (only delete markers can replicate, and only if configured).
Replication Time Control (RTC) — SLA-backed replication:
- Without RTC: replication is best-effort; typically seconds-minutes but no guarantee.
- With RTC: S3 is designed to replicate 99.99% of objects within 15 minutes, and the commitment is backed by an SLA.
- RTC adds CloudWatch metrics (replication latency, missed-SLA count).
- Cost: $0.015 per GB replicated (in addition to standard CRR data transfer + per-request charges).
- Use for: compliance scenarios requiring guaranteed replication time, business-critical pipelines.
S3 Batch Replication (backfill existing objects):
- Replicates objects that existed BEFORE replication rule creation.
- Replicates objects that previously failed replication.
- Uses S3 Batch Operations under the hood.
- One-time operation per backfill request; pay per object processed.
Cross-account CRR:
- Destination bucket policy must allow the source account's replication role.
- Object ownership: by default, the source-account replication role writes the replica → source account owns replica. Use bucket-owner-full-control or set AccessControlTranslation so destination account owns.
KMS-encrypted CRR:
- Source-side KMS key: replication role needs kms:Decrypt.
- Destination-side KMS key: replication role needs kms:GenerateDataKey / kms:Encrypt.
- Multi-region KMS keys (since 2021): use the same key ID across regions for seamless cross-region encrypted replication without re-encryption.
Common patterns:
- Compliance — data must exist in 2 regions: CRR with RTC.
- DR for static content (web assets, software downloads): CRR to alternate region; failover via Route 53.
- Compliance — data must NOT leave a region: do NOT enable CRR; use Same-Region Replication (SRR) to a separate AZ-resilient bucket if needed.
- Analytics on a copy: CRR to a dedicated analytics bucket in a region close to analytics tooling.
Cost:
- Cross-region data transfer (~$0.02/GB depending on region pair).
- PUT requests on destination (standard pricing).
- Storage on destination (standard pricing for the chosen storage class).
- Optional RTC ($0.015/GB).
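The per-GB items in the list above combine into a quick estimate. The sketch below uses the approximate rates quoted in this section as defaults and leaves out PUT-request and destination-storage charges, so it understates the full bill; `crr_monthly_cost` is a hypothetical helper name.

```python
def crr_monthly_cost(gb_replicated: float, transfer_rate: float = 0.02,
                     rtc: bool = False, rtc_rate: float = 0.015) -> float:
    """Estimate per-GB replication charges in USD (transfer plus optional RTC).

    Rates are the approximate figures quoted in the text; actual prices
    depend on the region pair.
    """
    cost = gb_replicated * transfer_rate
    if rtc:
        cost += gb_replicated * rtc_rate
    return round(cost, 2)


print(crr_monthly_cost(1000))            # transfer only
print(crr_monthly_cost(1000, rtc=True))  # transfer + RTC
```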
DR strategy worked examples: cost + RTO + RPO trade-off across all four patterns
The four canonical DR strategies form a cost / recovery-time spectrum. Each maps to a specific RTO + RPO range. The exam tests pattern matching.
Worked Example 1 — Backup-and-restore (RTO: hours-days, RPO: hours):
Scenario: small business, daily backups, can tolerate ~24h of data loss + 4-8h to recover.
Architecture:
- AWS Backup runs daily snapshots of RDS, EBS, DynamoDB, EFS.
- Snapshots copied to alternate region nightly.
- Application infrastructure as code (CloudFormation / CDK) ready to deploy.
- On disaster: deploy infrastructure in DR region, restore latest snapshots, repoint DNS.
Monthly cost: ~$50/month (snapshot storage + cross-region copy). RTO: 4-8 hours (deploy infra + restore data). RPO: 24 hours (last backup).
Worked Example 2 — Pilot light (RTO: 10s of minutes, RPO: minutes):
Scenario: e-commerce platform, ~30 min RTO budget, ~1 min RPO budget (last few orders may be lost).
Architecture:
- Primary region: full stack (ALB + EC2 ASG + RDS Multi-AZ + ElastiCache + S3).
- DR region: RDS Read Replica (continuously replicated), S3 CRR enabled, EC2 launch templates ready but no instances running, ALB pre-provisioned with empty target group.
- On disaster: promote RDS replica to standalone, launch EC2 ASG (size from launch template), register with ALB, update Route 53 failover record.
Monthly cost: ~$500/month (read replica + cross-region S3 + idle infrastructure). RTO: 15-30 minutes (mostly EC2 launch + warmup). RPO: 1 minute (read replica lag).
Worked Example 3 — Warm standby (RTO: minutes, RPO: seconds):
Scenario: SaaS application, < 5 min RTO budget, < 1 min RPO budget.
Architecture:
- Primary region: full stack at full capacity.
- DR region: full stack at REDUCED capacity (e.g. minimum ASG = 2 instances vs 20 in primary). All components running.
- RDS / Aurora Multi-AZ in DR region with cross-region replica.
- DNS uses Route 53 weighted routing: 100% primary, 0% DR (or active health checks).
- On disaster: scale DR ASG up (Auto Scaling triggers), shift traffic via Route 53.
Monthly cost: $2 000/month (50% of primary, scaled down). RTO: 2-5 minutes (Auto Scaling scale-up time). RPO: < 30 seconds.
Worked Example 4 — Multi-region active-active (RTO: seconds, RPO: near-zero):
Scenario: global FinTech app, must handle region failure with no perceived downtime.
Architecture:
- Primary AND secondary regions both serve traffic at full capacity (e.g. 50% to each via Route 53 latency-based routing).
- Data layer: Aurora Global Database (< 1s RPO, managed failover) OR DynamoDB Global Tables (multi-active writes with last-writer-wins).
- Caches: ElastiCache in each region (regionally local; global sync via DynamoDB Streams or Aurora replication).
- Stateless tiers replicated in both regions.
- On disaster: Route 53 health checks detect failure → traffic shifts to healthy region automatically.
Monthly cost: $5 000+/month (2× single-region; full stack everywhere). RTO: seconds (Route 53 health check + DNS TTL). RPO: near-zero (continuous replication).
Picking the right strategy:
| Scenario | Strategy |
|---|---|
| Cost > RTO; some downtime acceptable | Backup-and-restore |
| Moderate cost; 15-30 min RTO acceptable | Pilot light |
| Low RTO required; SLA matters | Warm standby |
| Mission-critical; near-zero RTO | Multi-region active-active |
Common exam phrasing → answer:
- 'lowest cost DR' → Backup-and-restore.
- 'pre-provisioned DB, on-demand compute' → Pilot light.
- 'scaled-down but running stack' → Warm standby.
- 'no perceived downtime, full stack everywhere' → Multi-region active-active.
DR strategies compared
| Strategy | RTO | RPO | Cost | Pattern |
|---|---|---|---|---|
| Backup-and-restore | Hours to days | Hours (last backup) | $ | Backups in S3 / AWS Backup. Restore = build infra + restore data on failover. |
| Pilot light | 10s of minutes | Minutes (DB replicated continuously) | $$ | DB replicated to DR region; compute templates ready but stopped; start on failover. |
| Warm standby | Minutes | Seconds to minutes | $$$ | Downsized but running stack in DR region; scale up on failover. |
| Multi-region active-active | Seconds | Near-zero | $$$$ | Full stack running in 2+ regions; Route 53 latency / multi-value routing; data via Aurora Global / DynamoDB Global Tables. |
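The selection rule ("cheapest strategy that still meets the stated RTO and RPO") can be sketched as a scan over the four strategies in increasing cost order. The numeric limits below are assumptions taken from the worked examples (and, for active-active, illustrative values for "seconds" and "near-zero"); they are not fixed AWS figures.

```python
from typing import Optional

# (name, achievable RTO in seconds, achievable RPO in seconds), cheapest first.
# Values are illustrative assumptions drawn from the worked examples.
STRATEGIES = [
    ("Backup-and-restore", 8 * 3600, 24 * 3600),
    ("Pilot light", 30 * 60, 60),
    ("Warm standby", 5 * 60, 30),
    ("Multi-region active-active", 30, 1),
]


def cheapest_strategy(rto_budget_s: int, rpo_budget_s: int) -> Optional[str]:
    """Return the first (cheapest) strategy that fits both budgets, else None."""
    for name, rto_s, rpo_s in STRATEGIES:
        if rto_s <= rto_budget_s and rpo_s <= rpo_budget_s:
            return name
    return None


print(cheapest_strategy(30 * 60, 5 * 60))  # 30 min RTO, 5 min RPO
print(cheapest_strategy(5 * 60, 60))       # 5 min RTO, 1 min RPO
```

A `None` result means the budgets are tighter than any of the four patterns as modelled here, which is a cue to revisit the requirements rather than a property of AWS.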
Cheat sheet
Sharp facts the exam loves — scan these before test day.
- Multi-AZ for in-region HA; Multi-Region for DR
Spread stateless tiers across ≥2 AZs behind a load balancer. RDS Multi-AZ: synchronous standby, 60-120 s failover. Multi-Region only when single-region failure is in your threat model — adds latency, cost, replication complexity.
- Pick DR strategy by RTO and RPO
Four canonical strategies in increasing cost / decreasing RTO/RPO: backup-and-restore (hours), pilot-light (minutes-hours), warm-standby (minutes), multi-site active-active (near-zero). The exam picks the cheapest one that meets the stated RTO/RPO.
- ASG + ELB + health checks is the default stateless tier
Auto Scaling Group across ≥2 AZs + ALB/NLB with health checks at the target group. Unhealthy instances drain + replace automatically. ASG min/max/desired control capacity; target-tracking or step scaling policies handle demand changes.
- Route 53 has 7 routing policies, each for a specific intent
Simple (one answer), Weighted (split traffic), Latency (route to the lowest-latency region), Failover (active/passive via health check), Geolocation (by user country), Geoproximity (by lat/lon + bias), Multi-value (DNS-level load balancing).
- RDS Multi-AZ failover: 60-120 seconds typical
Automatic failover[1] updates the DNS endpoint to the standby; clients with cached DNS will see ~60-120 s of errors. App needs to reconnect on connection failure. Standby is NOT readable — for read scaling, use Read Replicas[13] (separate feature).
- Aurora Global Database: typically sub-second cross-region RPO
Replicates Aurora across regions[2] via dedicated network. RPO typically <1 s; failover RTO ~1 min (managed promotion). Secondary regions support read-only and can be promoted to writer. Big advantage over manual cross-region replicas.
- Route 53 health checks: 30 s default interval, 3 failures = unhealthy
Default check interval 30 s[14], healthy threshold 3, unhealthy threshold 3. Fast failover requires faster checks (10 s interval supported, paid). Use 'calculated' health checks to combine multiple endpoint checks for AND/OR logic.
- ASG health check type: EC2 vs ELB
ASG HealthCheckType=EC2[15] only replaces instances that EC2 status checks report as unhealthy (hardware fail). HealthCheckType=ELB also replaces instances that fail the load balancer's health check, which catches app-layer failures too. Use ELB for production.
- S3 Cross-Region Replication: async, prefix/tag-filtered
CRR replicates new objects[4] from source to destination bucket asynchronously (typically seconds). Existing objects need a one-time batch operation. Can filter by prefix or tag. Versioning must be on for both buckets.
- AWS Backup centralizes backups across services + accounts
Backup plans + selections[8] cover RDS, DynamoDB, EFS, EBS, FSx, Storage Gateway, etc. Cross-region, cross-account copy supported. Audit Manager + Backup Vault Lock for compliance scenarios.
- Route 53 failover record requires a health check on PRIMARY
Active/passive failover routing[16] needs the primary record to have an associated health check. If the check fails, the secondary record is served. Without the health check, Route 53 always serves the primary.
- ASG health check grace period protects initializing instances
The health check grace period tells Auto Scaling how long to wait before evaluating the health of a newly launched instance after it enters InService state. Set it to at least as long as the application startup time; otherwise ELB health check failures during initialization cause continuous termination and replacement loops.
- ALB deregistration delay (connection draining) for long requests
When a target is removed from an ALB target group, the load balancer stops sending new requests but waits for the deregistration delay (default 300 s, range 0-3600 s) before completing deregistration. Set this value to at least the maximum expected request processing time to prevent HTTP 5xx errors during scale-in events.
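As a rough illustration, the delay can be derived from the longest expected request and clamped to the allowed range. The function name is hypothetical; the attribute key `deregistration_delay.timeout_seconds` is the target group attribute used by the ELBv2 `ModifyTargetGroupAttributes` API, which expects string values.

```python
def deregistration_delay_attribute(max_request_seconds: int) -> list[dict]:
    """Build the target group attribute list, clamped to the 0-3600 s range."""
    delay = max(0, min(3600, max_request_seconds))
    return [
        {"Key": "deregistration_delay.timeout_seconds", "Value": str(delay)},
    ]


# Example: the slowest request is expected to take about two minutes.
print(deregistration_delay_attribute(120))
```

The resulting list could be supplied as the `Attributes` argument of boto3's `modify_target_group_attributes`, alongside the target group ARN.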
8 questions test this
- An e-commerce company operates a web application on Amazon EC2 instances in an Auto Scaling group behind an Application Load Balancer. The…
- A company runs a web application on Amazon EC2 instances in an Auto Scaling group behind an Application Load Balancer. During scale-in…
- A company operates a web application with variable traffic patterns. The application runs on EC2 instances in an Auto Scaling group across…
- A company runs a web application with an Auto Scaling group behind an Application Load Balancer. The application processes long-running…
- A company has an e-commerce application running on Amazon EC2 instances in an Auto Scaling group behind an Application Load Balancer.…
- A company runs an e-commerce application on Amazon EC2 instances in an Auto Scaling group attached to an Application Load Balancer. During…
- A company runs an e-commerce application on EC2 instances in an Auto Scaling group behind an Application Load Balancer. During scale-in…
- A solutions architect is designing an application that runs on Amazon EC2 instances in an Auto Scaling group behind an Application Load…
- ALB slow start mode ramps traffic to new targets
Slow start mode causes the ALB to linearly increase the share of requests sent to a newly registered target over a configurable duration of 30–900 seconds. Use it when instances need a warm-up period (e.g., JIT compilation, cache warming, dataset loading) before they can handle their full share of traffic.
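The linear ramp can be pictured with a simple model. This is a sketch only: the function name is made up, and the real load balancer's ramp is described only as linear, so the exact share at any instant is an assumption.

```python
def slow_start_share(elapsed_seconds: float, duration_seconds: int) -> float:
    """Fraction of its full traffic share a new target receives (linear model)."""
    if not 30 <= duration_seconds <= 900:
        raise ValueError("slow start duration must be between 30 and 900 seconds")
    if elapsed_seconds <= 0:
        return 0.0
    # Ramp linearly, then stay at the full share once the duration has passed.
    return min(1.0, elapsed_seconds / duration_seconds)


for t in (0, 15, 30, 60, 90):
    print(t, slow_start_share(t, duration_seconds=60))
```

On a real target group the duration is set through the `slow_start.duration_seconds` attribute.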
7 questions test this
- A company deploys a new version of its web application that requires a warm-up period of 60 seconds to populate local caches before it can…
- A company is deploying an application with an Auto Scaling group behind an Application Load Balancer. The application requires a warm-up…
- A solutions architect is optimizing an Auto Scaling configuration for an e-commerce application. The application runs on EC2 instances…
- A company has a web application that requires time to warm up its caches before handling production traffic at full capacity. The…
- A company deploys a web application that requires caching of application data before it can respond to requests with optimal performance.…
- A company runs an e-commerce application on Amazon EC2 instances behind an Application Load Balancer. The instances are in an Auto Scaling…
- A company operates a Java-based application on Amazon EC2 instances behind an Application Load Balancer. The application performs JIT…
- ALB cross-zone load balancing always on at LB level, configurable per target group
For Application Load Balancers, cross-zone load balancing is always enabled at the load balancer level and cannot be turned off. However, it can be explicitly disabled at the target group level, overriding the load balancer default. When enabled, each LB node distributes traffic evenly across all registered targets in all enabled Availability Zones.
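The effect on uneven fleets can be worked through with a small model. It assumes DNS spreads clients evenly across the load balancer nodes and that every listed AZ has at least one target; the function name and AZ labels are illustrative.

```python
from fractions import Fraction


def per_target_share(targets_per_az: dict[str, int], cross_zone: bool) -> dict[str, Fraction]:
    """Share of total traffic received by ONE target in each AZ."""
    if cross_zone:
        # Every node spreads requests across all targets in all AZs.
        total = sum(targets_per_az.values())
        return {az: Fraction(1, total) for az in targets_per_az}
    # Each node receives an equal slice and only uses targets in its own AZ.
    node_share = Fraction(1, len(targets_per_az))
    return {az: node_share / count for az, count in targets_per_az.items()}


fleet = {"az-a": 2, "az-b": 8}
print(per_target_share(fleet, cross_zone=True))
print(per_target_share(fleet, cross_zone=False))
```

With cross-zone off, each of the two targets in `az-a` carries a quarter of all traffic while each target in `az-b` carries one sixteenth, which is why uneven AZ sizes usually call for leaving it on.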
4 questions test this
- A company runs a web application on Amazon EC2 instances in an Auto Scaling group across three Availability Zones. The instances are…
- A company runs a web application on Amazon EC2 instances behind an Application Load Balancer (ALB). The instances are distributed across…
- A company hosts a critical application on Amazon EC2 instances in an Auto Scaling group that is attached to an Application Load Balancer…
- A company runs a mission-critical web application behind an Application Load Balancer (ALB) deployed across three Availability Zones in a…
- NLB provides static IP per AZ; assign Elastic IPs for fixed addresses
Network Load Balancers automatically provide one static IP address per enabled Availability Zone. For internet-facing NLBs you can also assign your own Elastic IP per AZ, giving external clients fixed addresses to allowlist in firewalls. NLB operates at Layer 4, supports ultra-low latency, and preserves the client source IP address by default for targets registered by instance ID (for IP-type targets over TCP or TLS, client IP preservation is off by default and can be enabled as a target group attribute).
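A sketch of the request shape for an internet-facing NLB with one Elastic IP per AZ. The helper name and all IDs are placeholders; the keys follow the ELBv2 `CreateLoadBalancer` parameters (`SubnetMappings` with `SubnetId` and `AllocationId`), and no API call is made.

```python
def nlb_request(name: str, eip_by_subnet: dict[str, str]) -> dict:
    """Build CreateLoadBalancer parameters pairing each subnet with an Elastic IP."""
    return {
        "Name": name,
        "Type": "network",
        "Scheme": "internet-facing",
        "SubnetMappings": [
            {"SubnetId": subnet_id, "AllocationId": allocation_id}
            for subnet_id, allocation_id in eip_by_subnet.items()
        ],
    }


# Placeholder subnet and Elastic IP allocation IDs, one pair per AZ.
request = nlb_request(
    "trading-nlb",
    {"subnet-aaaa1111": "eipalloc-aaaa1111", "subnet-bbbb2222": "eipalloc-bbbb2222"},
)
print(request)
```

The dictionary could be passed as keyword arguments to boto3's `create_load_balancer` on an `elbv2` client.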
7 questions test this
- A financial services company requires a load balancing solution for a TCP-based application that must have static IP addresses for firewall…
- A company is deploying a new application that requires static IP addresses for client whitelisting purposes. The application must be highly…
- A financial services company requires a load balancing solution for their trading application that needs static IP addresses for client…
- A financial services company requires a highly available architecture for a trading application that needs extremely low latency and must…
- A financial services company is deploying a TCP-based trading application that requires static IP addresses for firewall allowlisting by…
- A financial services company requires static IP addresses for its trading application to allow clients to whitelist specific IP addresses…
- A company is migrating a legacy TCP-based application to AWS. The application requires that the client source IP address be preserved for…
- Route 53 latency routing + Evaluate Target Health = active-active multi-region failover
Latency-based routing records with Evaluate Target Health set to Yes implement active-active failover: all healthy regions serve traffic based on lowest latency, and Route 53 automatically stops routing to a region when its resources become unhealthy. For hierarchical configurations (latency over weighted), Evaluate Target Health on the top-level alias causes Route 53 to traverse the tree and consider the region unhealthy only when all underlying weighted records fail.
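One way to express such records is as a Route 53 change batch. The sketch below only builds the dictionaries; the helper name, domain, ALB DNS names, and hosted zone IDs are placeholders, while the keys (`SetIdentifier`, `Region`, `AliasTarget`, `EvaluateTargetHealth`) follow the `ChangeResourceRecordSets` API.

```python
def latency_alias_change(record_name: str, region: str,
                         alb_dns_name: str, alb_hosted_zone_id: str) -> dict:
    """Build one UPSERT for a latency alias record with Evaluate Target Health on."""
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": record_name,
            "Type": "A",
            "SetIdentifier": region,   # must be unique per record in the set
            "Region": region,          # makes this a latency-based record
            "AliasTarget": {
                "HostedZoneId": alb_hosted_zone_id,
                "DNSName": alb_dns_name,
                "EvaluateTargetHealth": True,
            },
        },
    }


# Placeholder ALB endpoints for two Regions.
change_batch = {
    "Changes": [
        latency_alias_change("app.example.com", "us-west-2",
                             "alb-west.example.elb.amazonaws.com", "ZEXAMPLEWEST"),
        latency_alias_change("app.example.com", "eu-west-1",
                             "alb-eu.example.elb.amazonaws.com", "ZEXAMPLEEU"),
    ]
}
print(len(change_batch["Changes"]))
```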
12 questions test this
- A company has deployed identical applications on Amazon EC2 instances behind Application Load Balancers in both the us-west-2 and eu-west-1…
- A company operates a global e-commerce platform with Application Load Balancers deployed in three AWS Regions: us-east-1, eu-west-1, and…
- A company has an e-commerce application running on Amazon EC2 instances behind Application Load Balancers in two AWS Regions. The company…
- A company deploys its application across three AWS Regions to provide global availability. The company wants Route 53 to route users to the…
- A company operates an e-commerce application across two AWS Regions with Application Load Balancers in both us-west-2 and us-east-1. The…
- A solutions architect is designing a multi-Region active-active architecture for a global application. The application is deployed in…
- A company operates a global e-commerce platform deployed across three AWS Regions: us-east-1, eu-west-1, and ap-southeast-1. The company…
- A company hosts an e-commerce application across two AWS Regions in an active-active configuration. The application uses Application Load…
- A solutions architect is designing a multi-region architecture for a critical application. The primary region has weighted routing records…
- A global e-commerce company hosts its application across three AWS Regions: us-east-1, eu-west-1, and ap-southeast-1. The company wants to…
- A company has deployed Application Load Balancers in three AWS Regions to serve global customers. The company wants Route 53 to distribute…
- A company has deployed its application in two AWS Regions with an Application Load Balancer in each region. The company wants to implement…
- Route 53 calculated health checks aggregate child health check results
A calculated health check monitors other health checks (child health checks) and reports healthy when the number of healthy children meets a configurable threshold. This lets you trigger DNS failover only when too few endpoints remain healthy (e.g., the check stays healthy while at least 2 of 6 servers are up), rather than reacting to individual endpoint failures.
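The threshold rule is easy to state in code. This sketch evaluates the rule locally and also shows the matching configuration shape; the function names and child IDs are placeholders, while `Type`, `HealthThreshold`, and `ChildHealthChecks` are fields of the Route 53 `HealthCheckConfig` structure.

```python
def calculated_health(child_statuses: list[bool], health_threshold: int) -> bool:
    """Healthy when at least `health_threshold` child checks are healthy."""
    return sum(child_statuses) >= health_threshold


def calculated_check_config(child_ids: list[str], health_threshold: int) -> dict:
    """Configuration shape for a calculated health check (no API call made)."""
    return {
        "Type": "CALCULATED",
        "HealthThreshold": health_threshold,
        "ChildHealthChecks": child_ids,
    }


# Six servers, two still up: the parent check remains healthy at threshold 2.
print(calculated_health([True, True, False, False, False, False], 2))
# One server up: the parent check fails and DNS failover can trigger.
print(calculated_health([True, False, False, False, False, False], 2))
```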
8 questions test this
- A company has deployed a web application across six EC2 instances in two Availability Zones, with three instances in each zone. A solutions…
- A healthcare company operates a patient portal that consists of a web application, an API layer, and a database. The company uses Amazon…
- A company runs a microservices application with five independent web servers. The application requires at least three healthy servers to…
- A solutions architect is designing a multi-tier application that spans three AWS Regions. The application has web servers in each region,…
- A company runs an application with five web servers behind an Application Load Balancer. The company wants to be notified and trigger DNS…
- A company operates a microservices application with five backend services distributed across multiple EC2 instances. The company wants to…
- A company uses Amazon Route 53 to manage DNS for a multi-tier application deployed across three Availability Zones. Each tier has multiple…
- A company has a multi-tier web application with web servers distributed across three Availability Zones. The company uses Route 53 to route…
- Route 53 weighted records + health checks implement active-active failover
Any routing policy other than Failover combined with health checks creates an active-active configuration. With weighted records, Route 53 distributes traffic according to weights while all records are healthy; when a record's health check fails, Route 53 excludes it from responses and redistributes remaining traffic to healthy records. A zero-weight record acts as a standby, receiving traffic only when all nonzero-weight records are unhealthy.
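The weighting rules above can be modeled in a few lines. This is a simplified sketch with hypothetical names: it covers healthy nonzero-weight records and the zero-weight standby, and it deliberately does not model what Route 53 does when every record is unhealthy (it returns an empty mapping in that case).

```python
from fractions import Fraction


def weighted_distribution(records: dict[str, tuple[int, bool]]) -> dict[str, Fraction]:
    """Map record name -> traffic share, given (weight, healthy) per record."""
    active = {name: weight for name, (weight, healthy) in records.items()
              if healthy and weight > 0}
    if active:
        total = sum(active.values())
        return {name: Fraction(weight, total) for name, weight in active.items()}
    # No healthy nonzero-weight record: fall back to healthy zero-weight standbys.
    standby = [name for name, (weight, healthy) in records.items()
               if healthy and weight == 0]
    return {name: Fraction(1, len(standby)) for name in standby}


records = {"us-east-1": (100, True), "eu-west-1": (100, False), "standby": (0, True)}
print(weighted_distribution(records))
```

In this example the unhealthy `eu-west-1` record is excluded, so `us-east-1` receives all traffic and the standby stays idle.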
7 questions test this
- A solutions architect is designing a multi-region architecture with Amazon Route 53 health checks. The application runs on EC2 instances in…
- A solutions architect is designing an active-active multi-region architecture for a global web application. The application runs on Amazon…
- A company wants to implement a standby architecture using Amazon Route 53 weighted records. The primary resources should receive all…
- A solutions architect needs to design a highly available architecture for a global application. The application runs in two AWS Regions…
- A company operates a global e-commerce platform with deployments in us-east-1 and eu-west-1. The solutions architect needs to implement a…
- A company is designing an active-active architecture for its web application across us-east-1 and us-west-2 Regions. Both Regions should…
- A global media company wants to implement a multi-tier DNS failover architecture. The company has two data centers in each of three Regions…
- Route 53 hierarchical routing: latency alias over per-region weighted records
A common multi-tier DNS pattern uses latency alias records at the top level (for region selection) pointing to weighted records within each region (for intra-region distribution). Enabling Evaluate Target Health on the latency alias causes Route 53 to consider a region healthy only if at least one of its weighted child records is healthy, enabling cascading health propagation.
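The cascading health evaluation can be sketched as a two-level lookup. The function name and latency figures are invented for illustration, and the model assumes Evaluate Target Health is enabled on every latency alias.

```python
from typing import Optional


def pick_region(latency_ms: dict[str, float],
                child_health: dict[str, list[bool]]) -> Optional[str]:
    """Choose the lowest-latency region that has at least one healthy weighted record."""
    healthy_regions = [
        region for region in latency_ms
        if any(child_health.get(region, []))
    ]
    if not healthy_regions:
        return None
    return min(healthy_regions, key=lambda region: latency_ms[region])


latency = {"us-east-1": 20.0, "eu-west-1": 95.0}
# Both weighted records in us-east-1 are down, so the region is skipped.
health = {"us-east-1": [False, False], "eu-west-1": [True, False]}
print(pick_region(latency, health))
```

A region with a single surviving weighted record still counts as healthy, so traffic moves to another region only after all of its children fail.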
4 questions test this
- A company operates a global e-commerce platform with deployments in us-east-1 and eu-west-1. The solutions architect needs to implement a…
- A global media company wants to implement a multi-tier DNS failover architecture. The company has two data centers in each of three Regions…
- A solutions architect is designing a multi-region architecture for a critical application. The primary region has weighted routing records…
- A company has a multi-Region application architecture with resources in us-east-1 (primary) and eu-west-1 (secondary). The company uses…
- Aurora replica failover priority tiers 0 (highest) to 15 (lowest)
Each Aurora Replica can be assigned a promotion priority tier from 0 (promoted first) to 15 (promoted last). When the primary instance fails, Aurora promotes the replica with the lowest tier number; if several replicas share that tier, it promotes the largest one. Assign tier 0 to the preferred standby (e.g., same instance class as the primary) and higher tiers to replicas used for analytics or reporting.
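The selection order can be illustrated with a small function. The replica records and the numeric `size_rank` field are hypothetical stand-ins for instance size; the tie-break on size among replicas in the same tier reflects Aurora's documented behavior, but this is a model, not Aurora's implementation.

```python
def failover_target(replicas: list[dict]) -> str:
    """Return the ID of the replica promoted first: lowest tier, then largest size."""
    if not replicas:
        raise ValueError("no replicas available for failover")
    best = min(replicas, key=lambda r: (r["tier"], -r["size_rank"]))
    return best["id"]


replicas = [
    {"id": "reporting-replica", "tier": 15, "size_rank": 4},
    {"id": "standby-small", "tier": 0, "size_rank": 2},
    {"id": "standby-large", "tier": 0, "size_rank": 4},
]
print(failover_target(replicas))
```

The tier itself is set per instance, for example through the `PromotionTier` parameter of the RDS `ModifyDBInstance` API.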
4 questions test this
- A company has an Amazon Aurora MySQL cluster with one primary instance and three Aurora Replicas across three Availability Zones. The…
- A company uses an Amazon Aurora PostgreSQL DB cluster with one writer instance and three Aurora Replicas across three Availability Zones.…
- A company operates an Amazon Aurora MySQL cluster with one primary instance and three Aurora Replicas distributed across three Availability…
- A company operates an Amazon Aurora MySQL cluster with one writer instance and three reader instances distributed across three Availability…
- RDS Snapshot copies cross-region + cross-account for DR
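RDS automated snapshots stay in the source Region and cannot be shared directly. Manual or copied snapshots[17] can move cross-Region (an encrypted copy is re-encrypted with a KMS key in the destination Region) or cross-account (share the snapshot with the target account; an encrypted snapshot must use a customer managed KMS key that the target account is allowed to use). Used in pilot light and warm standby DR strategies.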
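A hedged sketch of the two requests involved: a cross-Region copy followed by sharing the copy with a DR account. The helper name, ARNs, key ID, and account number are placeholders; the keys follow the RDS `CopyDBSnapshot` and `ModifyDBSnapshotAttribute` parameters as exposed by boto3 (where `SourceRegion` is a client-side convenience parameter), and no API call is made.

```python
def snapshot_dr_requests(source_snapshot_arn: str, target_snapshot_id: str,
                         source_region: str, dest_kms_key_id: str,
                         dr_account_id: str) -> tuple[dict, dict]:
    """Build (copy, share) request parameters for a cross-Region, cross-account copy."""
    copy_request = {
        "SourceDBSnapshotIdentifier": source_snapshot_arn,
        "TargetDBSnapshotIdentifier": target_snapshot_id,
        "KmsKeyId": dest_kms_key_id,    # KMS key in the destination Region
        "SourceRegion": source_region,
    }
    share_request = {
        "DBSnapshotIdentifier": target_snapshot_id,
        "AttributeName": "restore",     # grants restore permission
        "ValuesToAdd": [dr_account_id],
    }
    return copy_request, share_request


copy_req, share_req = snapshot_dr_requests(
    "arn:aws:rds:us-east-1:111111111111:snapshot:prod-manual",
    "prod-manual-dr-copy",
    "us-east-1",
    "alias/dr-snapshots",
    "222222222222",
)
print(copy_req["TargetDBSnapshotIdentifier"], share_req["ValuesToAdd"])
```

The copy request would be issued by an RDS client in the destination Region, and the share request against the copied snapshot once it becomes available.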
References
- RDS Multi-AZ deployments
- Aurora Global Database
- DynamoDB Global Tables
- S3 Cross-Region Replication
- Elastic Load Balancing overview
- Aurora replication
- Disaster recovery options in the cloud (AWS whitepaper)
- AWS Backup
- Route 53 routing policy
- Amazon EC2 Auto Scaling User Guide
- ALB target-group health checks
- ASG target-tracking scaling policies
- RDS Read Replicas
- Route 53 health check values
- ASG health checks for instances
- Route 53 DNS failover health checks
- Copying a DB snapshot (https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_CopySnapshot.html)