Day 24: Disaster Recovery and Backup
Disaster Recovery Overview: Disaster recovery (DR) is a set of policies, tools, and procedures designed to enable the recovery or continuation of vital technology infrastructure and systems following a natural or human-induced disaster. The goal is to minimize downtime, data loss, and financial impact in the event of a disaster.
Key Metrics:
- Recovery Point Objective (RPO): The maximum acceptable amount of data loss, expressed as a length of time before the disaster. For example, an RPO of 1 hour means at most the last hour of changes may be lost, so backups or replication must capture data at least hourly. It defines the point in time to which systems and data must be restorable.
- Recovery Time Objective (RTO): The target maximum time within which a business process must be restored after a disaster to avoid unacceptable consequences. It sets the limit on how long systems may be down before normal operations resume.
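To make the two metrics concrete, here is a minimal Python sketch that checks a periodic backup plan against target RPO and RTO values. It assumes the worst-case data loss equals the interval between backups (ignoring how long a backup itself takes) and that the restore duration is a figure you have measured yourself; `RecoveryPlan` and `meets_objectives` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryPlan:
    backup_interval: timedelta   # time between consecutive backups
    restore_duration: timedelta  # your measured time to restore and resume service


def meets_objectives(plan: RecoveryPlan, rpo: timedelta, rto: timedelta) -> dict:
    # Simplifying assumption: a disaster just before the next backup loses
    # one full backup interval of data, so that interval is the worst-case loss.
    worst_case_data_loss = plan.backup_interval
    return {
        "rpo_met": worst_case_data_loss <= rpo,
        "rto_met": plan.restore_duration <= rto,
    }


nightly = RecoveryPlan(backup_interval=timedelta(hours=24), restore_duration=timedelta(hours=3))
print(meets_objectives(nightly, rpo=timedelta(hours=1), rto=timedelta(hours=4)))
# prints {'rpo_met': False, 'rto_met': True}
```

Here a nightly backup satisfies a 4-hour RTO but cannot meet a 1-hour RPO; meeting it would require more frequent backups or continuous replication.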
Strategies:
1. Backup and Restore:
- Description: Regularly back up data and system configurations and, in the event of a disaster, restore them to the last known good state.
- Pros: Simplicity, cost-effectiveness.
- Cons: Longest recovery time of the four strategies, and any data written since the last backup is lost.
2. Pilot Light:
- Description: Keep a minimal core of the infrastructure (the “pilot light”) always running, typically the replicated data stores, so that the rest of the environment can be provisioned and scaled up quickly in a disaster.
- Pros: Faster recovery compared to backup and restore.
- Cons: Higher cost than backup and restore.
3. Warm Standby:
- Description: Keep a scaled-down but fully functional copy of the production system running at all times, ready to be scaled up to full capacity.
- Pros: Faster recovery than pilot light, because more resources are already running.
- Cons: Higher cost than pilot light, but lower than a hot site.
4. Hot Site/Multi-Site Approach:
- Description: Maintain a fully operational duplicate of the infrastructure at a separate location.
- Pros: Minimal downtime, minimal data loss.
- Cons: Highest cost among the strategies.
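The strategies trade cost against recovery speed. One simple way to apply that trade-off is to walk the options from cheapest to most expensive and stop at the first one whose recovery time fits the RTO, as in this sketch. The function and strategy names are hypothetical, and the recovery-time estimates in the usage line are placeholders; in practice they should come from your own recovery drills.

```python
from typing import Dict, Optional

# Ordered from lowest to highest cost, matching the strategy list above.
STRATEGIES = ["backup_and_restore", "pilot_light", "warm_standby", "multi_site"]


def cheapest_strategy(rto_hours: float, estimated_recovery_hours: Dict[str, float]) -> Optional[str]:
    """Return the cheapest strategy whose estimated recovery time fits within the RTO.

    estimated_recovery_hours must map every name in STRATEGIES to your own
    measured or estimated recovery time; no default figures are assumed.
    """
    for name in STRATEGIES:
        if estimated_recovery_hours[name] <= rto_hours:
            return name
    return None  # no strategy meets this RTO with the given estimates


# Placeholder estimates for illustration only, not benchmarks.
estimates = {"backup_and_restore": 8, "pilot_light": 2, "warm_standby": 0.5, "multi_site": 0.05}
print(cheapest_strategy(1, estimates))  # prints warm_standby
```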
Backup Strategies:
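1. EBS Snapshots (Amazon Elastic Block Store):
- Description: Point-in-time, incremental snapshots of EBS volumes; a lost or corrupted volume is recovered by creating a new volume from a snapshot.
- Use Case: Backing up the boot and data volumes of EC2 instances in AWS environments.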
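A minimal sketch of taking and restoring an EBS snapshot, assuming `ec2` is a boto3 EC2 client (`boto3.client("ec2")`). The helper names and the `purpose` tag are hypothetical; `create_snapshot`, the `snapshot_completed` waiter, and `create_volume` are standard boto3 EC2 calls.

```python
def snapshot_volume(ec2, volume_id: str, description: str, wait: bool = False) -> str:
    """Create a point-in-time snapshot of an EBS volume and return the snapshot ID.

    `ec2` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    response = ec2.create_snapshot(
        VolumeId=volume_id,
        Description=description,
        TagSpecifications=[{
            "ResourceType": "snapshot",
            "Tags": [{"Key": "purpose", "Value": "dr-backup"}],  # hypothetical tag
        }],
    )
    snapshot_id = response["SnapshotId"]
    if wait:
        # Poll until the snapshot reaches the "completed" state.
        ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])
    return snapshot_id


def restore_volume(ec2, snapshot_id: str, availability_zone: str) -> str:
    """Recover by creating a new EBS volume from a snapshot; returns the new volume ID."""
    response = ec2.create_volume(SnapshotId=snapshot_id, AvailabilityZone=availability_zone)
    return response["VolumeId"]
```

A restored volume is a new volume in the given Availability Zone; it still has to be attached to an instance (for example with `attach_volume`) before it can be used.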
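2. RDS Automated Backups / Snapshots:
- Description: Amazon RDS automatically takes daily backups of the DB instance and captures transaction logs, which allows point-in-time restore within the backup retention period (up to 35 days). Manual DB snapshots are taken on demand and kept until you delete them.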
- Use Case: AWS RDS managed database environments.
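The sketch below shows the three RDS operations mentioned above through boto3, assuming `rds` is an RDS client (`boto3.client("rds")`). The helper names and the 7-day default retention are illustrative assumptions, not recommendations.

```python
from datetime import datetime
from typing import Optional


def enable_automated_backups(rds, db_instance_id: str, retention_days: int = 7):
    # A retention period of 1-35 days enables automated backups (0 disables them).
    if not 1 <= retention_days <= 35:
        raise ValueError("retention_days must be between 1 and 35")
    return rds.modify_db_instance(
        DBInstanceIdentifier=db_instance_id,
        BackupRetentionPeriod=retention_days,
        ApplyImmediately=True,
    )


def take_manual_snapshot(rds, db_instance_id: str, snapshot_id: str):
    # Manual snapshots are kept until you delete them, unlike automated backups.
    return rds.create_db_snapshot(
        DBSnapshotIdentifier=snapshot_id,
        DBInstanceIdentifier=db_instance_id,
    )


def restore_to_point_in_time(rds, source_id: str, target_id: str,
                             restore_time: Optional[datetime] = None):
    # Point-in-time restore creates a NEW instance (target_id);
    # it does not overwrite the source instance.
    kwargs = {"SourceDBInstanceIdentifier": source_id, "TargetDBInstanceIdentifier": target_id}
    if restore_time is None:
        kwargs["UseLatestRestorableTime"] = True
    else:
        kwargs["RestoreTime"] = restore_time
    return rds.restore_db_instance_to_point_in_time(**kwargs)
```

Restoring to a new instance identifier leaves the original instance untouched, so the restored data can be verified before applications are switched over.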
3. Regular Pushes to S3 / S3 IA / Glacier:
- Description: Regularly back up critical data to Amazon S3 (Simple Storage Service), choosing a storage class such as S3 Standard, S3 Standard-IA (Infrequent Access), or S3 Glacier (cold storage) according to how often and how quickly the data must be retrieved.
- Use Case: Storing critical data in AWS.
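As an illustration of regular pushes to S3, this sketch uploads a local backup file under a date-based key with an explicit storage class. It assumes `s3` is a boto3 S3 client; the `backups/` prefix, the helper names, and the STANDARD_IA default are hypothetical choices.

```python
import os
from datetime import date
from typing import Optional

# A subset of the values S3 accepts for the StorageClass upload argument.
ALLOWED_CLASSES = {"STANDARD", "STANDARD_IA", "GLACIER_IR", "GLACIER", "DEEP_ARCHIVE"}


def backup_key(path: str, day: date) -> str:
    # Date-based prefixes group each day's backups under one common "backups/" prefix.
    return f"backups/{day.isoformat()}/{os.path.basename(path)}"


def push_backup(s3, path: str, bucket: str, storage_class: str = "STANDARD_IA",
                day: Optional[date] = None) -> str:
    """Upload one local backup file to S3 in the chosen storage class; returns the key."""
    if storage_class not in ALLOWED_CLASSES:
        raise ValueError(f"unsupported storage class: {storage_class}")
    key = backup_key(path, day or date.today())
    s3.upload_file(path, bucket, key, ExtraArgs={"StorageClass": storage_class})
    return key
```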
4. Lifecycle Policy:
- Description: Use S3 Lifecycle rules to automatically transition objects to cheaper storage classes as they age, or to delete (expire) them when they are no longer needed.
- Use Case: Cost optimization and data management.
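A lifecycle policy is a configuration document attached to the bucket. This sketch builds one rule that moves objects under a prefix to Standard-IA, then to Glacier, then deletes them, and applies it with boto3's `put_bucket_lifecycle_configuration`. The day counts and rule ID are illustrative assumptions, and `s3` is assumed to be a boto3 S3 client.

```python
def backup_lifecycle_rule(prefix: str = "backups/", ia_after_days: int = 30,
                          glacier_after_days: int = 90, expire_after_days: int = 365) -> dict:
    """Build one S3 lifecycle rule: Standard -> Standard-IA -> Glacier -> delete."""
    # S3 does not allow transitions to STANDARD_IA for objects younger than 30 days.
    if ia_after_days < 30:
        raise ValueError("STANDARD_IA transitions require at least 30 days")
    # This sketch requires each step to happen later than the previous one.
    if not ia_after_days < glacier_after_days < expire_after_days:
        raise ValueError("days must increase: IA < Glacier < expiration")
    return {
        "ID": "dr-backup-tiering",  # hypothetical rule name
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": expire_after_days},
    }


def apply_lifecycle(s3, bucket: str, rules: list) -> None:
    # This call replaces any lifecycle configuration already on the bucket.
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": rules},
    )
```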
5. Cross-Region Replication:
- Description: Replicate data, such as S3 objects, to a different AWS Region so that an outage in one Region does not affect every copy.
- Use Case: Geographical redundancy and compliance requirements.
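For S3, cross-region replication is configured on the source bucket and requires versioning on both buckets. A minimal sketch with boto3, assuming two S3 clients (one per Region) and an existing IAM role that S3 can assume to copy objects; the bucket names, role ARN, and rule ID are hypothetical.

```python
def enable_cross_region_replication(s3_src, s3_dst, src_bucket: str, dst_bucket: str,
                                    role_arn: str) -> None:
    # Replication requires versioning on BOTH the source and destination buckets.
    for client, bucket in ((s3_src, src_bucket), (s3_dst, dst_bucket)):
        client.put_bucket_versioning(
            Bucket=bucket,
            VersioningConfiguration={"Status": "Enabled"},
        )
    s3_src.put_bucket_replication(
        Bucket=src_bucket,
        ReplicationConfiguration={
            "Role": role_arn,  # IAM role that S3 assumes to copy objects
            "Rules": [{
                "ID": "dr-replicate-all",  # hypothetical rule name
                "Priority": 1,
                "Filter": {"Prefix": ""},  # empty prefix matches every object
                "Status": "Enabled",
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{dst_bucket}"},
            }],
        },
    )
```

Replication applies to objects written after the rule is enabled; existing objects need a separate copy, for example with S3 Batch Replication.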
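6. From On-Premises: Snowball or Storage Gateway:
- Description: Use AWS Snowball, a physical storage device shipped to your site, for large-scale offline data transfer, or AWS Storage Gateway to connect on-premises applications to AWS storage over the network for ongoing backups.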
- Use Case: Migrating or backing up on-premises data to AWS.
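Whether Snowball or an online transfer is the better fit largely comes down to how long the initial copy would take over the network. This back-of-the-envelope sketch estimates that time; the 80% link utilization and the caller-supplied deadline are assumptions, and the function names are hypothetical.

```python
def transfer_days(data_tb: float, bandwidth_mbps: float, utilization: float = 0.8) -> float:
    """Estimated days to upload data_tb terabytes (decimal) over a bandwidth_mbps link."""
    bits = data_tb * 1e12 * 8                            # terabytes -> bits
    effective_bps = bandwidth_mbps * 1e6 * utilization   # usable bits per second
    return bits / effective_bps / 86_400                 # seconds -> days


def suggest_transfer_method(data_tb: float, bandwidth_mbps: float, max_days: float) -> str:
    # max_days is YOUR deadline for the initial copy; no universal threshold is assumed.
    if transfer_days(data_tb, bandwidth_mbps) > max_days:
        return "snowball"
    return "network"  # e.g. Storage Gateway or direct uploads to S3


print(f"{transfer_days(100, 1000):.1f} days")  # prints 11.6 days
```

At 1 Gbps and 80% utilization, 100 TB takes roughly 11.6 days, so whether an online transfer is acceptable depends on your deadline.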