Backup & Disaster Recovery Plan
Recovery Objectives
| Metric | Target | How Achieved |
|---|---|---|
| RTO (Recovery Time Objective) | 30 minutes | Azure SQL PITR + pre-deploy .bacpac exports |
| RPO (Recovery Point Objective) | 1 hour | Azure SQL continuous backup + daily .bacpac exports |
Backup Strategy
1. Azure SQL Point-in-Time Restore (PITR)
- Automatic, managed by Azure
- Retention: 7 days (Basic/Standard tier)
- Granularity: Any point in time within retention window
- RPO: ~5-10 minutes (transaction log backups are taken roughly every 5-10 minutes)
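Before choosing a restore point, check that it still falls inside the retention window. The helper below is a minimal sketch. It assumes GNU `date` (as on Linux and GitHub-hosted Ubuntu runners) and the 7-day default retention listed above. `within_pitr_window` and `PITR_RETENTION_DAYS` are hypothetical names, not part of any Azure tooling.

```bash
# Hypothetical helper: is a requested restore point inside the PITR window?
# Assumes GNU date and the 7-day default retention; override via env var.
PITR_RETENTION_DAYS="${PITR_RETENTION_DAYS:-7}"

# Usage: within_pitr_window <ISO-8601 timestamp> [now as epoch seconds]
# Returns 0 if restorable, 1 if outside the window, 2 if unparseable.
within_pitr_window() {
  local target now earliest
  target=$(date -u -d "$1" +%s) || return 2
  now="${2:-$(date -u +%s)}"
  earliest=$(( now - PITR_RETENTION_DAYS * 86400 ))
  # Must be no older than the retention window and not in the future
  (( target >= earliest && target <= now ))
}
```

For example, `within_pitr_window "2024-05-01T13:45:00Z" && echo ok` prints `ok` only if that moment is still restorable.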
2. Pre-Deploy .bacpac Export
- Triggered by: CI/CD pipeline before each production deployment
- Storage: Azure Blob Storage container `db-backups` in `hcssstorageprod`
- Retention: 30 days, auto-cleanup of older backups
- Script: `infra/backup-database.sh prod pre-deploy`
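The contents of `infra/backup-database.sh` are not reproduced here. The following is a rough sketch of what a pre-deploy export can look like: `az sql db export` writes a .bacpac directly to the blob container. It assumes a hypothetical `EventsCore-<env>-<reason>-<UTC timestamp>.bacpac` naming scheme. It also assumes that non-prod resources follow the same `hcss-*-<env>` pattern as the prod names in this runbook. Set `DRY_RUN=1` to print the command instead of running it.

```bash
# Hypothetical sketch; the real infra/backup-database.sh may differ.
# Blob naming scheme and non-prod resource names are assumptions.

# e.g. EventsCore-prod-pre-deploy-20240501T020000Z.bacpac
backup_blob_name() {
  local env="$1" reason="$2" stamp="${3:-$(date -u +%Y%m%dT%H%M%SZ)}"
  printf 'EventsCore-%s-%s-%s.bacpac\n' "$env" "$reason" "$stamp"
}

# Usage: export_bacpac <env> <reason>
export_bacpac() {
  local env="$1" reason="$2" blob
  blob=$(backup_blob_name "$env" "$reason")
  # With DRY_RUN set, the az command is echoed rather than executed
  ${DRY_RUN:+echo} az sql db export \
    --resource-group "hcss-rg-$env" \
    --server "hcss-sql-$env" \
    --name EventsCore \
    --admin-user sqladmin \
    --admin-password "$SQL_ADMIN_PASSWORD" \
    --auth-type SQL \
    --storage-key-type StorageAccessKey \
    --storage-key "$STORAGE_KEY" \
    --storage-uri "https://hcssstorage${env}.blob.core.windows.net/db-backups/${blob}"
}
```

A timestamped name keeps every export unique, so a failed deployment never overwrites the backup it depends on.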
3. Scheduled Daily Backup
- Schedule: Daily at 2:00 AM UTC via GitHub Actions
- Workflow: `.github/workflows/scheduled-backup.yml`
- Storage: Same blob container (`db-backups`)
- Can be triggered manually via the GitHub Actions `workflow_dispatch` trigger
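Both backup types share the `db-backups` container and its 30-day retention. This runbook does not say how the auto-cleanup is implemented; an Azure Storage lifecycle management rule is one common option. The sketch below shows the selection logic if cleanup were scripted instead. It assumes GNU `date` and input lines of the form `name<TAB>lastModified`. `expired_backups` and `BACKUP_RETENTION_DAYS` are hypothetical names.

```bash
# Hypothetical sketch: list backups older than the retention window.
# Input: TSV "name<TAB>lastModified" lines, e.g. from
#   az storage blob list --container-name db-backups \
#     --account-name hcssstorageprod \
#     --query "[].[name, properties.lastModified]" --output tsv
BACKUP_RETENTION_DAYS="${BACKUP_RETENTION_DAYS:-30}"

# Usage: ... | expired_backups [now as epoch seconds]
expired_backups() {
  local now="${1:-$(date -u +%s)}" cutoff name modified ts
  cutoff=$(( now - BACKUP_RETENTION_DAYS * 86400 ))
  while IFS=$'\t' read -r name modified; do
    [[ -z "$name" ]] && continue
    # Skip lines whose timestamp GNU date cannot parse
    ts=$(date -u -d "$modified" +%s) || continue
    if (( ts < cutoff )); then
      printf '%s\n' "$name"
    fi
  done
}
```

Keeping selection separate from deletion lets you review the list before passing each name to `az storage blob delete`.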
Restore Procedures
Option 1: Azure Portal Point-in-Time Restore (Fastest)
- Navigate to Azure Portal > SQL Database > EventsCore
- Click "Restore" in the toolbar
- Select the desired point in time
- Choose a new database name (e.g., `EventsCore-restored`)
- Wait for the restore to complete
- Verify data integrity
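- Point the application's connection string at the restored database, or rename the databases so the restored copy takes the original name (PITR always restores to a new database; it cannot overwrite an existing one)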
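The portal steps above can also be scripted. `az sql db restore` always creates a new database. If you choose to swap names rather than connection strings, follow it with `az sql db rename`. This is a sketch using the resource names from this runbook. `EventsCore-old` is a hypothetical name for the displaced database, and `--time` expects a UTC timestamp in `YYYY-MM-DDTHH:MM:SS` form. Set `DRY_RUN=1` to print the commands instead of running them.

```bash
# Sketch: CLI equivalent of the portal PITR restore.
# Usage: pitr_restore <YYYY-MM-DDTHH:MM:SS> [destination name]
pitr_restore() {
  local restore_time="$1" dest_name="${2:-EventsCore-restored}"
  ${DRY_RUN:+echo} az sql db restore \
    --resource-group hcss-rg-prod \
    --server hcss-sql-prod \
    --name EventsCore \
    --dest-name "$dest_name" \
    --time "$restore_time"
}

# Usage: swap_in_restored [restored name] [name for the old database]
# EventsCore-old is a hypothetical default.
swap_in_restored() {
  local restored="${1:-EventsCore-restored}" old="${2:-EventsCore-old}"
  # Move the damaged database aside, then give the restored copy its name
  ${DRY_RUN:+echo} az sql db rename --resource-group hcss-rg-prod \
    --server hcss-sql-prod --name EventsCore --new-name "$old" &&
    ${DRY_RUN:+echo} az sql db rename --resource-group hcss-rg-prod \
      --server hcss-sql-prod --name "$restored" --new-name EventsCore
}
```

Run `DRY_RUN=1 pitr_restore 2024-05-01T13:45:00` first to review the command, then rerun without `DRY_RUN`.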
Option 2: Restore from .bacpac
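```bash
# List available backups
az storage blob list \
  --container-name db-backups \
  --account-name hcssstorageprod \
  --query "[].{Name:name, Modified:properties.lastModified}" \
  --output table

# az sql db import loads into an existing, empty database, so create the
# target first (add --service-objective to match the production tier)
az sql db create \
  --name EventsCore-restored \
  --server hcss-sql-prod \
  --resource-group hcss-rg-prod

# Import the .bacpac into the new database
# (replace BACKUP_FILE with a name from the listing above)
az sql db import \
  --admin-user sqladmin \
  --admin-password "$SQL_ADMIN_PASSWORD" \
  --auth-type SQL \
  --name EventsCore-restored \
  --server hcss-sql-prod \
  --resource-group hcss-rg-prod \
  --storage-key "$STORAGE_KEY" \
  --storage-key-type StorageAccessKey \
  --storage-uri "https://hcssstorageprod.blob.core.windows.net/db-backups/BACKUP_FILE.bacpac"
```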
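You can select the newest backup from a TSV listing instead of copying `BACKUP_FILE` by hand from the table output. A minimal sketch, assuming the ISO 8601 timestamps returned by `az storage blob list` all share one format and offset, so they sort correctly as text. `latest_backup` and `backup_uri` are hypothetical helpers.

```bash
# Hypothetical helpers. Feed latest_backup with, e.g.:
#   az storage blob list --container-name db-backups \
#     --account-name hcssstorageprod \
#     --query "[].[name, properties.lastModified]" --output tsv

# Print the name of the most recently modified .bacpac
latest_backup() {
  awk -F '\t' '$1 ~ /\.bacpac$/ { print $2 "\t" $1 }' |
    LC_ALL=C sort -r | head -n 1 | cut -f 2
}

# Build the --storage-uri value for az sql db import
backup_uri() {
  printf 'https://hcssstorageprod.blob.core.windows.net/db-backups/%s\n' "$1"
}
```

Pass the resulting URI to `--storage-uri` in the import command.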
Option 3: Rollback After Bad Migration
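- Do NOT attempt to reverse the migration (e.g., by running down-migrations against production)
- Restore the database from the pre-deploy .bacpac backup (any writes made after that backup was taken are lost)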
- Fix the migration code
- Redeploy through the standard CI/CD pipeline
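A rollback needs the newest pre-deploy export taken before the failed deployment, which you can locate from the blob listing. This sketch makes two assumptions. First, pre-deploy blob names contain `pre-deploy` (a hypothetical convention). Second, the deployment time uses the same ISO 8601 format and offset as `lastModified`, so a plain string comparison orders them correctly. `predeploy_backup_before` is a hypothetical helper.

```bash
# Hypothetical sketch: newest pre-deploy backup taken before a deployment.
# Input: TSV "name<TAB>lastModified" lines from az storage blob list
#   --query "[].[name, properties.lastModified]" --output tsv
# Usage: ... | predeploy_backup_before <deploy time, e.g. 2024-05-01T02:30:00+00:00>
predeploy_backup_before() {
  local deploy_time="$1"
  # C locale keeps the string comparison and sort byte-wise
  LC_ALL=C awk -F '\t' -v t="$deploy_time" \
    '$1 ~ /pre-deploy.*\.bacpac$/ && $2 < t { print $2 "\t" $1 }' |
    LC_ALL=C sort -r | head -n 1 | cut -f 2
}
```

Filtering on the deployment time guards against picking up a backup from a later deploy attempt.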
Disaster Recovery Scenarios
Database Corruption
- Use Azure PITR to restore to the last known good state
- Verify data integrity with application health checks
- Update connection string if using a new database name
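Wrapping the post-restore health check in a retry loop lets it tolerate a short warm-up. A minimal sketch; `wait_for_healthy` is a hypothetical helper, and the `/health` endpoint in the usage example is hypothetical because this runbook does not name the application's health check URL.

```bash
# Hypothetical helper: retry a check command until it succeeds.
# Usage: wait_for_healthy <attempts> <delay-seconds> <command...>
wait_for_healthy() {
  local attempts="$1" delay="$2" i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the final one
    (( i < attempts )) && sleep "$delay"
  done
  return 1
}
```

For example, `wait_for_healthy 10 15 curl --fail --silent --show-error https://<app-host>/health` tries up to 10 times, 15 seconds apart.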
Accidental Data Deletion
- Use PITR to restore to just before the deletion
- Export the needed data from the restored database
- Import the data back into the production database
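One way to copy specific rows from the restored database back into production is the `bcp` utility. `queryout` exports the result of a query, and `in` loads the file into the target table; `-n` uses native format and `-E` keeps identity values. This is a sketch only. The table, the query, and the `SQL_SERVER` host value are assumptions. Select only the deleted rows, or the import will hit key conflicts. Set `DRY_RUN=1` to print the commands.

```bash
# Sketch only: the host value below is assumed from the server name.
SQL_SERVER="tcp:hcss-sql-prod.database.windows.net"

# Usage: export_rows "<SELECT ... deleted rows ...>" <outfile>
export_rows() {
  local query="$1" outfile="$2"
  ${DRY_RUN:+echo} bcp "$query" queryout "$outfile" \
    -S "$SQL_SERVER" -d EventsCore-restored \
    -U sqladmin -P "$SQL_ADMIN_PASSWORD" -n
}

# Usage: import_rows <schema.table> <infile>
import_rows() {
  local table="$1" infile="$2"
  ${DRY_RUN:+echo} bcp "$table" in "$infile" \
    -S "$SQL_SERVER" -d EventsCore \
    -U sqladmin -P "$SQL_ADMIN_PASSWORD" -n -E
}
```

Native format requires the source and target tables to have the same schema, which holds when both come from the same database at nearly the same point in time.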
Complete Region Failure
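- If active geo-replication is enabled, fail over to the secondary in the other region; failover is manual unless the database belongs to an auto-failover group with an automatic failover policy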
- Otherwise, restore from .bacpac in a different region
- Update DNS/connection strings to point to the new region
- Redeploy the application to a new App Service in the target region
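If an auto-failover group has been configured (this runbook does not say one exists), you can force a manual failover with `az sql failover-group set-primary`, run against the secondary server. The failover group name `hcss-fog-prod` and secondary server `hcss-sql-dr` below are hypothetical, and the secondary is assumed to be in the same resource group. `--allow-data-loss` forces the failover even when the primary is unreachable, at the cost of unreplicated transactions.

```bash
# Hypothetical names: failover group hcss-fog-prod, secondary hcss-sql-dr.
# Usage: failover_to_secondary [yes]   ("yes" adds --allow-data-loss)
failover_to_secondary() {
  local allow_data_loss="${1:-no}"
  local args=(--resource-group hcss-rg-prod --server hcss-sql-dr --name hcss-fog-prod)
  [[ "$allow_data_loss" == "yes" ]] && args+=(--allow-data-loss)
  ${DRY_RUN:+echo} az sql failover-group set-primary "${args[@]}"
}
```

Try the planned failover first. Use `yes` only when the primary region is down and some data loss is acceptable.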
Testing
Backup restore procedures should be tested quarterly:
- Restore a .bacpac to a test database
- Verify the application can connect and function
- Document any issues and update this runbook
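A quarterly drill is also a chance to check the 30-minute RTO from the objectives table. This minimal sketch times a restore command and compares the elapsed time with the target. `timed_drill`, `check_rto`, and `RTO_SECONDS` are hypothetical names.

```bash
# Hypothetical sketch; 1800 s matches the 30-minute RTO target above.
RTO_SECONDS="${RTO_SECONDS:-1800}"

# Usage: check_rto <elapsed seconds>; returns 1 if the RTO was exceeded
check_rto() {
  local elapsed="$1"
  if (( elapsed <= RTO_SECONDS )); then
    echo "PASS: restore took ${elapsed}s (RTO ${RTO_SECONDS}s)"
  else
    echo "FAIL: restore took ${elapsed}s (RTO ${RTO_SECONDS}s)"
    return 1
  fi
}

# Usage: timed_drill <restore command...>
timed_drill() {
  local start end
  start=$(date +%s)
  "$@" || { echo "FAIL: restore command exited non-zero"; return 1; }
  end=$(date +%s)
  check_rto $(( end - start ))
}
```

For example, `timed_drill az sql db restore <args>` records whether a PITR restore finished within the RTO. Add the result to the drill notes.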