Backup & Disaster Recovery Plan

Recovery Objectives

Metric | Target | How Achieved
RTO (Recovery Time Objective) | 30 minutes | Azure SQL PITR + pre-deploy snapshots
RPO (Recovery Point Objective) | 1 hour | Azure SQL continuous backup + daily .bacpac exports

Backup Strategy

1. Azure SQL Point-in-Time Restore (PITR)

  • Automatic, managed by Azure
  • Retention: 7 days (the Azure SQL default; the Basic tier is capped at 7 days, the Standard tier can be configured up to 35)
  • Granularity: Any point in time within retention window
  • RPO: ~5 minutes (continuous transaction log backup)
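The retention window described above can be checked from the command line. The following is a minimal sketch, assuming the database, server, and resource group names used elsewhere in this runbook (EventsCore, hcss-sql-prod, hcss-rg-prod); the function names are hypothetical helpers, not part of any existing script.

```shell
# Print the earliest point in time that PITR can currently restore to.
# Names are taken from the restore commands later in this runbook.
pitr_earliest_restore() {
  az sql db show \
    --name EventsCore \
    --server hcss-sql-prod \
    --resource-group hcss-rg-prod \
    --query earliestRestoreDate \
    --output tsv
}

# Show the short-term (PITR) retention policy configured for the database.
pitr_retention_policy() {
  az sql db str-policy show \
    --name EventsCore \
    --server hcss-sql-prod \
    --resource-group hcss-rg-prod
}
```

Running pitr_earliest_restore before an incident gives a quick sanity check that the restore window reaches back as far as expected.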

2. Pre-Deploy .bacpac Export

  • Triggered by: CI/CD pipeline before each production deployment
  • Storage: Azure Blob Storage container db-backups in hcssstorageprod
  • Retention: 30 days, auto-cleanup of older backups
  • Script: infra/backup-database.sh prod pre-deploy
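The contents of infra/backup-database.sh are not reproduced in this runbook. As a rough sketch of what a pre-deploy export and the 30-day cleanup might look like, assuming the same credentials variables as the restore commands ($SQL_ADMIN_PASSWORD, $STORAGE_KEY), a hypothetical blob naming scheme, and GNU date:

```shell
# Hypothetical sketch only; this is NOT the actual infra/backup-database.sh.
export_bacpac() {
  local label="$1"   # e.g. pre-deploy or daily
  local stamp blob
  stamp="$(date -u +%Y%m%dT%H%M%SZ)"
  blob="EventsCore-${label}-${stamp}.bacpac"   # naming scheme is an assumption
  az sql db export \
    --admin-user sqladmin \
    --admin-password "$SQL_ADMIN_PASSWORD" \
    --auth-type SQL \
    --name EventsCore \
    --server hcss-sql-prod \
    --resource-group hcss-rg-prod \
    --storage-key "$STORAGE_KEY" \
    --storage-key-type StorageAccessKey \
    --storage-uri "https://hcssstorageprod.blob.core.windows.net/db-backups/${blob}"
}

# Delete .bacpac blobs that have not been modified in the last 30 days.
# Uses GNU date syntax for the cutoff calculation.
prune_old_backups() {
  local cutoff
  cutoff="$(date -u -d '30 days ago' +%Y-%m-%dT%H:%MZ)"
  az storage blob delete-batch \
    --source db-backups \
    --account-name hcssstorageprod \
    --account-key "$STORAGE_KEY" \
    --pattern '*.bacpac' \
    --if-unmodified-since "$cutoff"
}
```

A pipeline step could call export_bacpac pre-deploy before the deployment and prune_old_backups afterwards; a blob lifecycle rule on the storage account would be an alternative way to enforce the retention.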

3. Scheduled Daily Backup

  • Schedule: Daily at 2:00 AM UTC via GitHub Actions
  • Workflow: .github/workflows/scheduled-backup.yml
  • Storage: Same blob container db-backups
  • Can be triggered manually via GitHub Actions workflow dispatch
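The manual trigger can also be run from a terminal with the GitHub CLI instead of the web UI. This sketch assumes gh is installed and authenticated for the repository, and that the default branch is called main (an assumption; pass another ref if it differs). A daily 2:00 AM UTC schedule corresponds to the cron expression 0 2 * * *.

```shell
# Dispatch the scheduled backup workflow manually.
# The default ref "main" is an assumption about the repository.
trigger_backup() {
  gh workflow run scheduled-backup.yml --ref "${1:-main}"
}

# Show the most recent run of the workflow to confirm it started or finished.
latest_backup_run() {
  gh run list --workflow scheduled-backup.yml --limit 1
}
```

Calling trigger_backup followed by latest_backup_run a few moments later shows whether the dispatch was picked up.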

Restore Procedures

Option 1: Azure Portal Point-in-Time Restore (Fastest)

  1. Navigate to Azure Portal > SQL Database > EventsCore
  2. Click "Restore" in the toolbar
  3. Select the desired point in time
  4. Choose a new database name (e.g., EventsCore-restored)
  5. Wait for restore to complete
  6. Verify data integrity
  7. Swap connection strings or rename databases
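The same point-in-time restore can be started from the Azure CLI, which is easier to script and to record in an incident log. A minimal sketch, assuming the server and resource group names used in Option 2 below; the timestamp in the usage note is purely illustrative.

```shell
# Restore EventsCore to a new database at the given UTC timestamp.
restore_pitr() {
  local restore_time="${1:-}"
  if [ -z "$restore_time" ]; then
    echo "usage: restore_pitr <UTC timestamp, e.g. 2024-05-01T13:10:00Z>" >&2
    return 2
  fi
  az sql db restore \
    --name EventsCore \
    --server hcss-sql-prod \
    --resource-group hcss-rg-prod \
    --dest-name EventsCore-restored \
    --time "$restore_time"
}
```

For example, restore_pitr 2024-05-01T13:10:00Z creates EventsCore-restored alongside the original; the source database is left untouched.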

Option 2: Restore from .bacpac

# List available backups
az storage blob list \
  --container-name db-backups \
  --account-name hcssstorageprod \
  --query "[].{Name:name, Modified:properties.lastModified}" \
  --output table

# Import .bacpac to a new database
az sql db import \
  --admin-user sqladmin \
  --admin-password "$SQL_ADMIN_PASSWORD" \
  --auth-type SQL \
  --name EventsCore-restored \
  --server hcss-sql-prod \
  --resource-group hcss-rg-prod \
  --storage-key "$STORAGE_KEY" \
  --storage-key-type StorageAccessKey \
  --storage-uri "https://hcssstorageprod.blob.core.windows.net/db-backups/BACKUP_FILE.bacpac"
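When the most recent backup is the one wanted, the BACKUP_FILE placeholder can be filled in automatically rather than copied from the table output. This is a small sketch; the helper names are hypothetical, and it assumes the caller is already authenticated against the storage account.

```shell
# Print the name of the most recently modified blob in db-backups.
latest_backup_name() {
  az storage blob list \
    --container-name db-backups \
    --account-name hcssstorageprod \
    --query "sort_by(@, &properties.lastModified)[-1].name" \
    --output tsv
}

# Build the full storage URI for a given backup blob name.
backup_uri() {
  printf 'https://hcssstorageprod.blob.core.windows.net/db-backups/%s\n' "$1"
}
```

The result can be passed to the import command as --storage-uri "$(backup_uri "$(latest_backup_name)")". It is worth printing the name first to confirm it is the backup intended.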

Option 3: Rollback After Bad Migration

  1. Do NOT attempt to reverse-migrate
  2. Restore database from the pre-deploy backup
  3. Fix the migration code
  4. Redeploy through the standard CI/CD pipeline

Disaster Recovery Scenarios

Database Corruption

  1. Use Azure PITR to restore to the last known good state
  2. Verify data integrity with application health checks
  3. Update connection string if using a new database name
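If changing the connection string is undesirable, one alternative is to rename the databases so the restored copy takes over the original name. The sketch below assumes the restored database is called EventsCore-restored, as in the restore procedures; the suffix used for the damaged copy is a hypothetical choice.

```shell
# Move the damaged database aside, then give the restored copy the original name.
# The second rename only runs if the first one succeeds.
swap_in_restored_db() {
  local stamp
  stamp="$(date -u +%Y%m%d%H%M%S)"
  az sql db rename \
    --name EventsCore --new-name "EventsCore-damaged-${stamp}" \
    --server hcss-sql-prod --resource-group hcss-rg-prod \
  && az sql db rename \
    --name EventsCore-restored --new-name EventsCore \
    --server hcss-sql-prod --resource-group hcss-rg-prod
}
```

Keeping the damaged database under a timestamped name, rather than deleting it, leaves it available for later investigation.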

Accidental Data Deletion

  1. Use PITR to restore to just before the deletion
  2. Export the needed data from the restored database
  3. Import the data back into the production database
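Steps 2 and 3 can be done with the bcp utility. This is a sketch under several assumptions: the restored copy is named EventsCore-restored, the server is reachable at the standard hcss-sql-prod.database.windows.net address, and the table and query in the usage note are hypothetical placeholders for whatever was actually deleted.

```shell
SQL_HOST="hcss-sql-prod.database.windows.net"   # assumed FQDN of hcss-sql-prod

# Export the rows selected by a query from the restored database (native format).
export_deleted_rows() {
  local query="$1" outfile="$2"
  bcp "$query" queryout "$outfile" \
    -S "$SQL_HOST" -d EventsCore-restored \
    -U sqladmin -P "$SQL_ADMIN_PASSWORD" -n
}

# Load the exported rows into production; -E keeps the original identity values.
import_rows() {
  local table="$1" infile="$2"
  bcp "$table" in "$infile" \
    -S "$SQL_HOST" -d EventsCore \
    -U sqladmin -P "$SQL_ADMIN_PASSWORD" -n -E
}
```

For instance, export_deleted_rows "SELECT * FROM dbo.Events WHERE Id BETWEEN 1000 AND 2000" events.dat followed by import_rows dbo.Events events.dat, where dbo.Events and the Id range are illustrative only. Rows that still exist in production would cause key conflicts, so the query should select only the deleted data.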

Complete Region Failure

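  1. If an Azure SQL failover group is configured, fail over to the secondary region (automatic under an automatic failover policy; active geo-replication on its own requires a manually initiated failover)
  2. Otherwise, restore from .bacpac in a different region; this requires the backups to be readable from outside the failed region (for example, via geo-redundant storage)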
  3. Update DNS/connection strings to point to the new region
  4. Redeploy the application to a new App Service in the target region
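Where a failover group exists, the failover in step 1 can be initiated manually from the CLI by promoting the secondary server. This runbook does not name a failover group or a secondary server, so all three arguments below are hypothetical and must be supplied by the operator.

```shell
# Promote the secondary server of a failover group to primary.
# Arguments: failover group name, secondary server name, its resource group.
failover_to_secondary() {
  local fg_name="$1" secondary_server="$2" secondary_rg="$3"
  az sql failover-group set-primary \
    --name "$fg_name" \
    --server "$secondary_server" \
    --resource-group "$secondary_rg"
}
```

The command is run against the server that should become primary. If the original primary is unreachable, az sql failover-group set-primary also accepts --allow-data-loss to force the failover, at the cost of any transactions not yet replicated.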

Testing

Test the backup and restore procedures quarterly:

  1. Restore a .bacpac to a test database
  2. Verify the application can connect and function
  3. Document any issues and update this runbook
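The quarterly drill can be scripted so that it runs the same way each time. The sketch below imports a backup into a throwaway database, runs a trivial smoke query with sqlcmd, and always deletes the drill database afterwards. The drill database name, the use of the production server as the import target, and the smoke query are all assumptions; a dedicated test server and an application-level health check would be reasonable substitutes.

```shell
DRILL_DB="EventsCore-drill"                      # hypothetical throwaway database
SQL_HOST="hcss-sql-prod.database.windows.net"    # assumed FQDN of hcss-sql-prod

# Import the given backup blob, run a smoke query, then clean up.
# Returns non-zero if the import or the smoke query failed.
restore_drill() {
  local blob="$1" status=0
  {
    az sql db import \
      --admin-user sqladmin \
      --admin-password "$SQL_ADMIN_PASSWORD" \
      --auth-type SQL \
      --name "$DRILL_DB" \
      --server hcss-sql-prod \
      --resource-group hcss-rg-prod \
      --storage-key "$STORAGE_KEY" \
      --storage-key-type StorageAccessKey \
      --storage-uri "https://hcssstorageprod.blob.core.windows.net/db-backups/${blob}" \
    && sqlcmd -S "$SQL_HOST" -d "$DRILL_DB" -U sqladmin -P "$SQL_ADMIN_PASSWORD" \
         -b -Q "SELECT COUNT(*) FROM sys.tables"
  } || status=$?
  # Remove the drill database whether or not the checks passed.
  az sql db delete --name "$DRILL_DB" --server hcss-sql-prod \
    --resource-group hcss-rg-prod --yes
  return "$status"
}
```

A non-zero return from restore_drill BACKUP_FILE.bacpac is the signal to record an issue and update this runbook.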