Runbook: Database Issues

Alert Trigger

  • Database Errors alert: >0 failed dependency calls to SQL for 5 minutes
  • Health check /health/detailed reports database unhealthy

Diagnosis Steps

1. Check Database Connectivity

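# Application view: the detailed health endpoint reports database health
curl -s https://hcss-eventscore-api-prod.azurewebsites.net/health/detailed | jq .
# Server state (normally "Ready")
az sql server show --name hcss-sql-prod --resource-group hcss-rg-prod --query state
# Database status ("Online" when available; "Paused" points to serverless auto-pause)
az sql db show --name EventsCore --server hcss-sql-prod --resource-group hcss-rg-prod --query status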
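
The checks above can be combined into a quick triage helper. This is a minimal sketch rather than part of the runbook: the function names and hint texts are made up for illustration, and it assumes that /health/detailed answers with HTTP 200 only when every check passes.

```shell
# Sketch only: helper names and hint texts are illustrative assumptions.

db_status() {
  # Management-plane status of the database, for example Online or Paused.
  az sql db show --name EventsCore --server hcss-sql-prod \
    --resource-group hcss-rg-prod --query status --output tsv
}

health_http_code() {
  # HTTP status code of the health endpoint; curl prints 000 if the request fails.
  curl -s -o /dev/null -w '%{http_code}' --max-time 90 \
    https://hcss-eventscore-api-prod.azurewebsites.net/health/detailed
}

triage() {
  local status
  local code
  status="$(db_status)"
  code="$(health_http_code)"
  echo "db_status=${status} health_http=${code}"
  if [ "$status" = "Paused" ]; then
    echo "hint: database is paused (serverless auto-pause)"
  elif [ "$status" != "Online" ]; then
    echo "hint: database is not online"
  elif [ "$code" != "200" ]; then
    echo "hint: database is online but the health check is failing"
  else
    echo "hint: no database problem detected"
  fi
}
```

Run `triage` from a shell that is already signed in with `az login`; the hint line suggests which resolution section to read first.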

2. Check Database Metrics

# Replace {sub} with the subscription ID. Metric names are passed as a space-separated list.
az monitor metrics list \
--resource /subscriptions/{sub}/resourceGroups/hcss-rg-prod/providers/Microsoft.Sql/servers/hcss-sql-prod/databases/EventsCore \
--metric dtu_consumption_percent storage_percent deadlock \
--interval PT5M
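
To avoid pasting the subscription ID by hand, the resource ID can be built from the active subscription. The sketch below is illustrative: the helper names are hypothetical, and the one-hour window (`--offset 1h`) and table output are assumptions, not values from this runbook.

```shell
# Sketch only: helper names and the 1h window are assumptions.

db_resource_id() {
  # $1 = subscription ID; prints the full resource ID of the EventsCore database.
  printf '/subscriptions/%s/resourceGroups/hcss-rg-prod/providers/Microsoft.Sql/servers/hcss-sql-prod/databases/EventsCore' "$1"
}

show_db_metrics() {
  local sub
  # Subscription ID of the currently selected az CLI subscription.
  sub="$(az account show --query id --output tsv)"
  az monitor metrics list \
    --resource "$(db_resource_id "$sub")" \
    --metric dtu_consumption_percent storage_percent deadlock \
    --interval PT5M \
    --offset 1h \
    --output table
}
```

Calling `show_db_metrics` prints the last hour in 5-minute buckets, which is usually enough to see whether a spike lines up with the alert time.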

3. Check Application Insights

  • Filter exceptions by Microsoft.Data.SqlClient
  • Look for connection timeout or deadlock exceptions
  • Check the duration of SQL dependency calls
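
The exception filters above can also be run from the command line. This is a sketch under several assumptions: it needs the `application-insights` extension for the Azure CLI, the Application Insights resource name is passed in by the caller because this runbook does not name it, the resource is assumed to live in hcss-rg-prod, and the helper names are hypothetical.

```shell
# Sketch only: helper names, resource group placement, and the 1h default are assumptions.

sql_exceptions_kql() {
  # KQL that groups recent SQL client exceptions by type and message.
  cat <<'KQL'
exceptions
| where type startswith "Microsoft.Data.SqlClient"
    or outerMessage contains "timeout"
    or outerMessage contains "deadlock"
| summarize occurrences = count() by type, outerMessage
| order by occurrences desc
KQL
}

query_sql_exceptions() {
  # $1 = Application Insights resource name (not given in this runbook)
  # $2 = look-back window, default 1h
  az monitor app-insights query \
    --app "$1" \
    --resource-group hcss-rg-prod \
    --analytics-query "$(sql_exceptions_kql)" \
    --offset "${2:-1h}"
}
```

For example, `query_sql_exceptions <app-insights-name> 2h` lists the most frequent SQL-related exceptions of the last two hours.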

Resolution Steps

Serverless Database Auto-Pause (Cold Start)

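Azure SQL Serverless databases auto-pause once the configured auto-pause delay passes with no activity. The first connection after a pause triggers a resume, which can take 30-60 seconds; that first attempt may fail with a database-unavailable error (40613) and succeed on retry.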

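# Check the pause state (Paused, Resuming, or Online). This management call is not a login, so do not rely on it to resume the database.
az sql db show --name EventsCore --server hcss-sql-prod --resource-group hcss-rg-prod --query status
# Resume by forcing a real connection, for example through the health check
curl -s https://hcss-eventscore-api-prod.azurewebsites.net/health/detailed | jq .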
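
Once a real connection attempt has started the resume, the status can be polled until the database is back. A minimal sketch, assuming 12 polls at 10-second intervals is long enough; the function name and defaults are illustrative.

```shell
# Sketch only: function name and default timings are assumptions.

wait_for_online() {
  # $1 = maximum number of polls (default 12), $2 = seconds between polls (default 10)
  local attempts="${1:-12}"
  local delay="${2:-10}"
  local i=1
  local status
  while [ "$i" -le "$attempts" ]; do
    status="$(az sql db show --name EventsCore --server hcss-sql-prod \
      --resource-group hcss-rg-prod --query status --output tsv)"
    echo "poll ${i}: status=${status}"
    if [ "$status" = "Online" ]; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  # Still not Online after the last poll.
  return 1
}
```

`wait_for_online` returns 0 as soon as the status reads Online and 1 if it never does, so it can gate the next step: `wait_for_online && echo "database resumed"`.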

Connection Pool Exhaustion

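If the app is running out of connections:

  1. Check Application Insights for pool-timeout exceptions, typically an InvalidOperationException with the message "Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool."
  2. Restart the App Service to reset the connection pool: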
    az webapp restart --name hcss-eventscore-api-prod --resource-group hcss-rg-prod

DTU Limit Exceeded

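# Scale up to Standard S2 when dtu_consumption_percent stays near 100%.
# This applies to DTU-based tiers; a serverless (vCore) database is sized by max vCores instead.
az sql db update \
--name EventsCore \
--server hcss-sql-prod \
--resource-group hcss-rg-prod \
--service-objective S2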
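
Before scaling, it is worth confirming that S2 is actually a step up from the current service objective. The sketch below reads `currentServiceObjectiveName` and only issues the update from Basic, S0, or S1. The function name and the skip policy are assumptions made for illustration, not the team's documented procedure.

```shell
# Sketch only: function name and the "only scale from Basic/S0/S1" policy are assumptions.

scale_to_s2_if_below() {
  local objective
  objective="$(az sql db show --name EventsCore --server hcss-sql-prod \
    --resource-group hcss-rg-prod \
    --query currentServiceObjectiveName --output tsv)"
  case "$objective" in
    Basic|S0|S1)
      echo "current objective ${objective}: scaling up to S2"
      az sql db update --name EventsCore --server hcss-sql-prod \
        --resource-group hcss-rg-prod --service-objective S2
      ;;
    S2)
      echo "current objective S2: already at target"
      ;;
    *)
      # Larger DTU tiers or vCore objectives such as GP_S_Gen5_2 land here.
      echo "current objective ${objective}: S2 is not a scale-up, skipping"
      return 1
      ;;
  esac
}
```

A non-zero return means nothing was changed and the sizing decision needs a human, for example when the database is on a serverless vCore objective.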

Failed Migration

If a migration caused issues:

  1. Check the migration history in the __EFMigrationsHistory table
  2. Do NOT attempt to roll back migrations automatically
  3. Fix the issue with a new forward migration
  4. Deploy the fix through the standard CI/CD pipeline
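
Step 1 can be done with a read-only query against the history table. The sketch below assumes `sqlcmd` is installed, that Microsoft Entra (Azure AD) authentication via `-G` is allowed for the operator, and that the server is reachable at the standard public-cloud name hcss-sql-prod.database.windows.net; none of these are confirmed by this runbook, and the helper names are hypothetical.

```shell
# Sketch only: helper names, auth method (-G), and server FQDN are assumptions.

migration_history_sql() {
  # $1 = how many recent migrations to list (default 5).
  # MigrationId starts with a timestamp, so descending order shows the newest first.
  printf 'SELECT TOP (%s) MigrationId, ProductVersion FROM [__EFMigrationsHistory] ORDER BY MigrationId DESC;\n' "${1:-5}"
}

show_recent_migrations() {
  # Read-only: lists the most recently applied EF Core migrations.
  sqlcmd -S hcss-sql-prod.database.windows.net -d EventsCore -G \
    -Q "$(migration_history_sql "$1")"
}
```

`show_recent_migrations 10` prints the ten newest entries; compare the top one with the migration shipped in the last deployment.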

Firewall / Network Issues

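# List the current server-level firewall rules
az sql server firewall-rule list --server hcss-sql-prod --resource-group hcss-rg-prod

# Allow connections from Azure services. The special 0.0.0.0 range admits all Azure-hosted sources, not only this subscription.
az sql server firewall-rule create \
--server hcss-sql-prod \
--resource-group hcss-rg-prod \
--name AllowAzureServices \
--start-ip-address 0.0.0.0 \
--end-ip-address 0.0.0.0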
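
To avoid creating a duplicate rule, the list can be filtered for an existing 0.0.0.0 entry first. A minimal sketch; the function name is hypothetical, and the rule name matches the one used above.

```shell
# Sketch only: the function name is an assumption.

ensure_azure_services_rule() {
  local existing
  # Names of rules whose start and end address are both 0.0.0.0.
  existing="$(az sql server firewall-rule list --server hcss-sql-prod \
    --resource-group hcss-rg-prod \
    --query "[?startIpAddress=='0.0.0.0' && endIpAddress=='0.0.0.0'].name" \
    --output tsv)"
  if [ -n "$existing" ]; then
    echo "rule already present: ${existing}"
    return 0
  fi
  echo "no 0.0.0.0 rule found: creating AllowAzureServices"
  az sql server firewall-rule create \
    --server hcss-sql-prod \
    --resource-group hcss-rg-prod \
    --name AllowAzureServices \
    --start-ip-address 0.0.0.0 \
    --end-ip-address 0.0.0.0
}
```

If the rule is already present and connections still fail, the cause is more likely elsewhere (for example private endpoints or VNet rules), so the function deliberately changes nothing in that case.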

Verification

  1. /health/detailed reports database healthy
  2. Application Insights dependency calls to SQL are succeeding
  3. Hangfire dashboard shows jobs executing normally
  4. Monitor for 15 minutes for stability
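
The 15-minute stability watch can be scripted as a simple polling loop. This sketch assumes that /health/detailed returns HTTP 200 only when healthy; the function name and the one-check-per-minute default are illustrative choices.

```shell
# Sketch only: function name and default timings are assumptions.

monitor_health() {
  # $1 = number of checks (default 15), $2 = seconds between checks (default 60)
  local checks="${1:-15}"
  local delay="${2:-60}"
  local failures=0
  local i=1
  local code
  while [ "$i" -le "$checks" ]; do
    # curl prints 000 when the request itself fails.
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 30 \
      https://hcss-eventscore-api-prod.azurewebsites.net/health/detailed)"
    if [ "$code" = "200" ]; then
      echo "check ${i}: HTTP 200"
    else
      failures=$((failures + 1))
      echo "check ${i}: HTTP ${code} (failure)"
    fi
    if [ "$i" -lt "$checks" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "failures=${failures}/${checks}"
  [ "$failures" -eq 0 ]
}
```

With the defaults, `monitor_health` covers roughly 15 minutes and exits non-zero if any single check failed, which makes it easy to chain: `monitor_health && echo "stable"`.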