Runbook: Database Issues
Alert Trigger
- Database Errors alert: >0 failed dependency calls to SQL for 5 minutes
- Health check /health/detailed reports database unhealthy
Diagnosis Steps
1. Check Database Connectivity
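# Ask the API for its detailed health report
curl -s https://hcss-eventscore-api-prod.azurewebsites.net/health/detailed | jq .
# Confirm the SQL server state (expected: Ready)
az sql server show --name hcss-sql-prod --resource-group hcss-rg-prod --query state
# Confirm the database status (expected: Online)
az sql db show --name EventsCore --server hcss-sql-prod --resource-group hcss-rg-prod --query status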
2. Check Database Metrics
az monitor metrics list \
--resource /subscriptions/{sub}/resourceGroups/hcss-rg-prod/providers/Microsoft.Sql/servers/hcss-sql-prod/databases/EventsCore \
--metrics dtu_consumption_percent storage_percent deadlock \
--interval PT5M
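To avoid filling in the {sub} placeholder by hand, the resource ID can be looked up first. The following is a minimal sketch, assuming the caller is already signed in with az login. The function name db_metrics and the one-hour default window are illustrative choices, not part of the original runbook.

```shell
# Sketch: look up the database resource ID, then list the same three metrics.
db_metrics() {
  window="${1:-1h}"   # how far back to look; illustrative default
  resource_id="$(az sql db show \
    --name EventsCore \
    --server hcss-sql-prod \
    --resource-group hcss-rg-prod \
    --query id --output tsv)"
  az monitor metrics list \
    --resource "$resource_id" \
    --metrics dtu_consumption_percent storage_percent deadlock \
    --interval PT5M \
    --offset "$window" \
    --output table
}

# Usage: db_metrics 6h
```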
3. Check Application Insights
- Filter exceptions by Microsoft.Data.SqlClient
- Look for connection timeout or deadlock exceptions
- Check the duration of SQL dependency calls
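The exception filter above can also be run from the command line. This is a sketch only: it assumes the Azure CLI application-insights extension is installed, that the Application Insights resource lives in hcss-rg-prod, and that its name is supplied through APP_INSIGHTS_NAME (a placeholder variable, since the runbook does not name the resource). The function names are hypothetical.

```shell
# Sketch: list recent SqlClient exceptions from Application Insights.

sql_exceptions_query() {
  # KQL: SqlClient exceptions in the last hour, grouped by type and message.
  cat <<'KQL'
exceptions
| where timestamp > ago(1h)
| where type startswith "Microsoft.Data.SqlClient"
| summarize occurrences = count() by type, outerMessage
| order by occurrences desc
KQL
}

list_sql_exceptions() {
  # APP_INSIGHTS_NAME is a placeholder; the call aborts if it is not set.
  az monitor app-insights query \
    --app "${APP_INSIGHTS_NAME:?set APP_INSIGHTS_NAME first}" \
    --resource-group hcss-rg-prod \
    --analytics-query "$(sql_exceptions_query)"
}

# Usage: APP_INSIGHTS_NAME=<your-resource-name> list_sql_exceptions
```

Grouping by type and outerMessage makes a burst of timeouts or deadlocks stand out from one-off errors.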
Resolution Steps
Serverless Database Auto-Pause (Cold Start)
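Azure SQL Serverless databases auto-pause after a period of inactivity. The first connection after a pause can take 30-60 seconds.
# az sql db show only reports the state (Paused, Resuming, Online); it does not wake the database
az sql db show --name EventsCore --server hcss-sql-prod --resource-group hcss-rg-prod --query status
# Wake the database with a real connection, e.g. the health check, which queries the database
curl -s https://hcss-eventscore-api-prod.azurewebsites.net/health/detailed | jq .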
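Because a resume can take up to a minute, it may help to poll until the database reports Online instead of re-running commands by hand. This is a rough sketch, assuming that the health check opens a real SQL connection (which is what triggers auto-resume). The function name wait_for_online and the default of 12 attempts at 10-second intervals are illustrative.

```shell
# Sketch: trigger auto-resume through the health check, then poll the status.
wait_for_online() {
  max_attempts="${1:-12}"   # 12 attempts x 10 s = up to 2 minutes
  delay="${2:-10}"          # seconds between attempts
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # A real login wakes a paused database; the health check provides one.
    curl -s -o /dev/null https://hcss-eventscore-api-prod.azurewebsites.net/health/detailed
    status="$(az sql db show --name EventsCore --server hcss-sql-prod \
      --resource-group hcss-rg-prod --query status --output tsv)"
    echo "attempt $attempt: status=$status"
    [ "$status" = "Online" ] && return 0
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 1   # still not Online after all attempts
}

# Usage: wait_for_online || echo "database did not resume; escalate"
```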
Connection Pool Exhaustion
If the app is running out of connections:
- Check Application Insights for SqlClient exceptions stating that the timeout period elapsed prior to obtaining a connection from the pool
- Restart the app service to reset connection pool:
az webapp restart --name hcss-eventscore-api-prod --resource-group hcss-rg-prod
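DTU Limit Exceeded
If dtu_consumption_percent stays at or near 100%, scale the database up to a higher service objective: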
az sql db update \
--name EventsCore \
--server hcss-sql-prod \
--resource-group hcss-rg-prod \
--service-objective S2
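Before scaling, it is worth confirming the current tier, since scaling to the tier the database is already on changes nothing. This is a minimal sketch; the function names current_objective and scale_to are hypothetical, and the target tier is left to the operator.

```shell
# Sketch: show the current service objective, then scale only if it differs.
current_objective() {
  az sql db show --name EventsCore --server hcss-sql-prod \
    --resource-group hcss-rg-prod \
    --query currentServiceObjectiveName --output tsv
}

scale_to() {
  target="${1:?usage: scale_to <service-objective>}"
  current="$(current_objective)"
  if [ "$current" = "$target" ]; then
    echo "EventsCore is already at $target; pick a higher tier"
    return 1
  fi
  echo "Scaling EventsCore from $current to $target"
  az sql db update --name EventsCore --server hcss-sql-prod \
    --resource-group hcss-rg-prod --service-objective "$target"
}

# Usage: scale_to S2
```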
Failed Migration
If a migration caused issues:
- Check the migration history in the __EFMigrationsHistory table
- Do NOT attempt to roll back migrations automatically
- Fix the issue with a new forward migration
- Deploy the fix through the standard CI/CD pipeline
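The migration history check can be done read-only with sqlcmd. This sketch assumes that sqlcmd is installed, that the server is reachable at the standard Azure public-cloud hostname hcss-sql-prod.database.windows.net, and that Microsoft Entra authentication (-G) is in use; none of these is stated in the runbook. The function name latest_migrations is hypothetical.

```shell
# Sketch: list the most recently applied EF Core migrations (read-only).
latest_migrations() {
  count="${1:-5}"   # number of rows to show; illustrative default
  # -G selects Microsoft Entra authentication; use -U/-P instead for SQL logins.
  # MigrationId starts with a timestamp, so descending order shows the newest first.
  sqlcmd -S hcss-sql-prod.database.windows.net -d EventsCore -G \
    -Q "SELECT TOP ($count) MigrationId, ProductVersion FROM [__EFMigrationsHistory] ORDER BY MigrationId DESC;"
}

# Usage: latest_migrations 10
```

The query only reads the table, which is consistent with the rule above against automated rollbacks.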
Firewall / Network Issues
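# List the current firewall rules
az sql server firewall-rule list --server hcss-sql-prod --resource-group hcss-rg-prod
# Re-create the rule that admits Azure services (the special 0.0.0.0 to 0.0.0.0 range)
az sql server firewall-rule create \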
--server hcss-sql-prod \
--resource-group hcss-rg-prod \
--name AllowAzureServices \
--start-ip-address 0.0.0.0 \
--end-ip-address 0.0.0.0
Verification
- /health/detailed reports database healthy
- Application Insights dependency calls to SQL are succeeding
- Hangfire dashboard shows jobs executing normally
- Monitor for 15 minutes for stability
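The 15-minute watch can be scripted so that a relapse is noticed immediately. This is a sketch, assuming that the health endpoint returns HTTP 200 when healthy and a non-200 code (such as 503) when unhealthy; the runbook does not state the status codes. The function name monitor_health and the one-check-per-minute cadence are illustrative.

```shell
# Sketch: poll the health endpoint once a minute and stop at the first failure.
HEALTH_URL="https://hcss-eventscore-api-prod.azurewebsites.net/health/detailed"

monitor_health() {
  checks="${1:-15}"     # number of checks; 15 x 60 s covers roughly 15 minutes
  interval="${2:-60}"   # seconds between checks
  i=1
  while [ "$i" -le "$checks" ]; do
    code="$(curl -s -o /dev/null -w '%{http_code}' "$HEALTH_URL")"
    echo "check $i/$checks: HTTP $code"
    # Assumption: anything other than 200 means the service is unhealthy.
    [ "$code" = "200" ] || return 1
    # Skip the wait after the final check.
    [ "$i" -lt "$checks" ] && sleep "$interval"
    i=$((i + 1))
  done
  return 0
}

# Usage: monitor_health && echo "stable for 15 minutes"
```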