Alerting Configuration

Overview

Azure Monitor alerts are configured via infra/alerts/setup-alerts.sh. Alerts fire to the hcss-alerts-<env> action group, which notifies the ops team by email.

Alert Rules

# | Alert                | Condition                 | Window | Severity | Runbook
1 | API Error Rate       | >5% 5xx responses         | 5 min  | Sev1     | API Down
2 | Response Time        | avg >3s                   | 10 min | Sev2     | Check Application Insights Performance blade
3 | Health Check Failure | /health/ready non-200     | 5 min  | Sev0     | API Down
4 | Memory Pressure      | Available memory <500MB   | 10 min | Sev2     | High Memory
5 | Database Errors      | Any failed SQL dependency | 5 min  | Sev1     | Database Issues

Alert Response Actions

Sev0 - Health Check Failure

  1. Immediately check if the app is reachable: curl https://hcss-eventscore-api-prod.azurewebsites.net/api/ping
  2. If unreachable, restart: az webapp restart --name hcss-eventscore-api-prod --resource-group hcss-rg-prod
  3. If the restart doesn't help, check for a bad deployment and roll back
  4. Open Sev0 incident per Incident Response
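
Steps 1 and 2 above can be sketched as a small shell helper. This is an illustration only, not part of the repository: the function names app_is_reachable and sev0_first_response are made up for this example, and the 10-second timeout is an assumed value. The app name, resource group, and ping URL are the ones used in the steps above.

```shell
# Sketch of Sev0 steps 1-2: probe the app, restart only if the probe fails.
APP_NAME="hcss-eventscore-api-prod"
RESOURCE_GROUP="hcss-rg-prod"
PING_URL="https://${APP_NAME}.azurewebsites.net/api/ping"

# Succeeds when the ping endpoint answers with a 2xx status.
# The 10-second timeout is an assumption, not a documented value.
app_is_reachable() {
  status=$(curl --silent --output /dev/null --max-time 10 \
    --write-out '%{http_code}' "$PING_URL") || return 1
  case "$status" in
    2*) return 0 ;;
    *)  return 1 ;;
  esac
}

# Hypothetical helper name: restarts the web app only when the probe fails.
sev0_first_response() {
  if app_is_reachable; then
    echo "reachable: no restart needed"
  else
    echo "unreachable: restarting ${APP_NAME}"
    az webapp restart --name "$APP_NAME" --resource-group "$RESOURCE_GROUP"
  fi
}
```

Calling sev0_first_response after az login runs the probe and, if needed, the restart. Steps 3 and 4 (rollback and opening the incident) still need a human decision.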

Sev1 - API Error Rate / Database Errors

  1. Check Application Insights Failures blade for the specific error
  2. For database errors, verify SQL Server is accessible and not throttled
  3. Check Hangfire dashboard for failed jobs
  4. Open Sev1 incident if not self-resolving within 15 minutes
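
For step 1, the same information shown in the Failures blade can usually be pulled from the command line. The sketch below assumes the Azure CLI application-insights extension is installed and that the Application Insights resource is named hcss-eventscore-ai-prod; that resource name and the function name recent_5xx_failures are placeholders, not values taken from the repository. The 15-minute lookback mirrors the escalation window in step 4.

```shell
# Hypothetical Application Insights resource name; replace with the real one.
APP_INSIGHTS_NAME="hcss-eventscore-ai-prod"
RESOURCE_GROUP="hcss-rg-prod"

# KQL: 5xx requests in the last 15 minutes, grouped by operation and status.
FAILED_REQUESTS_QUERY='requests
| where timestamp > ago(15m)
| where toint(resultCode) >= 500
| summarize failures = count() by name, resultCode
| order by failures desc'

# Requires: az extension add --name application-insights
recent_5xx_failures() {
  az monitor app-insights query \
    --app "$APP_INSIGHTS_NAME" \
    --resource-group "$RESOURCE_GROUP" \
    --analytics-query "$FAILED_REQUESTS_QUERY" \
    --output table
}
```

Running recent_5xx_failures lists the failing operations by count, which helps decide whether the errors are concentrated in one endpoint or spread across the API.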

Sev2 - Response Time / Memory Pressure

  1. Check Application Insights Performance blade for slow endpoints
  2. For memory, check if the issue is gradual (leak) or sudden (spike)
  3. Consider scaling up/out if persistent
  4. Monitor for 30 minutes before escalating
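
If step 3 applies, scaling can be done with az appservice plan update. The sketch below is a rough illustration: the plan name hcss-plan-prod and the helper names scale_up and scale_out are assumptions, and the target SKU and worker count in the usage comments are examples rather than recommendations.

```shell
# Hypothetical App Service plan name; replace with the real plan.
PLAN_NAME="hcss-plan-prod"
RESOURCE_GROUP="hcss-rg-prod"

# Scale up: move the plan to a larger SKU (more memory/CPU per instance).
scale_up() {
  az appservice plan update \
    --name "$PLAN_NAME" \
    --resource-group "$RESOURCE_GROUP" \
    --sku "$1"
}

# Scale out: run more instances of the current SKU.
scale_out() {
  az appservice plan update \
    --name "$PLAN_NAME" \
    --resource-group "$RESOURCE_GROUP" \
    --number-of-workers "$1"
}

# Example usage (values are illustrative):
#   scale_up B2     # more memory per instance, useful for memory pressure
#   scale_out 2     # more instances, useful for sustained slow responses
```

Scaling up tends to help with memory pressure on a single instance, while scaling out tends to help when response time degrades under load. A gradual leak will eventually exhaust a larger instance too, so scaling only buys time in that case.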

Setup / Update

To create or update alert rules:

az login
./infra/alerts/setup-alerts.sh prod
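
The two commands above can be wrapped so that az login only runs when no session is active. This is a minimal sketch: the function name run_alert_setup and the SETUP_SCRIPT variable are introduced here for illustration, and it assumes the script takes the environment name as its only argument, as in the example above.

```shell
# Path from this page; kept in a variable so it can be overridden.
SETUP_SCRIPT="./infra/alerts/setup-alerts.sh"

# Hypothetical wrapper: checks the login state, then runs the setup script.
run_alert_setup() {
  if [ -z "$1" ]; then
    echo "usage: run_alert_setup <env>" >&2
    return 2
  fi
  # az account show exits non-zero when there is no active login.
  if ! az account show --output none 2>/dev/null; then
    az login || return 1
  fi
  "$SETUP_SCRIPT" "$1"
}

# Example usage:
#   run_alert_setup prod
```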

Modifying Thresholds

Edit the thresholds in infra/alerts/setup-alerts.sh and re-run the script. Key values:

  • Error rate percentage: 5 (in the KQL query)
  • Response time (ms): 3000
  • Memory threshold (bytes): 1468006400 (1400 MiB, ~1.4GB, leaving 500MB free on B1)
  • Health check availability: 100
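
The byte and millisecond values above are unit conversions of the figures in the Alert Rules table. The helpers below are a small worked example for computing a new threshold before editing the script; the function names mib_to_bytes and seconds_to_ms are illustrative and do not exist in setup-alerts.sh.

```shell
# Convert mebibytes to bytes (1 MiB = 1024 * 1024 bytes).
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# Convert seconds to milliseconds.
seconds_to_ms() {
  echo $(( $1 * 1000 ))
}

mib_to_bytes 1400   # prints 1468006400, the current memory threshold
seconds_to_ms 3     # prints 3000, the current response time threshold
```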

Action Group Configuration

The action group hcss-alerts-<env> is configured with:

  • Email notification to ops team

To add notification channels (SMS, webhook, PagerDuty), update the action group in the Azure Portal or modify the setup-alerts.sh script.
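
As a sketch of the command-line route, a webhook receiver can be added with az monitor action-group update. The function name add_webhook_receiver is made up for this example, and the resource group name hcss-rg-<env> is an assumption extrapolated from hcss-rg-prod; verify both the group and the receiver arguments against your environment before use.

```shell
# Hypothetical helper: adds a webhook receiver to hcss-alerts-<env>.
# Arguments: <env> <receiver-name> <webhook-url>
add_webhook_receiver() {
  if [ "$#" -ne 3 ]; then
    echo "usage: add_webhook_receiver <env> <receiver-name> <webhook-url>" >&2
    return 2
  fi
  # Resource group naming (hcss-rg-<env>) is assumed from hcss-rg-prod.
  az monitor action-group update \
    --name "hcss-alerts-$1" \
    --resource-group "hcss-rg-$1" \
    --add-action webhook "$2" "$3"
}

# Example usage (receiver name and URL are placeholders):
#   add_webhook_receiver prod oncall-hook https://example.com/alert-hook
```

If setup-alerts.sh also creates or updates the action group, adding the receiver in the script keeps the change from depending on a one-off manual command.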