Runbook: High Memory Usage

Alert Trigger

  • Memory Pressure alert: fires when available memory stays below 500 MB for 10 minutes
  • Maps to the MemoryHealthCheck thresholds configured in the application

Diagnosis Steps

1. Check Current Memory

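# Detailed health report, including the memory check
curl -s https://hcss-eventscore-api-prod.azurewebsites.net/health/detailed | jq .

# Working-set metrics at 5-minute granularity (replace SUB_ID with the subscription ID).
# Metric names are passed as a space-separated list.
az monitor metrics list \
  --resource /subscriptions/SUB_ID/resourceGroups/hcss-rg-prod/providers/Microsoft.Web/sites/hcss-eventscore-api-prod \
  --metric MemoryWorkingSet AverageMemoryWorkingSet \
  --interval PT5M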
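
The working-set metrics are reported in bytes, which makes them awkward to compare against limits expressed in MB. The following is a minimal sketch of two conversion helpers; the names `bytes_to_mb` and `exceeds_mb` are hypothetical and not part of the application or the Azure CLI, and the 1200 MB limit in the usage comment is only an illustrative value. Working set measures memory in use, while the alert fires on available memory, so treat the comparison as a rough indicator.

```shell
# Convert a byte count (as reported by MemoryWorkingSet) to whole MB.
bytes_to_mb() {
  awk -v b="$1" 'BEGIN { printf "%d\n", b / 1048576 }'
}

# Succeed (exit status 0) when the byte count is above the limit given in MB.
# Usage: exceeds_mb <bytes> <limit_mb>
exceeds_mb() {
  awk -v b="$1" -v limit="$2" 'BEGIN { exit !(b / 1048576 > limit) }'
}

# Example usage with a value copied from the metrics output:
#   bytes_to_mb 1468006400
#   exceeds_mb 1468006400 1200 && echo "working set above 1200 MB"
```

Keeping the arithmetic in awk avoids shell integer limits and keeps the helpers usable from any POSIX shell.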

2. Check Application Insights

  • Review Performance blade for memory trends
  • Look for memory leak patterns (steadily increasing over time)
  • Check if a specific endpoint is consuming excessive memory

3. Check Cache Size

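The application uses IMemoryCache with a limit of 1024 entries. The limit bounds the number of entries, not their total size in bytes, so a cache that is within its limit can still hold a lot of memory.

  • High memory could indicate that cache entries are storing large objects
  • Review whether cache compaction (25% threshold) is running as expected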

4. Check Hangfire Jobs

  • Open the Hangfire dashboard
  • Check for stuck or long-running jobs that may be holding references to large objects

Resolution Steps

Immediate Relief: Restart

az webapp restart --name hcss-eventscore-api-prod --resource-group hcss-rg-prod

sleep 60
curl -s https://hcss-eventscore-api-prod.azurewebsites.net/health/detailed | jq .entries.memory
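
A fixed 60-second sleep can be too short or needlessly long depending on how quickly the app warms up. As an alternative, here is a minimal sketch of a polling helper; `retry_until` is a hypothetical name, the attempt count and delay are illustrative, and the usage example assumes the health endpoint returns a non-2xx status while the app is unhealthy, which this runbook does not confirm.

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: retry_until <max_attempts> <delay_seconds> <command> [args...]
retry_until() {
  attempts_left="$1"
  delay="$2"
  shift 2
  while [ "$attempts_left" -gt 0 ]; do
    if "$@"; then
      return 0
    fi
    attempts_left=$((attempts_left - 1))
    # Only sleep when another attempt is coming.
    if [ "$attempts_left" -gt 0 ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Example: poll the health endpoint up to 12 times, 10 seconds apart.
# curl -f turns HTTP error responses into a non-zero exit status.
#   retry_until 12 10 curl -sf -o /dev/null \
#     https://hcss-eventscore-api-prod.azurewebsites.net/health/detailed
```

Once the helper returns successfully, the `jq .entries.memory` check can be run against a warmed-up instance.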

Scale Up the App Service Plan

# Current: B1 (1.75 GB RAM) or S1 (1.75 GB RAM)
# Scale to S2 (3.5 GB RAM) or S3 (7 GB RAM)
az appservice plan update \
  --name hcss-asp-prod \
  --resource-group hcss-rg-prod \
  --sku S2
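
Before scaling up, it can help to confirm the tier the plan is on and pick the next one deliberately. The sketch below encodes only the SKU and RAM figures listed in this section; `sku_ram_gb` and `next_sku` are hypothetical helper names, and the upgrade path (B1 or S1 to S2, S2 to S3) is an assumption drawn from the comments above rather than a rule of the platform.

```shell
# RAM in GB for the SKUs mentioned in this runbook.
sku_ram_gb() {
  case "$1" in
    B1|S1) echo "1.75" ;;
    S2)    echo "3.5" ;;
    S3)    echo "7" ;;
    *)     echo "unknown SKU: $1" >&2; return 1 ;;
  esac
}

# Next larger SKU along the path this runbook suggests.
next_sku() {
  case "$1" in
    B1|S1) echo "S2" ;;
    S2)    echo "S3" ;;
    *)     echo "no larger SKU listed for: $1" >&2; return 1 ;;
  esac
}

# Example usage:
#   current=$(az appservice plan show --name hcss-asp-prod \
#     --resource-group hcss-rg-prod --query sku.name -o tsv)
#   target=$(next_sku "$current") && echo "$current -> $target ($(sku_ram_gb "$target") GB)"
```

Both helpers fail with a non-zero status for SKUs outside this list, so a script can stop instead of guessing a target tier.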

Scale Out (Add Instances)

az appservice plan update \
  --name hcss-asp-prod \
  --resource-group hcss-rg-prod \
  --number-of-workers 2

Investigate Memory Leak

If memory increases steadily after restart:

  1. Enable Application Insights Profiler for memory snapshots
  2. Check for large file uploads held in memory (should use streaming)
  3. Review recent code changes for:
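    • Unbounded collections (lists, dictionaries, or static caches that only grow)
    • Missing IDisposable implementations, or disposable objects that are never disposed
    • Large response objects that stay referenced and are never garbage collected
    • SignalR connections accumulating without being closed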

Verification

  1. /health/detailed reports the memory check as healthy
  2. Memory working set is back below the level that triggered the alert
  3. Application is responsive (check response times)
  4. Monitor for 30 minutes to ensure memory is stable (not climbing)
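
To make "stable (not climbing)" less subjective, the samples collected during the monitoring window can be compared mechanically. The sketch below is a crude first-versus-last check, not a proper trend analysis; `is_climbing` is a hypothetical helper, and the 10 percent tolerance in the usage comment is an assumed value that should be tuned to the application's normal variation.

```shell
# Read one memory sample per line (oldest first) on stdin.
# Succeed (exit status 0) when the last sample exceeds the first
# by more than the given percentage.
# Usage: <samples> | is_climbing <percent>
is_climbing() {
  awk -v pct="$1" '
    NF {
      if (!seen) { first = $1 + 0; seen = 1 }
      last = $1 + 0
    }
    END {
      # No usable baseline means no evidence of growth.
      if (!seen || first <= 0) exit 1
      exit !((last - first) * 100 / first > pct)
    }'
}

# Example usage with working-set samples in MB taken every 5 minutes:
#   printf '%s\n' 900 910 905 915 920 908 | is_climbing 10 \
#     && echo "memory still climbing" || echo "memory looks stable"
```

Comparing only the endpoints ignores spikes in the middle of the window, so the raw samples are still worth a glance before closing the incident.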