DORA Metrics Troubleshooting Guide

Common issues and solutions when implementing DORA metrics with DevGrid.

Event Submission Issues

Events Not Appearing in DevGrid

Symptoms:

  • You send events via API, but they don't appear in DevGrid
  • API returns 201 but events aren't visible

Possible Causes & Solutions:

1. Check the API Response

A 201 response can still contain rejected events, so always examine both the success and failures arrays in the response:

curl -X POST https://prod.api.devgrid.io/events \
  -H "x-api-key: $DEVGRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"events": [...]}' \
  -v

Look for:

  • HTTP 201 status code (success)
  • "failures": [] (empty failures array)
  • "success": [...] (events in success array)
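The checks above can be automated in a pipeline step. The following is a minimal sketch, assuming the response body has the success/failures shape shown in this guide; the function name summarize_event_response is hypothetical and not part of any DevGrid client.

```python
def summarize_event_response(status_code, body):
    """Return a list of problems; an empty list means every event was accepted.

    Assumes the parsed JSON body looks like {"success": [...], "failures": [...]}.
    """
    problems = []
    if status_code != 201:
        problems.append(f"unexpected HTTP status {status_code}")
    failures = body.get("failures", [])
    for index, failure in enumerate(failures):
        # Each failure entry is expected to carry a "reason" string.
        reason = failure.get("reason", "no reason given")
        problems.append(f"failure {index}: {reason}")
    if not body.get("success") and not failures:
        problems.append("response lists no accepted events")
    return problems
```

Failing the pipeline step whenever the returned list is non-empty makes rejected events visible immediately instead of at dashboard review time.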

2. Invalid Entity ID

Error:

{
  "success": [],
  "failures": [
    {
      "event": {...},
      "reason": "Entity not found"
    }
  ]
}

Solution:

  • Verify entityId exists in DevGrid
  • Check for typos in entity ID
  • Ensure you're using the correct entity ID (UUID or shortId)
  • Create the application entity first if it doesn't exist

3. Wrong Event Type

Error:

{
  "failures": [
    {
      "reason": "Invalid event type"
    }
  ]
}

Solution:

  • Use exact type: "event-deploy" (not "deployment", "deploy", or "Deployment")
  • Event types are case-sensitive
  • See API Reference for valid event types

4. Invalid Timestamp Format

Error:

{
  "failures": [
    {
      "reason": "Invalid timestamp"
    }
  ]
}

Solution:

  • Use ISO 8601 format: "2025-10-29T18:30:00Z"
  • Include timezone (Z for UTC)
  • Don't use timestamps in the future
  • Valid examples:
    • "2025-10-29T18:30:00Z" ✅
    • "2025-10-29T18:30:00.000Z" ✅
  • Invalid examples:
    • "2025-10-29 18:30:00" ❌ (missing T and Z)
    • "10/29/2025" ❌ (wrong format)
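A local pre-flight check can catch these timestamp mistakes before the request is sent. This is a sketch under assumptions: it accepts only the UTC "Z" form shown above (DevGrid may accept other ISO 8601 offsets), and the function name is hypothetical.

```python
import re
from datetime import datetime, timezone

# Matches 2025-10-29T18:30:00Z and 2025-10-29T18:30:00.000Z (1 to 6 fractional digits).
_ISO_UTC = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d{1,6})?Z$")


def is_valid_event_timestamp(value, now=None):
    """Return True if value is an ISO 8601 UTC timestamp that is not in the future."""
    if not _ISO_UTC.match(value):
        return False
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ" if "." in value else "%Y-%m-%dT%H:%M:%SZ"
    try:
        parsed = datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
    except ValueError:
        # Right shape but impossible date or time, such as month 13.
        return False
    now = now or datetime.now(timezone.utc)
    return parsed <= now
```

The optional now argument exists only so the check can be tested against a fixed clock.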

Authentication Errors

Symptoms:

  • 401 Unauthorized response
  • "message": "Unauthorized"

Solutions:

1. Missing API Key Header

# ❌ Wrong - no header
curl -X POST https://prod.api.devgrid.io/events -d '{...}'

# ✅ Correct
curl -X POST https://prod.api.devgrid.io/events \
  -H "x-api-key: YOUR_API_KEY" \
  -d '{...}'

2. Wrong Header Name

# ❌ Wrong header names
-H "Authorization: Bearer YOUR_API_KEY"  # Wrong
-H "API-Key: YOUR_API_KEY"               # Wrong
-H "apikey: YOUR_API_KEY"                # Wrong

# ✅ Correct header name
-H "x-api-key: YOUR_API_KEY"

3. Invalid API Key

Solutions:

  • Verify API key hasn't expired
  • Check for extra spaces or newlines in key
  • Generate a new API key from DevGrid Settings
  • Ensure you're using the correct environment's key (dev vs prod)

4. Wrong API Endpoint

# ❌ Wrong endpoints
https://devgrid.io/events
https://api.devgrid.io/api/events
https://prod.devgrid.io/events

# ✅ Correct endpoints
https://api.dev.devgrid.io/events       # Development
https://prod.api.devgrid.io/events      # Production
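Several of the authentication problems above (wrong header name, stray whitespace in the key, wrong endpoint) can be ruled out by building the request in one place. A minimal sketch, assuming the two endpoints and the x-api-key header listed in this guide; the helper name and the "dev"/"prod" labels are illustrative.

```python
# Endpoints as listed in this guide.
ENDPOINTS = {
    "dev": "https://api.dev.devgrid.io/events",
    "prod": "https://prod.api.devgrid.io/events",
}


def build_request_parts(environment, api_key):
    """Return (url, headers) for an events request.

    Raises KeyError for an unknown environment and ValueError for an empty key.
    """
    key = api_key.strip()  # drop stray spaces or newlines copied with the key
    if not key:
        raise ValueError("API key is empty")
    headers = {"x-api-key": key, "Content-Type": "application/json"}
    return ENDPOINTS[environment], headers
```

The returned pair can be handed to whichever HTTP client the pipeline already uses.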

Payload Too Large Error

Symptoms:

  • 413 Payload Too Large response

Solution:

  • Maximum 15 events per request
  • Split large batches into multiple requests:

# ❌ Too many events
{
  "events": [ /* 20 events */ ]
}

# ✅ Split into batches
# Request 1:
{
  "events": [ /* 15 events */ ]
}

# Request 2:
{
  "events": [ /* 5 events */ ]
}
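The same split can be done programmatically. The sketch below assumes only the 15-event limit stated above; split_into_batches is a hypothetical helper name.

```python
MAX_EVENTS_PER_REQUEST = 15  # limit stated in this guide


def split_into_batches(events, batch_size=MAX_EVENTS_PER_REQUEST):
    """Split a list of events into request bodies of at most batch_size events each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        {"events": events[start:start + batch_size]}
        for start in range(0, len(events), batch_size)
    ]
```

Each returned dictionary is one request body, so 20 events become one request of 15 and one of 5.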

Deployment Frequency Issues

Deployment Frequency Shows Zero

Possible Causes:

1. No Production Deployments Recorded

Check:

  • Are you sending deployment events with "env": "production"?
  • Is the environment name exactly "production" (case-sensitive)?

Solution:

{
  "events": [{
    "type": "event-deploy",
    "attributes": {
      "env": "production",  // Must be exact match
      "status": "success"
    }
  }]
}

2. Wrong Time Window

Check:

  • Deployment Frequency is calculated within selected date range
  • Are there deployments in your selected time window?

Solution:

  • Expand date range in DevGrid dashboard
  • Verify events were sent with correct timestamps

3. Only Failed Deployments

Issue:

  • Only deployments with "status": "success" may count (check DORA Metrics documentation)

Solution:

  • Ensure successful deployments are being sent

Deployment Frequency Seems Too Low

Possible Causes:

1. Environment Name Inconsistency

Problem:

// Different environment names for the same environment
{"attributes": {"env": "prod"}}
{"attributes": {"env": "production"}}
{"attributes": {"env": "PRODUCTION"}}
{"attributes": {"env": "prd"}}

Solution:

  • Standardize on ONE environment name across all systems
  • Recommended: "production" (lowercase)
  • Update all CI/CD pipelines to use consistent naming
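One way to enforce the standard name is to normalize it in a shared helper that every pipeline calls before sending events. A minimal sketch; the alias set is taken from the variants shown above, and the helper name is an assumption.

```python
# Variants of the production environment name seen in this guide, lowercased.
PRODUCTION_ALIASES = {"production", "prod", "prd"}


def normalize_env(name):
    """Map known production aliases to "production"; lowercase everything else."""
    cleaned = name.strip().lower()
    return "production" if cleaned in PRODUCTION_ALIASES else cleaned
```

Extend the alias set if your pipelines use other spellings.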

2. Missing entityId

Problem:

  • Events sent without entityId don't associate with your application

Solution:

{
  "events": [{
    "type": "event-deploy",
    "entityId": "your-app-id",  // Always include this
    "attributes": {...}
  }]
}

3. Wrong Entity

Problem:

  • Events sent to different entity than you're viewing

Solution:

  • Verify entityId matches the application you're viewing in DevGrid
  • Check all CI/CD pipelines use correct entity ID

Lead Time for Change Issues

Lead Time Shows Zero or N/A

Possible Causes:

1. Missing build_commit_sha

This is the #1 cause of Lead Time issues.

Problem:

{
  "attributes": {
    "env": "production",
    "status": "success"
    // Missing: "build_commit_sha"
  }
}

Solution:

{
  "attributes": {
    "env": "production",
    "status": "success",
    "build_commit_sha": "a1b2c3d4e5f6789"  // REQUIRED for Lead Time
  }
}

Platform-specific ways to get commit SHA:

  • GitHub Actions: ${{ github.sha }}
  • GitLab CI: ${CI_COMMIT_SHA}
  • Jenkins: ${GIT_COMMIT}
  • CircleCI: ${CIRCLE_SHA1}
  • Azure DevOps: $(Build.SourceVersion)
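A script that runs on several CI platforms can read the SHA from the environment. This sketch assumes each platform exposes the value as an environment variable: GITHUB_SHA, CI_COMMIT_SHA, GIT_COMMIT, CIRCLE_SHA1, and BUILD_SOURCEVERSION (the environment-variable form of Build.SourceVersion). The function name is hypothetical.

```python
import os

# Environment variables checked in order; the first non-empty value wins.
COMMIT_SHA_VARIABLES = (
    "GITHUB_SHA",           # GitHub Actions
    "CI_COMMIT_SHA",        # GitLab CI
    "GIT_COMMIT",           # Jenkins (Git plugin)
    "CIRCLE_SHA1",          # CircleCI
    "BUILD_SOURCEVERSION",  # Azure DevOps
)


def detect_commit_sha(environ=None):
    """Return the commit SHA from the first known CI variable that is set, else None."""
    environ = os.environ if environ is None else environ
    for name in COMMIT_SHA_VARIABLES:
        value = environ.get(name, "").strip()
        if value:
            return value
    return None
```

A None result is a good reason to fail the pipeline step rather than send an event without build_commit_sha.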

2. Version Control Not Connected

Problem:

  • DevGrid needs PR and commit data from GitHub/GitLab/Bitbucket
  • This data must be synced before Lead Time can be calculated

Solution:

  • Go to DevGrid Settings → Integrations
  • Connect your GitHub/GitLab/Bitbucket account
  • Verify repositories are syncing
  • Wait for initial sync to complete (can take several minutes)

To verify sync:

  • Go to your application in DevGrid
  • Check the "Commits" or "Pull Requests" tab
  • Confirm recent commits appear

3. Commit SHA Doesn't Match

Problem:

  • build_commit_sha in deployment event doesn't match any commit in your repository

Causes:

  • Typo in commit SHA
  • Deploying from a fork or different repository
  • Commit was made in a repository not connected to DevGrid

Solution:

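  • Verify the commit SHA is correct: git log --oneline lists recent commits with abbreviated SHAs, and git rev-parse HEAD prints the full SHA of the checked-out commit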
  • Ensure you're deploying from the connected repository
  • Check that the commit exists in the repository DevGrid is monitoring

4. No Pull Requests Associated

Problem:

  • Commits exist but aren't associated with merged PRs
  • Lead Time requires PR data to calculate time from first commit to deployment

Solution:

  • Use Pull Requests for all production deployments
  • Merge PRs before deploying (don't deploy from feature branches directly)
  • If using squash commits, ensure the squashed commit is what's deployed

5. Squash Commits vs Merge Commits

Issue:

  • Squash commits create a new commit SHA that may not match original PR commits

Solution:

  • Deploy using the merge commit SHA (after PR is merged)
  • In CI/CD, use the commit SHA from the merged branch, not the PR branch
  • For GitHub: Deploy from main branch, not from PR branch

Example GitHub Actions:

on:
  push:
    branches: [main]  # Deploy after merge, not on PR

# On a push to main, github.sha is the commit that landed on main
# (the merge commit, or the squashed commit for squash merges):
build_commit_sha: "${{ github.sha }}"

Lead Time Seems Incorrect

Possible Causes:

1. Wrong Timestamp

Problem:

  • Using pipeline start time instead of actual deployment time

Solution:

{
  "timestamp": "2025-10-29T18:30:00Z",  // Actual deployment completion time
  "attributes": {
    "deployment_start_time": "2025-10-29T18:25:00Z"  // When it started
  }
}
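A deployment event with the completion time as its timestamp could be built as follows. This is a sketch, assuming the field names shown in this guide (type, entityId, timestamp, and the env, status, build_commit_sha, and deployment_start_time attributes); the helper names are hypothetical.

```python
from datetime import datetime, timezone


def to_utc_z(moment):
    """Format a timezone-aware datetime as ISO 8601 UTC with a trailing Z."""
    if moment.tzinfo is None:
        raise ValueError("naive datetime; attach a timezone first")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_deploy_event(entity_id, commit_sha, started_at, finished_at):
    """Build a deploy event whose timestamp is the completion time, not the start time."""
    if finished_at < started_at:
        raise ValueError("deployment finished before it started")
    return {
        "type": "event-deploy",
        "entityId": entity_id,
        "timestamp": to_utc_z(finished_at),  # actual deployment completion time
        "attributes": {
            "env": "production",
            "status": "success",
            "build_commit_sha": commit_sha,
            "deployment_start_time": to_utc_z(started_at),
        },
    }
```

Capturing started_at and finished_at around the deploy step itself, rather than at pipeline start, keeps queueing and build time out of the recorded deployment time.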

2. Long-Running Feature Branches

Issue:

  • Lead Time includes all commits since previous deployment
  • Long-lived branches can inflate Lead Time

This is expected behavior: it accurately reflects how long code sat before being deployed.

To improve:

  • Deploy more frequently
  • Use shorter-lived feature branches
  • Deploy each PR individually rather than batching multiple PRs

Change Failure Rate Issues

Change Failure Rate Shows Zero or N/A

Possible Causes:

1. Missing change_id in Deployments

This is the #1 cause of CFR issues.

Problem:

{
  "attributes": {
    "env": "production",
    "status": "success"
    // Missing: "change_id"
  }
}

Solution:

{
  "attributes": {
    "env": "production",
    "status": "success",
    "change_id": "CHG0012345",  // REQUIRED for CFR
    "change_url": "https://servicenow.example.com/change/CHG0012345"
  }
}

2. Incident Management System Not Connected

Problem:

  • DevGrid needs incident data from ServiceNow or other incident management system
  • No incidents = no failures = CFR can't be calculated

Solution:

  • Connect ServiceNow or your incident management system
  • Go to DevGrid Settings → Integrations → Incident Management
  • See ServiceNow Integration Guide

3. Incidents Not Linked to Deployments

Problem:

  • Incidents exist but aren't linked to the deployment change_id
  • Without this link, DevGrid can't determine which deployment caused the incident

Solution in ServiceNow:

  • Add custom field to incident: "Caused By Deployment" or "Related Change"
  • Populate this field with the deployment change_id (e.g., "CHG0012345")
  • DevGrid uses this linkage to calculate CFR

Example incident record:

Incident: INC0012345
Priority: P1
Created: 2025-10-29T19:00:00Z
Resolved: 2025-10-29T20:30:00Z
Caused By: CHG0012345  ← This links incident to deployment
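To see why the linkage matters, it helps to work through the arithmetic. This guide does not state DevGrid's exact formula, so the sketch below uses the common DORA definition (the share of deployments linked to at least one incident) and counts only resolved incidents, as this guide notes for CFR. The record fields change_id, caused_by, and resolved are hypothetical.

```python
def change_failure_rate(deployments, incidents):
    """Return the fraction of deployments linked to a resolved incident, or None.

    deployments: dicts with a "change_id" key.
    incidents: dicts with "caused_by" (a change_id) and "resolved" (bool) keys.
    """
    change_ids = {d["change_id"] for d in deployments if d.get("change_id")}
    if not change_ids:
        return None  # nothing to link against, so the rate is undefined
    failed = {
        incident["caused_by"]
        for incident in incidents
        if incident.get("resolved") and incident.get("caused_by") in change_ids
    }
    return len(failed) / len(change_ids)
```

With four deployments and one resolved incident linked to CHG0012345, the result is 0.25; without the link, the same incident contributes nothing.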

4. All Incidents Still Open

Problem:

  • CFR only counts resolved incidents
  • Open incidents don't contribute to CFR calculation

Solution:

  • Resolve incidents in your incident management system
  • Metrics update daily at UTC midnight

Change Failure Rate Seems Too High

Possible Causes:

1. Incidents Incorrectly Linked

Problem:

  • Incidents linked to deployments that didn't actually cause them
  • Automation that links every incident to the "nearest deployment", regardless of the actual cause

Solution:

  • Review incident-to-deployment linkages
  • Remove incorrect linkages
  • Only link incidents definitively caused by a deployment

2. Including Non-Production Incidents

Problem:

  • Staging or dev incidents linked to production deployments

Solution:

  • Only link production incidents to production deployments
  • Filter incidents by environment in your incident system

Mean Time to Restore Issues

MTTR Shows Zero or N/A

Possible Causes:

1. No Resolved Incidents

Problem:

  • MTTR only calculated from resolved incidents
  • Open incidents don't count

Solution:

  • Resolve incidents in your incident management system
  • Wait for daily metric calculation (UTC midnight)

2. Incident Management System Not Connected

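Solution:

  • Connect ServiceNow or your incident management system (DevGrid Settings → Integrations → Incident Management)
  • See "Incident Management System Not Connected" under Change Failure Rate Issues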

3. Missing Timestamps

Problem:

  • Incidents missing Created or Resolved timestamps

Solution:

  • Verify incidents have both:
    • Detection time (when created/opened)
    • Resolution time (when closed/resolved)
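The effect of missing timestamps is easy to see in a small calculation. This sketch is illustrative and is not DevGrid's implementation: it averages resolution time over incidents that have both timestamps and skips the rest. The field names created and resolved are hypothetical.

```python
from datetime import datetime


def _parse(timestamp):
    """Parse an ISO 8601 UTC timestamp such as 2025-10-29T19:00:00Z."""
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")


def mean_time_to_restore_minutes(incidents):
    """Return the mean minutes from creation to resolution, or None if no incident qualifies."""
    durations = []
    for incident in incidents:
        created, resolved = incident.get("created"), incident.get("resolved")
        if not created or not resolved:
            continue  # open or incomplete incidents cannot contribute
        durations.append((_parse(resolved) - _parse(created)).total_seconds() / 60)
    return sum(durations) / len(durations) if durations else None
```

An incident created at 19:00 and resolved at 20:30 contributes 90 minutes; an incident without a resolution time contributes nothing, which is why an all-open incident list yields no MTTR.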

MTTR Seems Incorrect

Possible Causes:

1. Wrong Priority Filter

Problem:

  • Viewing MTTR for wrong incident priority

Solution:

  • Check which priority level you're viewing (P1, P2, All Priorities)
  • P1 MTTR will differ significantly from P3 MTTR

2. Incident State Mapping Issues

Problem:

  • Incident states not correctly mapped to "resolved"

Solution:

  • Verify ServiceNow state mapping:
    • "Resolved" → Counts as resolved ✅
    • "Closed" → Counts as resolved ✅
    • "In Progress" → Does NOT count ❌
    • "Pending" → Does NOT count ❌

3. Including Non-Production Incidents

Problem:

  • Dev or staging incidents inflating MTTR

Solution:

  • Filter incidents to production-only
  • Check incident-to-application mappings

Integration Issues

GitHub/GitLab/Bitbucket Integration Not Working

Symptoms:

  • No commits or PRs appearing in DevGrid
  • Lead Time shows N/A

Solutions:

1. Check Integration Status

  • Go to DevGrid Settings → Integrations
  • Verify connection status is "Connected"
  • Check last sync time

2. Re-authenticate

  • Remove integration
  • Re-add integration with fresh OAuth token
  • Grant all required permissions

3. Check Repository Access

  • Verify DevGrid has access to your repositories
  • For organization repos, admin may need to approve

4. Verify Repository is Selected

  • Not all repositories may be syncing by default
  • Check repository selection in integration settings

ServiceNow Integration Not Working

Symptoms:

  • No incidents appearing in DevGrid
  • MTTR and CFR show N/A

Solutions:

See dedicated ServiceNow Integration Guide for detailed troubleshooting.

Quick checks:

  • Verify API credentials are correct
  • Check ServiceNow instance URL
  • Verify custom field configuration for deployment linkage
  • Check incident query filters (only production incidents?)

Data Quality Issues

Inconsistent Environment Names

Problem:

// These are all treated as DIFFERENT environments:
{"env": "production"}
{"env": "prod"}
{"env": "Production"}
{"env": "PROD"}
{"env": "prd"}

Solution:

  1. Standardize on one name: Choose "production" (lowercase)
  2. Update all CI/CD pipelines to use exact same name
  3. Document the standard in your team wiki
  4. Backfill if needed: Re-send historical events with correct environment names

Missing Critical Fields

Checklist of required fields for each metric:

Metric                 Required Field            Where to Include
Deployment Frequency   env                       Deployment events
Lead Time for Change   build_commit_sha          Deployment events
Change Failure Rate    change_id                 Deployment events
Change Failure Rate    Incident-deployment link  ServiceNow/incident system
MTTR                   Incident timestamps       ServiceNow/incident system

Historical Data Import Issues

Problem:

  • Backfilling large amounts of historical data

Solutions:

1. Batch Events

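# Send up to 15 events per request.
# Assumes each batch_*.json file (hypothetical names) holds {"events": [...]}
# with at most 15 events.
for batch_file in batch_*.json; do
  curl -X POST https://prod.api.devgrid.io/events \
    -H "x-api-key: $DEVGRID_API_KEY" \
    -H "Content-Type: application/json" \
    -d @"$batch_file"
  sleep 1  # Pause between requests to avoid rate limiting
done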

2. Use Correct Timestamps

{
  "timestamp": "2025-09-15T10:00:00Z",  // Historical date, not current time
  "attributes": {
    "env": "production",
    "status": "success"
  }
}

3. Verify After Import

  • Check DevGrid dashboard for historical date ranges
  • Verify metrics calculate correctly
  • Spot-check random dates

Getting Help

Before Contacting Support

Gather this information:

  1. API Request:
    • Full curl command (redact API key)
    • Request body (JSON)

  2. API Response:
    • HTTP status code
    • Response body
    • Any error messages

  3. Environment Details:
    • DevGrid environment (dev/prod)
    • Entity ID
    • Deployment timestamp
    • CI/CD platform

  4. What You've Tried:
    • Troubleshooting steps from this guide
    • Any error messages or logs

Contact Support