Incident Management

Resolve incidents faster, together.

From initial alert to final post-mortem. Collaborate with your team, leverage AI insights, and automate your incident response workflow.

Start Managing Incidents

AI-Powered Root Cause Analysis

Don't waste time hunting through logs. Our AI instantly analyzes failing monitors to determine probable causes and recommend next actions.

AI Briefinglive - updated 72h 35m ago

Monitor 'Production DB' breached its criteria at 11:38 UTC, resulting in cascading downstream failures for 'Production API'. Both monitors have since recovered as of 11:46 UTC.

Probable Cause

Database connection exhaustion60%

High query latency / timeouts25%

Hardware / resource limits15%

Root Cause

Production DB

ROOT CAUSE

Connection refused. Database server unreachable on port 5432.

Production API

DOWNSTREAM

Failing health checks due to upstream database connection timeouts.

Recommended Next Actions

Review database connection pool metrics and active slow queries.

Investigate database host CPU, Memory, and Disk I/O utilization during the incident window.

Collaborative Resolution

When things break, communication is key. Acknowledge incidents to let the team know you're on it, add timestamped notes as you investigate, and keep everyone on the same page.

Acknowledge incidents to halt escalation chains
Add investigation notes and status updates
Track the complete resolution timeline

Resolution Timeline

System15 Jun 2026, 15:55:17

Service restored. Incident resolved automatically after monitors reported healthy.

Sarah PascalNOTE

15 Jun 2026, 15:26:17

I've identified the cause as database connection exhaustion. Scaling the connection pool and restarting affected API instances now.

Sarah Pascal15 Jun 2026, 15:21:17

Acknowledged the incident. Escalation policies paused.

System (Escalation)15 Jun 2026, 15:19:17

Level 2 Escalation: Text message sent to Sarah Pascal.

System (Escalation)15 Jun 2026, 15:18:17

Level 1 Escalation: Phone call to John Smith failed (No Answer).

System15 Jun 2026, 15:16:17

Incident automatically created by failing monitor "Production DB". Email sent to John Smith.

POST-MORTEM

Copy

.md

PDF

Incident #13022 — Production DB

Production

High Severity

STARTED

15/06/2026, 15:16:17

RESOLVED

15/06/2026, 15:55:17

DURATION

39m 0s

RESOLUTION

Automatic

SUMMARY & IMPACT

A monitoring alert on the "Production DB" monitor opened incident INC-13022 in the Production project on 2026-06-15, which immediately triggered cascading alerts for the "Production API". The confirmed root cause was database connection exhaustion leading to API timeouts. The incident was rated high severity and lasted approximately 39 minutes. Service was restored after scaling the database connection pool and restarting affected API instances.

Automated Post-Mortems

Once an incident is resolved, we automatically generate a comprehensive post-mortem report detailing exactly what happened, when it happened, and how it was fixed.

Auto-generated summary and impact assessment
Detailed minute-by-minute timeline
Export to Markdown or PDF to share with stakeholders

Escalation Policies

Tie incidents directly into your escalation policies to ensure the right person gets paged at the right time.

AI Context

AI analyzes past incidents to find related issues, helping your team identify recurring problems faster.

MTTR Tracking

Automatically track Mean Time to Resolution and other critical metrics to measure your team's incident response health.