A postmortem is a structured review of an incident to identify root causes, learn from mistakes, and improve processes. This template ensures consistency in capturing key details, diagnostics, and follow-ups for continuous improvement.
- Gather Key Stakeholders
  - SRE Lead
  - Developer Lead
  - Incident Manager
  - Any other relevant team members
- Prepare Data and Timeline
  - Collect logs, metrics, and alerts related to the incident.
  - Compile a timeline of key events during the incident.
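
As a rough illustration of the timeline step above, the sketch below merges events gathered from several sources into one chronological list. The `Event` fields, the source labels, and the function names are assumptions made for this example; they are not part of the template.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    # Hypothetical record shape; adapt the fields to whatever your tools export.
    timestamp: datetime  # timezone-aware, so sources in different zones sort correctly
    source: str          # illustrative label such as "alert", "log", or "chat"
    summary: str


def build_timeline(*sources: list[Event]) -> list[Event]:
    """Merge events from several sources into one chronological list."""
    merged = [event for source in sources for event in source]
    return sorted(merged, key=lambda event: event.timestamp)


def format_timeline(events: list[Event]) -> str:
    """Render the timeline as one bullet line per event, with times in UTC."""
    lines = []
    for event in events:
        stamp = event.timestamp.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
        lines.append(f"- {stamp} [{event.source}] {event.summary}")
    return "\n".join(lines)
```

Normalizing every timestamp to UTC before the meeting avoids arguments about ordering when logs and alerts come from systems in different time zones.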
- Conduct the Postmortem Meeting
  - Walk through the incident timeline.
  - Discuss root causes and contributing factors.
  - Identify lessons learned and areas for improvement.
  - Use the "Open Questions" section to clarify unresolved aspects.
- Document the Postmortem
  - Use postmortem-template.md to record findings.
  - Assign follow-up action items with owners and deadlines.
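
One possible way to keep action items, owners, and deadlines machine-checkable is sketched below. The `ActionItem` structure and the `overdue` helper are hypothetical names introduced for this example; the template itself does not prescribe a format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ActionItem:
    # Hypothetical structure: every item carries exactly one owner and one due date.
    description: str
    owner: str
    due: date
    done: bool = False


def overdue(items: list[ActionItem], today: date) -> list[ActionItem]:
    """Return open items whose due date is before `today`, oldest first."""
    late = [item for item in items if not item.done and item.due < today]
    return sorted(late, key=lambda item: item.due)
```

Passing `today` in explicitly, rather than calling `date.today()` inside the function, keeps the check easy to test and to run against a past review date.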
- Review and Publish
  - Share the postmortem report with relevant teams and stakeholders.
  - Update any public status pages if necessary.
- Track Follow-ups
  - Ensure action items are completed by their due dates.
  - Incorporate feedback from retrospective reviews into future incident management processes.
- Copy postmortem-template.md into a new file (e.g., postmortem-incident-YYYY-MM-DD.md).
- Fill out each section based on the postmortem meeting discussions.
- Submit the report to the relevant repository for documentation and tracking.
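
The copy step can be scripted. The following is a minimal sketch that assumes the template lives at a path you supply and that reports are named after the incident date; `new_postmortem` and its parameters are illustrative names, not an existing tool.

```python
import shutil
from datetime import date
from pathlib import Path


def new_postmortem(template: Path, incident_date: date, dest_dir: Path) -> Path:
    """Copy the template to dest_dir/postmortem-incident-YYYY-MM-DD.md."""
    target = dest_dir / f"postmortem-incident-{incident_date.isoformat()}.md"
    if target.exists():
        # Refuse to overwrite a report that someone may already be editing.
        raise FileExistsError(f"{target} already exists")
    dest_dir.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(template, target)
    return target
```

For example, `new_postmortem(Path("postmortem-template.md"), date(2024, 3, 15), Path("reports"))` would create `reports/postmortem-incident-2024-03-15.md`; the `reports` directory is a placeholder for wherever your repository keeps these files.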