Friday, February 19, 2010

As WordPress goes down, a chance to analyze another postmortem arises

If you recall, we put together a proposed guideline for postmortem communication in a previous post:

Prerequisites:
  1. Admit failure - Hiding downtime is no longer an option (thanks to Twitter)
  2. Sound like a human - Do not use a standard template, do not apologize for "inconveniencing" us.
  3. Have a communication channel - Ideally you've set up a process to handle incidents before the event, and communicated publicly during the event. Customers will need to know where to find your updates.
  4. Above all else, be authentic
Requirements:
  1. Start time and end time of the incident.
  2. Who/what was impacted.
  3. What went wrong, with insight into the root cause analysis process.
  4. What's being done to improve the situation, lessons learned.
Nice-to-have's:
  1. Details on the technologies involved.
  2. Answers to the Five Why's.
  3. Human elements - heroic efforts, unfortunate coincidences, effective teamwork, etc.
  4. What others can learn from this experience.

How did WordPress do in their postmortem on 2/19/10?

Prerequisites:
  1. Admit failure: Yes. The very first paragraph makes it clear they screwed up.
  2. Sound like a human: Yes. Extremely personal post.
  3. Have a communication channel: Yes, but not ideal. A combination of their general Twitter account and the founder's blog. Could be improved, but overall OK.
  4. Be authentic: Yes. 110% authentic!
Requirements:
  1. Start/end time: No. Only focused on duration.
  2. Who/what was impacted: Yes. The post states that 10.2 million blogs were affected for 110 minutes, costing 5.5 million pageviews.
  3. What went wrong: Yes. Router issues, though investigation is continuing.
  4. Lessons learned: Partial. Mostly a promise to share the results of the investigation.
Nice-to-have's:
  1. Technologies involved: No.
  2. Answers to the Five Why's: No.
  3. Human elements: Yes. "the entire team was on pins and needles trying to get your blogs back as soon as possible"
  4. What others can learn: No.
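
For anyone who wants to track these scorecards across several postmortems, the ratings above could be kept as plain data and tallied per section. This is only a minimal sketch: the dictionary layout and the `tally` helper are my own illustrative assumptions, not part of the guideline, and the "Yes, but not ideal" rating is recorded here simply as a yes.

```python
from collections import Counter

# Ratings copied from the scorecard above. The structure and the
# tally() helper are hypothetical, chosen only for illustration.
SCORECARD = {
    "Prerequisites": {
        "Admit failure": "yes",
        "Sound like a human": "yes",
        "Have a communication channel": "yes",  # rated "Yes, but not ideal"
        "Be authentic": "yes",
    },
    "Requirements": {
        "Start/end time": "no",
        "Who/what was impacted": "yes",
        "What went wrong": "yes",
        "Lessons learned": "partial",
    },
    "Nice-to-have's": {
        "Technologies involved": "no",
        "Answers to the Five Why's": "no",
        "Human elements": "yes",
        "What others can learn": "no",
    },
}


def tally(scorecard):
    """Count the yes/partial/no ratings in each section."""
    return {
        section: Counter(items.values())
        for section, items in scorecard.items()
    }


if __name__ == "__main__":
    for section, counts in tally(SCORECARD).items():
        # Counter returns 0 for ratings that never occur in a section.
        print(
            f"{section}: {counts['yes']} yes, "
            f"{counts['partial']} partial, {counts['no']} no"
        )
```

Counting per section rather than computing one overall score keeps the distinction between prerequisites, requirements, and nice-to-have's visible, which is the point of the guideline.
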
Conclusion
The intent of the blog post was to communicate quickly that they are aware of the severity of the issue and are taking it seriously. The details are lacking, mostly because it was posted so quickly. Still, this kind of post is extremely useful, which makes me wonder whether a pre-postmortem (a simple admission of the issue, in an authentic voice and with some detail) is a necessary step in the pre/during/post-event communication process.
