Friday BOT Offline
Incident Report for Chatie
Postmortem

This incident was caused by our Friday BOT being logged out. It took 19 hours before we logged it back in.

This problem should be quick and easy to fix: scan the QR code from WeChat and the bot is back in service. Ideally, it should be resolved within 15 minutes.

The Problem

There are three reasons it took 19 hours to fix:

  1. We have an alarm system that sends an offline warning message from our Official Account; however, no one was watching for that warning and no one took any action.
  2. After we noticed the Friday BOT outage, it took about 1 to 2 hours to contact the operator and get a response.
  3. Finally, the operator did not have the phone with him, so we had to wait until the next day, when he could go to the office to get it.

The Solution

What we should improve in the future:

  1. Monitor the alarm messages and take action ASAP.
  2. Someone should be on duty and in charge of recovering the bot when necessary, so that we can take action in time.
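
The two improvements above could be combined into a simple escalation rule: page the first on-duty person who can actually scan the QR code, and move on to the next one if the alarm is not acknowledged in time. The following is only a sketch of that idea, not our actual alarm system. The names `Contact`, `has_login_phone`, and `who_to_page` are hypothetical, and the 15-minute acknowledgement timeout is an assumption borrowed from the recovery target mentioned earlier.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Contact:
    """One person on the (hypothetical) on-duty rota."""
    name: str
    has_login_phone: bool  # can this person scan the WeChat QR code right now?


def who_to_page(alarm_at: datetime, now: datetime, rota: List[Contact],
                ack_timeout: timedelta = timedelta(minutes=15),  # assumed value
                acknowledged: bool = False) -> Optional[Contact]:
    """Return the contact to page at `now`, or None once the alarm is acknowledged."""
    if acknowledged:
        return None
    # Only people who hold the login phone can recover the bot.
    candidates = [c for c in rota if c.has_login_phone]
    if not candidates:
        raise ValueError("nobody on the rota can log the bot back in")
    # Escalate one step per elapsed timeout, stopping at the last candidate.
    steps = max(0, int((now - alarm_at) / ack_timeout))
    return candidates[min(steps, len(candidates) - 1)]
```

With a rota ordered by priority, an unacknowledged alarm reaches the backup after one timeout instead of waiting for a single operator to respond. Anyone who does not have the phone is skipped entirely.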
Posted Nov 08, 2020 - 11:10 CST

Resolved
The Friday BOT has been logged back in, and this incident has been resolved.
Posted Nov 08, 2020 - 10:43 CST
Identified
Our Friday BOT has been logged out since 5:30 pm on Nov 7. We plan to bring it back into service on the morning of Nov 8.
Posted Nov 08, 2020 - 02:28 CST
This incident affected: Friday BOT.