Timeline (in UTC):
19:10: The hosted KB becomes unavailable, and service restores without intervention in less than 1 minute.
19:16: The hosted KB becomes unavailable again, and service restores without intervention within 2 minutes.
19:50 - 20:30: The hosted KB repeatedly becomes unavailable for 1-2 minutes at a time, and service restores without intervention each time. The on-call engineer escalates the incident to the backend team.
20:30 - 22:00: The service becomes unavailable and no longer restores on its own. Restarting the servers restores service only for short periods.
22:15: The issue is identified and a temporary fix is deployed. Service resumes as normal.
23:30: A permanent fix is deployed. Deploying the new version requires momentarily reverting the temporary fix, which makes the service unavailable for less than 5 minutes.