It's because of the way most companies build their status dashboards. There are usually at least 2 dashboards, one internal dashboard and one external dashboard. The internal dashboard is the actual monitoring dashboard, where it will be hooked up with other monitoring data sources. The external status dashboard is just for customer communication. Only after the outage/degradation is confirmed internally, then the external dashboard will be updated to avoid flaky monitors and alerts. It will also affect SLAs so it needs multiple levels of approval to change the status, that's why there are some delays.
> The external status dashboard is just for customer communication. Only after the outage/degradation is confirmed internally, then the external dashboard will be updated to avoid flaky monitors and alerts. It will also affect SLAs so it needs multiple levels of approval to change the status, that's why there are some delays.
This defeats the purpose of a status dashboard and is effectively useless in practice most of the time from a consumers point of view.
From a business perspective, I think given the choice to lie a little bit or be brutally honest with your customers, lying a bit is almost always the correct choice.
My ideal would be if regulations which made it necessary that downtime metrics had to be reported with at most somewhere between a 10m and 30m delay as "suspected reliability issue".
If your reliability metrics have lots of false positives, that's on you and you'll have to write down some reason why those false positives exist every time.
Then that company could decide for itself whether to update manually with "not a reliability issue because X".
This lets consumers avoid being gaslighted and businesses don't technically have to call it downtime.
This is intentional. It's mostly a matter of discussing how to communicate it publicly and when to flip the switch to start the SLA timer. Also coordinating incident response during a huge outage is always challenging.
Now 4 out of 10 services are marked as "Incident", yet most of the others are also completely dead.