
I wonder why the status page doesn't just ping github.com and check for a 200. That seems easy to do.
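A minimal sketch of what such a check could look like, assuming a plain GET against the homepage and nothing about how GitHub's real status page works. The function names and the injectable `fetch_status` hook are made up for illustration.

```python
import urllib.error
import urllib.request


def _fetch_status(url, timeout):
    """Issue a GET and return the HTTP status code, including error codes."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; the code is still a valid answer.
        return err.code


def homepage_is_up(url, fetch_status=None, timeout=5.0):
    """Return True only if `url` answers with HTTP 200.

    `fetch_status` is a hypothetical hook (url, timeout) -> int so the check
    can be exercised without touching the network.
    """
    if fetch_status is None:
        fetch_status = _fetch_status
    try:
        return fetch_status(url, timeout) == 200
    except OSError:
        # URLError, DNS failures, timeouts and resets are all OSError subclasses.
        return False
```

Calling `homepage_is_up("https://github.com")` from a single machine only tells you what that one machine sees.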



To be fair - I really couldn't care less if the homepage is loading or not.

So long as I can fetch/commit to my repos, pretty much everything else is of secondary, tertiary, or no real importance to me.

(At work, I do indeed have systems running that monitor 200 statuses from client project homepages, almost all of which show better than 99.999% uptime. And are practically useless. Most of them also monitor "canary" API requests which I strive to keep at 99.99% but don't always manage to achieve 99.9% - which is the very best and most expensive SLA we'll commit to.)
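For a sense of scale, here is a rough sketch of the downtime those percentages allow. The 30-day period and the function name are assumptions for illustration, not anyone's actual SLA terms.

```python
def downtime_budget_seconds(availability_pct, period_days=30):
    """Seconds of downtime allowed in the period at the given availability."""
    period_seconds = period_days * 24 * 60 * 60
    return period_seconds * (100.0 - availability_pct) / 100.0


for pct in (99.9, 99.99, 99.999):
    budget = downtime_budget_seconds(pct)
    print(f"{pct}% over 30 days allows about {budget:.1f} s of downtime")
```

Over 30 days that works out to roughly 43 minutes at 99.9%, about 4.3 minutes at 99.99%, and about 26 seconds at 99.999%.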


From where? They don't have only one load balancer, so you'd still have the problem of the page showing green when it's not loading for some folks.


At GitHub's scale, why wouldn't they run a ping monitor from every continent at least?

Then, you would show the status based on the continent.
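A sketch of that per-continent rollup, assuming each region runs several probes and reports success or failure. The region names, thresholds, and status labels are hypothetical.

```python
def status_by_region(probe_results, degraded_below=1.0, down_below=0.5):
    """Map each region to 'up', 'degraded', 'down' or 'unknown'.

    probe_results: dict of region name -> list of booleans,
    where True means that probe got a successful response.
    """
    statuses = {}
    for region, outcomes in probe_results.items():
        if not outcomes:
            # No probes reported, so nothing can be claimed for this region.
            statuses[region] = "unknown"
            continue
        success_ratio = sum(outcomes) / len(outcomes)
        if success_ratio < down_below:
            statuses[region] = "down"
        elif success_ratio < degraded_below:
            statuses[region] = "degraded"
        else:
            statuses[region] = "up"
    return statuses
```

This only summarizes what the probes themselves can reach, so a region can still show "up" while some users in it cannot connect.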


Where on the continent? GitHub is undoubtedly doing blackbox testing internally and has multiple such monitors, but that's not going to capture every customer's route to them, leading to the same problem - customers experience GitHub being down, despite monitoring saying it's mostly up. Thus the impasse. Even doing whitebox testing, where you know the internals and can thus place sensors intelligently, even just for ingress, you're still at the mercy of the Internet.

If a sensor that's basically in the same datacenter says you're up, but the route into the datacenter is down, then what? Multiply this by the complexity of the whole site, and monitoring it all with 100% fidelity is impossible. Not that it's not worth trying; there's a team at GitHub that works on monitoring. But beyond the motivation of keeping the SLA up, as a customer, unless you notice it's down, is it really down? In a globally distributed system, downtime, except for catastrophic downtime like this, is hard to define on a whole-site basis for all customers.


> 100% fidelity is impossible

I don't think anybody asked for 100% fidelity. We are talking about a complete outage that affected at least North America and Europe. If the status page shows green in such a case, its fidelity is around 50%. People expect better from GitHub.


The amount of moaning that the status page wasn't updated in 0 seconds and had the wrong status for entire minutes is what leads me to believe that no, users do expect 100% fidelity.

Total outages are rare enough, and there's enough other work, that spending time building a system for that just doesn't seem like the best use of their time. Though I'm biased, having faced that exact question from the inside, at a different company.


> monitoring it all with 100% fidelity is impossible

This is impossible regardless of how godlike the design is... Nobody is asking for 100% fidelity.


That would be self-defeating given that it's a Rails app.


delaying SLA


This is at least a multi-million dollar payout (if they admit to it).

All GitHub Pages say

> We're having a really bad day.

> The Unicorns have taken over. We're doing our best to get them under control and get GitHub back up and running.


At the moment, all GitHub services seem to be restored, yet the GitHub status page indicates that the problem is still ongoing. I don't think it's related to the SLA, but rather to the monitoring, which is not live. There are a few minutes of delay.


Seems slightly unprofessional for a massive company like GitHub/Microsoft.


I disagree. This hurts no one, and not everything needs to be sanitized and painted over with bland corporatespeak.


I don't think they were asking for corporate speak. But at least I would find a plain technical error message like "cannot contact file server" much more respectable than something like "unicorns are hugging our servers uwu".


This “ironic” and “humorous” style of errors and UI captions is the actual new corporate speak. I’d prefer dumb error messages rather than some shit someone over the ocean thinks is smart and humorous. And it’s not funny at all when it’s a global outage impacting my business and my $$$.


It's closer to the truth than you usually get. They're having a bad day, it's completely true. It's the start of my day, but I guess this is the middle of the night for them. There's no such thing as unicorns, but that just highlights the metaphorical nature of the remaining claim - getting Unicorns under control means solving their problems. Normally "professional" corporate speak means avoiding saying anything whose meaning is plain on its face and disconfirmable while avoiding the implication that the company is run and operated by humans. This is a model. (Obviously they came up with the message in advance, which just goes to show that someone in the company is well enough rounded to know that if it is displayed, they're having a bad day.)


GitHub is (was?) a Rails application, so it was probably originally running behind Unicorn [0], if it isn’t still. So the unicorns are (were) real.

[0] https://en.wikipedia.org/wiki/Unicorn_(web_server)



