Obligatory link to Jim Gray's "Why Do Computers Stop and What Can Be Done About ... | Hacker News

jordan0day on Aug 24, 2015 | on: The Power of Power Cycling

Obligatory link to Jim Gray's "Why Do Computers Stop and What Can Be Done About It?": http://www.hpl.hp.com/techreports/tandem/TR-85.7.pdf

TL;DR: Most software bugs that make it past testing are transient "heisenbugs". That is, they're the kind of bug that goes away when you restart the program.

Related: This is actually a core tenet of the Erlang ecosystem -- spend any length of time around Erlangers and you're bound to hear the phrase "let it crash". Erlang actually has support for this built into the system: Supervisor processes exist to automatically "power cycle" your code if an unhandled error occurs.
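To make the "power cycle your code" idea concrete, here is a rough sketch in Python rather than Erlang. It is not how OTP supervisors are implemented; `supervise`, `make_flaky_task`, and `TransientError` are hypothetical names made up for illustration, and the restart limit is an assumed policy. The point is only that the worker has no defensive error handling of its own and the restart logic lives outside it.

```python
class TransientError(Exception):
    """Stands in for a heisenbug: a failure that goes away on retry."""


def supervise(task, max_restarts=5):
    """Run `task`; if it raises, start it again from scratch.

    Returns (result, restarts). Once `max_restarts` is exceeded the error
    is re-raised, loosely like a supervisor giving up and escalating.
    """
    restarts = 0
    while True:
        try:
            return task(), restarts
        except Exception:
            if restarts >= max_restarts:
                raise
            restarts += 1


def make_flaky_task(failures_before_success):
    """Hypothetical worker that fails a fixed number of times, then succeeds."""
    remaining = {"n": failures_before_success}

    def task():
        # No error handling in the worker itself: it just crashes.
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise TransientError("transient failure")
        return "ok"

    return task


result, restarts = supervise(make_flaky_task(2))
print(result, restarts)
```

A real supervisor also isolates state between restarts (each Erlang process has its own heap), which this single-process sketch does not attempt to model.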

sllabres on Aug 24, 2015

There doesn't even need to be a bug in the first place. Think of a system under memory pressure due to memory fragmentation. Allocation requests can fail there that would have succeeded on a system that hasn't been running as long. (For this reason some systems even disallow dynamic memory allocation at runtime.)
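A toy illustration of the fragmentation point, assuming a tiny 8-cell heap and a first-fit allocator (both invented for this sketch; real allocators are far more sophisticated). The same 4-cell request succeeds on a fresh heap and fails on an "aged" one, even though the aged heap has 4 cells free in total.

```python
def largest_free_run(heap):
    """Length of the longest run of contiguous free (None) cells."""
    best = run = 0
    for cell in heap:
        run = run + 1 if cell is None else 0
        best = max(best, run)
    return best


def allocate(heap, size, tag):
    """First-fit: claim the first contiguous free run of `size` cells.

    Returns the start index, or None if no run is large enough.
    """
    run = 0
    for i, cell in enumerate(heap):
        run = run + 1 if cell is None else 0
        if run == size:
            start = i - size + 1
            heap[start:i + 1] = [tag] * size
            return start
    return None


def free(heap, tag):
    """Release every cell owned by `tag`."""
    for i, cell in enumerate(heap):
        if cell == tag:
            heap[i] = None


# Freshly started system: the 4-cell request fits immediately.
fresh = [None] * 8
fresh_result = allocate(fresh, 4, "big")

# Long-running system: many small allocations, half of them freed again.
aged = [None] * 8
for i in range(8):
    allocate(aged, 1, i)
for i in range(0, 8, 2):
    free(aged, i)

aged_free = aged.count(None)              # total free cells
aged_result = allocate(aged, 4, "big")    # same request as on the fresh heap
print(fresh_result, aged_free, largest_free_run(aged), aged_result)
```

Restarting the process is what turns the aged heap back into the fresh one, which is why a reboot "fixes" a failure that no code change would.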
