
> But once you know you have good DIMMs, it doesn't look like you need to be quite so paranoid about bit errors.

Assuming that only the one-error-per-year cases were due to random bit flips, and all the multiple-errors-per-year cases were due to bad DIMMs, I came up with about a 1/5 chance of getting a single random bit flip over a 6-year lifespan. But there also seems to be about a 1/3 chance of having a DIMM randomly go bad after a couple of years, which of course without ECC would manifest as random crashes and lost (or maybe corrupted) work.
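
For anyone who wants to play with those numbers, here is a rough sketch of the arithmetic. It assumes a constant, independent per-year rate, and it treats the 1/3 DIMM figure as a whole-lifespan probability that is independent of the bit-flip chance. Neither assumption comes from the data; the function names are just illustrative.

```python
# Back-of-the-envelope conversion of the figures quoted above.
# Assumption (not from the data): years are independent with a constant rate.

def annual_probability(p_total: float, years: float) -> float:
    """Per-year probability that yields p_total chance of at least one event over `years`."""
    return 1.0 - (1.0 - p_total) ** (1.0 / years)


def at_least_one(p_a: float, p_b: float) -> float:
    """Chance that at least one of two independent events occurs."""
    return 1.0 - (1.0 - p_a) * (1.0 - p_b)


bit_flip_lifetime = 1 / 5  # single random bit flip over a 6-year lifespan
dimm_failure = 1 / 3       # DIMM going bad (assumed here to be a lifespan figure)

print(f"annual bit-flip chance: {annual_probability(bit_flip_lifetime, 6):.3f}")
print(f"either problem over the lifespan: {at_least_one(bit_flip_lifetime, dimm_failure):.3f}")
```

Under those assumptions the 1/5 figure works out to a bit under 4% per year, and the chance of hitting at least one of the two problems over the machine's life is a little under one half.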




Seems like running memtest every six months or so would be a good policy.

It would be nice if there were a way to test the memory while a machine is running.


The errors we're talking about here are transient. The memory location itself is still usable; only its contents get changed when a cosmic ray hits. After the hit, the location holds the corrupted value without a problem.

Memtest checks if the memory location has a gross fault which prevents it from storing values correctly.

Doesn't seem like memtest will help.
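
A toy model of the distinction, in case it helps. This is only a sketch: `SimulatedMemory`, `cosmic_ray`, and `pattern_test` are made-up names, and the pattern loop is a simplified stand-in for what a memory tester does, not memtest's actual algorithm.

```python
class SimulatedMemory:
    """Byte-addressable toy memory with an optional stuck-at-zero bit (a gross fault)."""

    def __init__(self, size, stuck_at_zero=None):
        self.cells = bytearray(size)
        self.stuck = stuck_at_zero  # (address, bit) or None

    def write(self, addr, value):
        # A faulty cell cannot store a 1 in its stuck bit.
        if self.stuck and self.stuck[0] == addr:
            value &= ~(1 << self.stuck[1]) & 0xFF
        self.cells[addr] = value

    def read(self, addr):
        return self.cells[addr]

    def cosmic_ray(self, addr, bit):
        # Transient error: flips the stored bit once; the cell itself stays healthy.
        self.cells[addr] ^= 1 << bit


def pattern_test(mem, size):
    """Write known patterns everywhere and read them back; True means no fault found."""
    for pattern in (0x00, 0xFF, 0x55, 0xAA):
        for addr in range(size):
            mem.write(addr, pattern)
        for addr in range(size):
            if mem.read(addr) != pattern:
                return False
    return True


healthy = SimulatedMemory(64)
healthy.write(10, 0x42)
healthy.cosmic_ray(10, 3)                 # data at address 10 is now corrupted
print(healthy.read(10) == 0x42)           # False: the stored value changed
print(pattern_test(healthy, 64))          # True: the cell still stores patterns fine

faulty = SimulatedMemory(64, stuck_at_zero=(10, 3))
print(pattern_test(faulty, 64))           # False: the stuck bit is caught
```

The pattern test overwrites everything before checking it, so a one-time flip leaves nothing behind for it to find, while a cell that cannot hold a value fails every time.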


Good point. Thanks.


I remember someone-somewhere saying that in their experience building GCC was a more thorough memory test than memtest itself...



