I just love how Google datacenters are somehow "the real world". Nice, cool and controlled temperatures, batch ordering from vendors knowing they are shipping to Google, stable/repeatable environments, not much mention about the I/O load, etc.
And then there's
> Ignore Uncorrectable Bit Error Rate (UBER) specs. A meaningless number.
...
> Bad news: SSDs fail at a lower rate than disks, but UBER rate is higher
meaningless, it is.
The real world has wild temperature ranges, wilder temperature _changes_, mechanical variations well above and beyond datacenter use, and possibly wild loads (e.g. viruses, antiviruses, all sorts of updates, etc. etc.).
>I just love how Google datacenters are somehow "the real world"
They totally are though, especially to the HN crowd where a lot of us may be putting hardware in data centers.
I agree that this isn't going to give us a clear picture of what to expect out of an SSD in, say, a netbook or something. On the other hand, data from a million SSDs reported by one company in a controlled environment is a hell of a control group if you want to go test factors like temperature, etc.
* Increase in DNS request failures (likely due to said heat) and bad routing caused internal iGoogle services to request this guy's stuff: https://www.youtube.com/watch?v=aT7mnSstKGs
Thanks to some utterly awful cooling my laptop CPU idles at about 70°C, peaking in the 80s. I don't know if it's within spec for an i7 or just dumb luck, but it's still running just fine.
I did find that <takes a drink> later in his talk he kept <takes a drink> taking a drink every 10 seconds or so <takes a drink>, which ended up being more than a <takes a drink> little <takes a drink> irritating to watch and listen <takes a drink> to.
> I just love how Google datacenters are somehow "the real world". ... It is easier to do stats this way, though.
From the paper's abstract (you did read the abstract, right?):
"... While there is a large body of work based on experiments with individual flash chips in a controlled lab environment under synthetic workloads, there is a dearth of information on their behavior in the field. This paper provides a large-scale field study covering many millions of drive days, ten different drive models, different flash technologies (MLC, eMLC, SLC) over 6 years of production use in Google’s data centers. We study a wide range of reliability characteristics and come to a number of unexpected conclusions. For example, raw bit error rates (RBER) grow at a much slower rate with wear-out than the exponential rate commonly assumed and, more importantly, they are not predictive of uncorrectable errors or other error modes. The widely used metric UBER (uncorrectable bit error rate) is not a meaningful metric, since we see no correlation between the number of reads and the number of uncorrectable errors. We see no evidence that higher-end SLC drives are more reliable than MLC drives within typical drive lifetimes. Comparing with traditional hard disk drives, flash drives have a significantly lower replacement rate in the field, however, they have a higher rate of uncorrectable errors." [0][1]
I guess it's easier to draw incorrect, snarky conclusions based on inaccurate summaries of papers than it is to take a moment to read a paper's abstract to double-check the work of a tech journalist. shrug :(
Largest single users of drives, sure. When they have tens of millions of them, I'd trust their reliability studies a lot more than those of Aunt Mike, who has maybe two or three.
Google's data centers are definitely not cool (in the temperature sense). That was one of the big reveals of the original disk paper: since Google runs things hotter, the drives experienced a much hotter environment, but their bit error rates were not hugely affected.
>The real world has wild temperature ranges, wilder temperature _changes_, mechanical variations well above and beyond datacenter use, and possibly wild loads (e.g. viruses, antiviruses, all sorts of updates, etc. etc.).
I'm not sure what you consider "the real world" (SSDs in ToughBooks for research expeditions in the Amazon?), but seeing that HN is a startup/pro IT social site, most of us are interested in running them in data centers...
I see your points but the reality is there is no real data on how an SSD acts and fails outside of torture tests from the manufacturer or review sites.
Data centre data is still good data to help us better understand SSD lifetime and failures.
I think it's very informative, and over time we'll see if this is representative of the normal world. Maybe temperature changes are not that important, maybe they are.
At the same time, they have Chrome OS. Most of those laptops have an SSD. All are connected to the cloud. How do they perform? When they have a problem, is it recorded by Google? I'm not 100% clear if I would want that, but it doesn't seem such a bad idea for a computer that is already 100% in the cloud.
The Chrome OS number would be less useful. A hard drive failure that totally destroys the ability to report the result back to Google is indistinguishable from a device that simply never turned on again, which over the course of years, I'd expect to dominate major drive failures.
In terms of the things that are going to make SSDs — or pretty much any other piece of hardware you can imagine — fall over, Google's environment is realer than anything you could possibly conceive.
I had an SSD completely fail after only a month of use. It was a cheap-ish KingSpec C3000 128GB. It wasn't recognized in BIOS. Surprisingly, doing something called 'power cycling' made the disk work again. It still works after 2 years of use.
http://forum.crucial.com/t5/Crucial-SSDs/Why-did-my-SSD-quot...
> "A sudden power loss is a common cause for a system to fail to recognize an SSD. In most cases, your SSD can be returned to normal operating condition by completing a power cycle, a process that will take approximately one hour.
We recommend you perform this procedure on a desktop computer because it allows you to only connect the SATA power connection, which improves the odds of the power cycle being successful. However, a USB enclosure with an external power source will also work. Apple and Windows desktop users follow the same steps.
1. Once you have the drive connected and sitting idle, simply power on the computer and wait for 20 minutes. We recommend that you don't use the computer during this process.
2. Power the computer down and disconnect the drive from the power connector for 30 seconds.
3. Reconnect the drive, and repeat steps 1 and 2 one more time.
4. Reconnect the drive normally, and boot the computer to your operating system.
"
A $15 USB3-SATA external interface will do this just as well, and you don't even have to plug it into a computer. As a bonus, it's a good thing to have lying around a workshop where you might want to temporarily plug in a drive to investigate it.
(Edit: wow, somehow I missed the line you had about that in your post. My apologies.)
NewEgg has a bunch of nonames for $7-12 which appear to be identical to the ones in my lab, which also have no identifying marks. Separate AC-DC power supply brick going to a switch and a 4-pin Molex, separate Molex-SATA power converter, and a small black box that takes USB on one side and has a SATA port and 3.5/2.5 PATA on the edges.
While some of those Crucial drives are Micron and pretty reasonable, please beware of comparing "consumer drives" and drives from 2012 / 2013 with today's drives.
The real problem with drives is that their firmware has to garbage collect. So, yes, you can push a drive into a corner where it can't escape. Not to name names, but I've had this happen in my company's testing of different drives. That also means there are peculiar results, such as needing to restart the drive or let it sit for an hour, after which it seems much better.
Our experience with MLC (managing and observing many customers' flash deployments) has been very positive, and major manufacturers' drives work very well.
This article is light on details, and looking at the source it would be great to know brands and model numbers.
I use SSDs in all my builds, servers and workstations. My most used are Samsung 850 EVOs and PROs, followed by the Intel 750 and Samsung 950 PRO.
Out of about 80 or so that I've put into production over the past few years, I've had about 3 850 EVOs go bad on me: they just completely lock up the machine they are connected to, and I can't even run diag. I make sure to use the PRO series in critical environments, and EVO for budget.
The actual paper has much more information than the fluffy article that is linked here.
From the paper: "The drives in our study are custom designed high performance
solid state drives, which are based on commodity
flash chips, but use a custom PCIe interface, firmware
and driver."
These aren't drives that you can just go out and buy, so brands and model numbers would be meaningless to anyone outside of Google.
> I've had about 3 850 EVOs go bad on me, just completely lock up the machine they are connected to, can't even run diag
That's worrying. It should at least stay a proper PCI-E citizen, not lock up the computer. And it should always provide the SMART data, even if the SSD is dead otherwise. And even if writing isn't possible anymore (safely, due to many bad bits), it should lock down to read-only.
I was using an OCZ Vertex 2 SSD in the past, when (possibly due to a firmware bug) it just wouldn't appear on the SATA bus anymore and had an error LED lit up.
I RMA'd the drive, and they said they had to reflash the firmware, but doing so would also lose all data because it would erase the AES key. (I never configured any encryption on the drive, but apparently it uses it by default.)
Needless to say I didn't buy OCZ again, but I'm not sure if this is a general problem with SSDs or just Sandforce controllers.
I think the EVO 850 are all SATA, not PCIe. But in general, yes, they should at least appear on the bus...
Interestingly, when this happened to me with some "ADATA" SSDs, they would still negotiate a link speed, so their PHYs did get initialized. But Linux didn't get any further information from the disks: device type, name, capacity... So maybe their firmware crashed halfway through initializing the SSD.
I only have a few in production. I'm using a 512GB 950 on my gaming PC (it has plenty of cooling, so I can't comment on the heat; the Intel 750s have a heatsink though...). It's replacing an 850, and in practice it's hard to notice the difference between them. In a virtual environment, when you have multiple VMs running on the same SSD, it has noticeable gains; higher IOPS and transfer speed make a big difference.
Most things I encounter in personal use are either CPU or GPU bound. There is nothing worse than having an application crawl to a halt when it is not even using a fraction of your system resources. Single threaded and 32 bit only applications are the bane of my existence.
With their scale they might be ordering custom SSDs.
Edit: yep, right in the paper: "The drives in our study are custom designed high performance solid state drives, which are based on commodity flash chips, but use a custom PCIe interface, firmware and driver."
It's fun to see this data come out. I did some of the early reliability tests for these devices. At the time, Google hadn't publicly announced that we were using SSDs in the data-center so there wasn't really an opportunity to publish or talk about it.
The most surprising thing to me was the correlation between age and RBER.
I would not have guessed that normal aging would play a significant role in reliability. It would be fun to understand what is happening there.
In fact, SSDs are not designed to retain data over very long periods while powered down... particularly when exposed to higher temperatures, and if they have seen lots of writes.
The interesting note from this abstract is, I guess, that they saw the number of writes didn't seem to have as much of an effect. I don't know if they tested temperature as an independent variable.
I suspect higher temps played a role in influencing Google's results. Higher temps while powered on will increase write endurance. If they were ever powering off drives for efficiency purposes this would also have an effect on read errors, which also gets worse with higher temperature.
I'll quote heavily from AnandTech's article on the topic:
"As always, there is a technical explanation to the data retention scaling. The conductivity of a semiconductor scales with temperature, which is bad news for NAND because when it's unpowered the electrons are not supposed to move as that would change the charge of the cell. In other words, as the temperature increases, the electrons escape the floating gate faster that ultimately changes the voltage state of the cell and renders data unreadable (i.e. the drive no longer retains data).
For active use the temperature has the opposite effect. Because higher temperature makes the silicon more conductive, the flow of current is higher during program/erase operation and causes less stress on the tunnel oxide, improving the endurance of the cell because endurance is practically limited by tunnel oxide's ability to hold the electrons inside the floating gate."
I think you're right, but what they are saying is that it is irrelevant, because the drives/blocks are failing due to age, not the number of write cycles.
Something I was unaware of: single bit (correctable) read errors are not immediately repaired in NAND. Subsequent in-error reads are repeatedly processed by ECC on the fly, while the media rewrite is scheduled for sometime later.
This inflates the observed correctable error rate to some extent.
FWIW - single-bit errors are not synonymous with correctable errors in flash.
For any given type of flash chip the manufacturer will provide a spec like "you must be able to correct N bits per M bytes". The firmware for a flash drive must use forward error correction codes (e.g. BCH or LDPC) of sufficient strength to correct the specified number of bit errors.
Dealing with a certain amount of bit-errors is just part of dealing with flash.
For example, a chip could have a spec that you must be able to correct up to 8 bit errors per 512 bytes (made up numbers). If the chip had 4KiB pages, each page would be split into 8 "chunks" that were each protected by error correcting codes that were capable of correcting up to 8 single-bit errors in that chunk. As long as no "chunk" had more than 8 bit errors, the read would succeed.
So in this case you could theoretically have a page read with 64 bit errors that succeeded.
This is alluded to in the paper: "The first generation of drives report accurate counts for the number of bits read, but for each page, consisting of 16 data chunks, only report the number of corrupted bits in the data chunk that had the largest number of corrupted bits."
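To make the per-chunk success criterion concrete, here's a minimal sketch (Python, using the made-up numbers from my example above, not any real datasheet): a page read succeeds as long as no single chunk exceeds the ECC's correction limit, regardless of the page-wide total.

    # Hypothetical sketch of the per-chunk ECC success criterion described above.
    # The numbers are made up for illustration, not taken from any real datasheet.

    CHUNK_SIZE_BYTES = 512            # assumed protection granularity
    CORRECTABLE_BITS_PER_CHUNK = 8    # "must correct up to 8 bit errors per 512 bytes"

    def page_read_succeeds(bit_errors_per_chunk):
        """A page read succeeds iff every chunk stays within the ECC correction limit.

        bit_errors_per_chunk: one entry per chunk, e.g. 8 chunks for a 4 KiB page
        protected at 512-byte granularity.
        """
        return all(errors <= CORRECTABLE_BITS_PER_CHUNK
                   for errors in bit_errors_per_chunk)

    # 64 raw bit errors spread evenly across 8 chunks: every chunk is correctable,
    # so the read succeeds despite the large page-wide total.
    print(page_read_succeeds([8] * 8))                    # True

    # Only 9 raw bit errors, but all in one chunk: that chunk exceeds the limit,
    # so the page read fails with an uncorrectable error.
    print(page_read_succeeds([9, 0, 0, 0, 0, 0, 0, 0]))   # False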
The article talks about ignoring UBER _specs_; the paper doesn't say that (at least that's not what I read).
From the paper:
"We find that UBER (uncorrectable bit error rate), the standard metric to measure uncorrectable errors, is not very meaningful. We see no correlation between UEs and number of reads, so normalizing uncorrectable errors by the number of bits read will artificially inflate the reported error rate for drives with low read count."
Basically, uncorrectable errors are not correlated with the number of reads, so computing a UBER by dividing the number of uncorrectable errors by the number of bits read is meaningless. Therefore UBER is a meaningless metric.
They don't differentiate between UBER as a manufacturer spec and UBER as measured in production.
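As a toy illustration of why that normalization misleads (invented numbers, not data from the paper): if uncorrectable errors don't scale with how much you read, dividing by bits read makes a lightly-read drive look orders of magnitude worse than a heavily-read one with the same number of errors.

    # Toy illustration with invented numbers: UBER = uncorrectable errors / bits read.
    # If UEs don't scale with reads, the lightly-read drive gets a scarier UBER
    # for the same underlying error count.

    def uber(uncorrectable_errors, bits_read):
        return uncorrectable_errors / bits_read

    light_reader = uber(2, 1e12)   # 2 UEs over ~125 GB read  -> 2e-12
    heavy_reader = uber(2, 1e15)   # 2 UEs over ~125 TB read  -> 2e-15

    print(light_reader / heavy_reader)  # 1000x "worse", yet the same two errors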
I believe the high firmware failure rates only affected a relatively small number of models.
Hopefully firmware will get to a point where if the SSD does fail it will be far more graceful (revert to read-only), rather than the sudden death I've seen first hand.
"The drives in our study are custom designed high performance solid state drives, which are based on commodity flash chips, but use a custom PCIe interface, firmware and driver."
The actual paper doesn't describe the drives as enterprise or consumer. Indeed the word "consumer" appears nowhere in the paper. The charitable interpretation of the FA is that its author was making the simplifying analogy of SLC : enterprise :: MLC : consumer.
No, almost every NAND chip will have bad blocks. The initial ones are discarded during drive initialization in the factory. Then they ideally do some burn-in, which may catch a few more bad blocks. Then, over the life of the drive, you may see thousands of defective blocks depending on capacity and number of NAND dies. However, the drive can be designed and tested to tolerate an expected defect limit with no data loss. A bad chip can also be handled, but the recovery process can put high stress on the SSD, much like a RAID rebuild.
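For what it's worth, the bookkeeping is conceptually simple. Here's a minimal, hypothetical sketch (not any vendor's actual FTL) of how grown bad blocks can be retired to factory-reserved spares until the reserve runs out:

    # Hypothetical sketch of grown-bad-block handling: retire a failing block to a
    # spare from the reserved pool. Real FTLs are far more involved (wear leveling,
    # metadata journaling, per-die spares), but the accounting looks roughly like this.

    class BadBlockManager:
        def __init__(self, spare_blocks):
            self.spares = list(spare_blocks)   # factory-reserved spare blocks
            self.remap = {}                    # failed block -> replacement block

        def retire(self, bad_block):
            """Map a newly-failed block onto a spare; fail loudly if the reserve is exhausted."""
            if not self.spares:
                raise RuntimeError("spare pool exhausted: drive exceeds its defect budget")
            self.remap[bad_block] = self.spares.pop()
            return self.remap[bad_block]

        def resolve(self, block):
            """Follow the remap table so reads/writes land on the healthy block."""
            return self.remap.get(block, block)

    mgr = BadBlockManager(spare_blocks=range(1000, 1010))
    mgr.retire(42)            # block 42 develops too many errors; swap in a spare
    print(mgr.resolve(42))    # accesses now go to the replacement block
    print(mgr.resolve(7))     # healthy blocks pass through unchanged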
More numbers with specific models would be nice. It would be perhaps useful to do a cost-benefit analysis over time of better SSDs vs. cheap SSDs with more redundancy and backups.
Much like a man-hour is a measure of one human working for one hour, a drive day is one drive operating for one day. So a million drive days could be a million drives for one day, a thousand drives for a thousand days, or any combination.
Power failures result in corruption. (To be fair, true of just about every SSD except the Intel DC line. But I have personally experienced this with the 850 Pro.)
Some higher end (e.g. enterprise) SSDs have some form of power backup (e.g. capacitors) to protect against power failure induced corruption. SSDs with power backup tend to be more expensive though than the consumer grade devices (such as the 850 Pro).
I'm sure there are others, but I don't know about them. Having surveyed the market myself recently, there's no consumer-grade SSD with such a feature (but the Intel ones aren't too pricey).