The SSD Garbage Collection Problem Explained In Depth by WhipTail CTO James Candelaria – Part 1

A feature of DCIG’s blog that has been absent for the last year or so is interviews with executives of data storage companies that DCIG considers thought leaders in their respective spaces. Today DCIG begins to rectify that by publishing the first part of an interview I recently conducted with James Candelaria, Chief Technology Officer of WhipTail Technologies, an emerging provider of SSD storage solutions. In this and subsequent blog entries, I will publish excerpts from my interview with James.

In this first blog entry, Candelaria and I discuss one of the biggest problems precluding the widespread adoption of SSDs by enterprises: how to minimize, and ideally avoid, the massive performance penalty associated with repacking data on SSDs, better known as garbage collection. Here, Candelaria describes the issue in detail.

Ben: James, thanks for taking time to speak with me today about this issue. I know because of WhipTail’s position in the SSD space, you are intimately acquainted with this particular SSD issue. So to kick off our conversation and for the benefit of DCIG’s readers, could you describe in detail this SSD challenge and what steps WhipTail has taken to address it?

James: Ben, thank you, I would be happy to. The issue with SSDs has to do with optimizing the placement and storage of data after data has already been written to a block on an SSD.

This has to be done at the controller level, which requires that the controller repack the contents of a block, which is a set of cells that must be erased together. To repack the contents of that block, the contents must first be read into a buffer, the change made, the entire block erased, and then the updated contents programmed all the way back down to the cells. This incurs a massive performance penalty.
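The read-modify-erase-write cycle Candelaria describes can be sketched in a few lines. This is purely illustrative (the block geometry is a hypothetical 64 pages per erase block, not a figure from the interview); it just counts the operations a controller performs to change one page in place.

```python
# Illustrative sketch (not vendor code) of the in-place update penalty:
# to change one page, the controller must buffer the whole erase block,
# erase it, and then reprogram every page.

PAGES_PER_BLOCK = 64  # hypothetical geometry


def update_page_in_place(block, page_index, new_data):
    """Read-modify-erase-write cycle for a single page update."""
    buffer = list(block)              # 1. copy the entire block to a buffer
    buffer[page_index] = new_data     # 2. apply the small change
    block[:] = [None] * len(block)    # 3. erase the whole block at once
    block[:] = buffer                 # 4. program every page back
    # Cost: one full-block read + one erase + PAGES_PER_BLOCK page programs,
    # all to change a single page -- the "massive performance penalty".
    return 1 + 1 + PAGES_PER_BLOCK


block = [f"page{i}" for i in range(PAGES_PER_BLOCK)]
ops = update_page_in_place(block, 3, "new!")  # 66 operations for one page
```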

To avoid the massive performance penalty that this action incurs, storage providers currently implement over-provisioning: spare capacity that you cannot see and that is already pre-erased. So in essence today’s SSD providers try to hide this activity from you, the end user.

But eventually, if you use up every cell on an SSD, all of the drive’s cells (including the spares) will be dirty. At this point the SSD goes into garbage collection mode to figure out which pieces are still valid and which are not, and then moves data around.

This again incurs a massive performance penalty, as well as an endurance problem, because these drives have to do this garbage collection on their own in the face of small random write requests.

So what WhipTail does differently is intercept all incoming data. To do so, WhipTail implements a log-structured block translation layer. This layer essentially intercepts all IO regardless of its size and places it in a small buffer that is precisely the size of the erase block of the array underneath it.

So if WhipTail has a 24-drive array and, let’s say for argument’s sake, each erase block in each drive is 2 MB, our write block size will end up being roughly 48 MB. Then we have to subtract some for RAID (WhipTail runs parity RAID) and some for a hot spare. So we eventually end up with between 44 and 48 MB write blocks on a big array.
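The arithmetic above works out as follows. The drive count and erase-block size are the figures James gives; the assumption of exactly one parity drive and one hot spare is mine, chosen because it lands inside the 44-48 MB range he quotes.

```python
# Back-of-the-envelope math for the write-block size described above.
# Drive count and erase-block size are from the interview; the exact
# RAID layout (one parity drive, one hot spare) is an assumption.

drives = 24
erase_block_mb = 2

raw_write_block = drives * erase_block_mb    # 48 MB across the full array

parity_drives = 1                            # assumed: single-parity RAID
hot_spares = 1                               # assumed: one hot spare
usable = (drives - parity_drives - hot_spares) * erase_block_mb  # 44 MB

# The effective write block falls between `usable` and `raw_write_block`,
# matching the 44-48 MB range quoted in the interview.
```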

That buffer gets ejected from our stack all at once, which gives WhipTail two advantages. First, WhipTail never write-amplifies. In other words, WhipTail never takes a 4K write IO and submits it to media where the media ends up having to perform that wasteful repacking operation I just described.

By avoiding this, WhipTail reduces the number of times that a small random write IO actually causes a flash erasure, which immediately enhances SSD cell endurance. It also enhances random write performance, because the random write performance number is governed by how quickly you can actually write to the media. In this case WhipTail writes data to the buffer, so it can acknowledge immediately after the data has been staged in the buffer, and then get all of that data to media in one large chunk.
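The stage-then-flush behavior can be sketched as follows. This is a minimal, hypothetical model, not WhipTail's implementation: small writes are acknowledged as soon as they land in the RAM buffer, and flash only ever sees full erase-block-sized chunks.

```python
# Minimal sketch (hypothetical, not WhipTail's code) of a log-structured
# buffer: small random writes are acked once staged in memory, and media
# receives only full erase-block-sized chunks.

class LogStructuredBuffer:
    def __init__(self, flush_fn, block_size):
        self.block_size = block_size  # size of one erase block, in bytes
        self.buffer = bytearray()     # in-memory staging area
        self.flush_fn = flush_fn      # writes one full block to media

    def write(self, data: bytes) -> bool:
        """Stage a write; the immediate True return is the 'ack'."""
        self.buffer.extend(data)
        # Eject only complete erase-block-sized chunks -- one large
        # sequential write instead of many small repacking operations.
        while len(self.buffer) >= self.block_size:
            chunk = bytes(self.buffer[: self.block_size])
            del self.buffer[: self.block_size]
            self.flush_fn(chunk)
        return True


flushed = []                                     # stand-in for the media
buf = LogStructuredBuffer(flushed.append, block_size=8)  # tiny toy block
for _ in range(5):
    assert buf.write(b"abc")  # small IOs, each acked from the buffer
# 15 bytes staged -> exactly one full 8-byte block reached "media";
# the remaining 7 bytes wait in the buffer for the next flush.
```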

These two operations increase both random write performance and endurance. Here are some specifics: WhipTail’s endurance has been rated at over seven years on a standard MLC-based drive rated for 5,000 PE cycles.

That rating is based on an overwrite ratio of roughly 3x per day. So WhipTail has basically taken write amplification all the way down from the roughly 10x that most manufacturers advertise to darn near 1:1. On the random write performance side, we tend to get 250,000 random 4K IOs per appliance.
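A simplified model shows why write amplification dominates endurance. This sketch deliberately ignores over-provisioning and wear-leveling detail, both of which stretch real-world lifetimes beyond what this naive formula yields; the point it illustrates is that cutting write amplification from roughly 10x to near 1x buys a roughly 10x endurance gain.

```python
# Simplified endurance model: lifetime is PE cycles divided by effective
# program/erase cycles consumed per day. Ignores over-provisioning and
# wear-leveling, which extend real lifetimes further.

def endurance_years(pe_cycles, overwrites_per_day, write_amp):
    effective_pe_per_day = overwrites_per_day * write_amp
    return pe_cycles / effective_pe_per_day / 365

# Interview figures: 5,000 PE cycle MLC, ~3x overwrites per day.
amplified = endurance_years(pe_cycles=5000, overwrites_per_day=3,
                            write_amp=10)  # typical advertised write amp
near_unity = endurance_years(pe_cycles=5000, overwrites_per_day=3,
                             write_amp=1)  # near-1:1 write amplification

# Reducing write amplification 10x multiplies modeled endurance by 10x.
```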

Also important is that WhipTail always writes at a stripe boundary, which means it never has to do a read in order to do a write, as you might with RAID 5 or RAID 6. This means the SSD drives remain extremely happy and performance stays up, because WhipTail does not amplify IO.
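The stripe-boundary point can be made concrete with XOR parity. This is a generic sketch of the classic RAID-5 trade-off, not WhipTail's code: a full-stripe write computes parity from the new data alone (zero reads), while a partial write must first read the old data and old parity (two extra IOs per small write).

```python
# Generic illustration of why full-stripe writes avoid the RAID-5
# read-modify-write penalty. Not vendor code.
from functools import reduce


def xor_parity(chunks):
    """XOR all data chunks together to form the parity chunk."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), chunks)


def full_stripe_write(new_chunks):
    """Full-stripe write: parity from new data only, zero reads needed."""
    parity = xor_parity(new_chunks)
    reads = 0
    return new_chunks + [parity], reads


def partial_raid5_write(old_chunk, old_parity, new_chunk):
    """Small write: must first READ old data and old parity (2 extra IOs)."""
    new_parity = bytes(p ^ o ^ n
                       for p, o, n in zip(old_parity, old_chunk, new_chunk))
    reads = 2
    return new_chunk, new_parity, reads


# Full-stripe path: three data chunks in, parity computed, no reads.
stripe, stripe_reads = full_stripe_write([b"\x01", b"\x02", b"\x04"])

# Partial path: updating one chunk forces two reads before the write.
_, updated_parity, partial_reads = partial_raid5_write(
    old_chunk=b"\x01", old_parity=b"\x07", new_chunk=b"\x05")
```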

In the next blog entry in this interview series, Candelaria will explain how WhipTail optimizes SSD performance while minimizing the deficiencies of MLC flash.

In the third blog entry in this series, I discuss with Candelaria how WhipTail deals with variances between each SSD manufacturer’s hardware and firmware.

In the fourth installment in this series, Candelaria explains how and why WhipTail uses software RAID in its SSD appliance.

In Part V of this series, James and I discuss the hardware and software supported by WhipTail and why FCoE and iSCSI trump Infiniband in today’s SSD deployments.
