Today is part 2 of an interview I recently did with James Candelaria, Chief Technology Officer of WhipTail Technologies, an emerging provider of SSD storage solutions. In my last entry, he and I discussed one major roadblock to widespread enterprise SSD adoption: the performance penalty incurred by garbage collection. This time, we’ll look at how WhipTail optimizes SSD performance while minimizing the deficiencies of MLC flash.
Ben: We’ve talked a bit about the challenges of SSD and how WhipTail addresses them as well as how you’re solving performance-related issues with SSDs used in RAID arrays. Are you doing any other optimizations in your stack or are you just trying to quiet down the interaction between the flash cells?
James: WhipTail’s claim to fame right now is the mitigation of the deficiencies in MLC flash. Most of our intellectual property sits directly underneath the transport stack and above the RAID stack. So we optimize as data comes in over Fibre Channel or iSCSI, or even out of an ext3 or XFS file system.
As data comes into our block device, our software manipulates the data and submits it to the RAID layer using a fairly standard RAID stack. So it’s an ideal place for things like enhanced ECC, possibly data reduction strategies later on, things like that. I can’t speak to some features, but you can imagine from our architecture that we’re strategically placed.
Ben: So what WhipTail is doing is optimizing your block sizes to fit exactly within the size of a RAID stripe?
James: Yes, exactly, right to the borders of the write stripes and the borders of an array slot on the flash media. One of the things I always like to point out here is that we’ve been shipping products for over two and a half years to customers.
So we have a lot of empirical data about how these flash controllers work over time and the behaviors of different controllers and firmware over a large install base, not necessarily just how they work on a spec sheet.
We have over 100 customers deployed in the field and probably the largest install base of VDI (Virtual Desktop Infrastructure) flash storage out there. So there are thousands and thousands of VDIs sitting on top of us, and that gives us a real good feel for how accurate our projections have been.
One of the things WhipTail did most recently was to look at the hardware and software layers of one of our oldest customers, who has been in production for over two and a half years. We did the math to figure out how long their array would last at current endurance rates. The answer I got back was 22 more years of service before flash endurance would reach unacceptable levels.
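To make the alignment point at the start of James's answer concrete, here is a minimal sketch of packing small incoming writes into batches that each fill exactly one stripe. It is not WhipTail's code: the function name `pack_into_stripes` and the stripe size used in the example are assumptions chosen for illustration, and a real implementation would also have to track where each logical block ended up.

```python
def pack_into_stripes(blocks, stripe_size):
    """Group incoming writes so every emitted batch is exactly one stripe long.

    Returns (full_stripes, leftover). Only full stripes would be handed to
    the RAID layer; the leftover stays buffered until more data arrives.
    The name and signature are hypothetical, not a vendor API.
    """
    if stripe_size <= 0:
        raise ValueError("stripe_size must be positive")
    pending = bytearray()
    full_stripes = []
    for block in blocks:
        pending.extend(block)
        # Emit as many complete stripes as the pending data allows.
        while len(pending) >= stripe_size:
            full_stripes.append(bytes(pending[:stripe_size]))
            del pending[:stripe_size]
    return full_stripes, bytes(pending)


# Illustrative numbers only: 300 random 8 KiB writes, 2 MiB stripe.
stripes, leftover = pack_into_stripes([b"a" * 8192] * 300, 2 * 1024 * 1024)
```

The point of the sketch is that the RAID layer only ever sees writes that begin and end on stripe borders, so it never has to read, modify, and rewrite a partially filled stripe.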
Ben: What do you think the service life would have been without your software stack?
James: Two and a half years ago we were shipping 54 nanometer flash, which had endurance levels of 10,000 cycles, double the industry’s standard endurance. If we had not done something about write amplification, I think we probably would have seen wear-out in about a year, maybe less, primarily because the workload was a fairly heavy 8K random write IO.
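The kind of back-of-the-envelope endurance math James refers to can be sketched as follows. This is a simplified model that assumes perfect wear leveling; apart from the 10,000-cycle figure he quotes, every number in the example (capacity, daily write volume, write amplification factors) is a made-up placeholder, not WhipTail or customer data.

```python
def estimated_lifetime_years(capacity_bytes, pe_cycles,
                             host_writes_per_day_bytes, write_amplification):
    """Rough flash lifetime estimate, assuming perfect wear leveling.

    total bytes the flash can absorb = capacity * program/erase cycles
    bytes actually written per day   = host writes * write amplification
    """
    total_writable = capacity_bytes * pe_cycles
    flash_writes_per_day = host_writes_per_day_bytes * write_amplification
    return total_writable / flash_writes_per_day / 365.0


# Hypothetical array: 1 TB of flash, 10,000 cycles, 2 TB of host writes a day.
aligned = estimated_lifetime_years(10**12, 10_000, 2 * 10**12, 1.0)
amplified = estimated_lifetime_years(10**12, 10_000, 2 * 10**12, 10.0)
```

In this model lifetime is inversely proportional to write amplification, so a tenfold amplification from small random writes cuts the estimate to a tenth. That is the shape of the gap James describes between roughly a year and a couple of decades, even though the real inputs differ.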
Ben: With a 48 MB buffer covering 24 erase blocks of 2 MB each, are there additional buffering requirements to prevent blocking while you’re flushing?
James: We do keep a set of overlapping buffers. Currently I think it’s a total of four overlapping outbound buffers. So while one set of buffers is flushing, the other one is filling, so we don’t end up with too many stalls. Even if you’re pounding at a full line rate all day long, you’ll see a fairly linear performance experience.
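The fill-while-flushing scheme James describes can be illustrated with a small single-threaded simulation. This is only a sketch under assumptions: the class name `RotatingBuffers`, its methods, and the idea of signalling a stall with a `False` return are invented here, and a real system would flush asynchronously rather than through an explicit `flush_one` call.

```python
from collections import deque


class RotatingBuffers:
    """Pool of outbound buffers: one fills while full ones wait to flush."""

    def __init__(self, count=4, capacity=48 * 1024 * 1024):
        if count < 1 or capacity < 1:
            raise ValueError("count and capacity must be positive")
        self.capacity = capacity
        self.active = bytearray()                       # currently filling
        self.free = deque(bytearray() for _ in range(count - 1))
        self.flushing = deque()                         # full, awaiting flush

    def write(self, data):
        """Append data; return False if the writer would have to stall."""
        if len(data) > self.capacity:
            raise ValueError("write larger than one buffer")
        if len(self.active) + len(data) > self.capacity:
            if not self.free:
                return False        # every other buffer is still flushing
            self.flushing.append(self.active)
            self.active = self.free.popleft()
        self.active.extend(data)
        return True

    def flush_one(self):
        """Simulate the backing store completing the oldest pending flush."""
        if not self.flushing:
            return None
        buf = self.flushing.popleft()
        payload = bytes(buf)
        buf.clear()
        self.free.append(buf)       # buffer becomes available for filling
        return payload
```

With several buffers in rotation, a writer only stalls when flushes fall behind by the whole pool, which is consistent with the fairly linear performance James mentions.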
Ben: Any time that things are sitting in a RAM buffer like this and have not been written, you risk data loss, right? So how does WhipTail handle that situation?
James: Very good question. We handle it in two different ways: First, we do have a maximum data age timer. WhipTail will flush buffers if they are not full within half a second. That’s tunable, though, depending on the customer’s tolerance for a hole in their data set. Second, customers are required to run a solid UPS. Your storage has got to be behind a good UPS. So those are the two mitigation strategies right now.
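The maximum data age timer can be sketched as a buffer that becomes due for flushing either when it is full or when its oldest unwritten data passes an age limit. The half-second default comes from James's answer; the class name `AgedBuffer`, its methods, and the injectable `clock` parameter are assumptions made for this illustration.

```python
import time


class AgedBuffer:
    """Write buffer that is due for flushing when full or too old."""

    def __init__(self, capacity, max_age_seconds=0.5, clock=time.monotonic):
        self.capacity = capacity
        self.max_age = max_age_seconds      # tunable, per the interview
        self._clock = clock                 # injectable so it can be tested
        self._data = bytearray()
        self._first_write_at = None

    def write(self, data):
        if not self._data:
            # Age is measured from the oldest unflushed byte.
            self._first_write_at = self._clock()
        self._data.extend(data)

    def should_flush(self):
        if not self._data:
            return False
        if len(self._data) >= self.capacity:
            return True
        return self._clock() - self._first_write_at >= self.max_age

    def drain(self):
        """Return the buffered bytes and reset the age timer."""
        payload = bytes(self._data)
        self._data.clear()
        self._first_write_at = None
        return payload
```

A shorter age limit narrows the window of data that could be lost on power failure, at the cost of flushing partially filled buffers more often. That is the trade-off behind making the value tunable.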
Ben: And then what does WhipTail do for redundancy? Is this a single point of failure here? Or can I set up multiple controllers?
James: We do sell HA devices. Currently our HA relies on synchronous replication between two units. So we have two units side by side over a 10 GB umbilical. And that allows you to ensure that your data exists in both places.
In case of a failover, one will assume the identity of the other. It’s not a traditional dual-controller HA architecture, where you have two storage processors in one single shelf of storage. In my experience, if you have a failure, nine times out of ten it’s going to be the underlying media, not the underlying controller, that’s going to be the problem.
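As a rough illustration of synchronous replication between two units, the sketch below acknowledges a write only after both units hold the data, and lets the survivor take over on failover. All names here (`Unit`, `SyncReplicatedPair`, `fail_over`) are hypothetical, and the model ignores what a real product must handle, such as resynchronizing a unit that missed writes.

```python
class Unit:
    """Stand-in for one storage appliance (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.blocks = {}

    def store(self, lba, data):
        if not self.online:
            raise ConnectionError(f"{self.name} is unreachable")
        self.blocks[lba] = data


class SyncReplicatedPair:
    """Acknowledge a write only once both units have stored it."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    def write(self, lba, data):
        self.primary.store(lba, data)
        # If the peer is unreachable this raises, so the caller never
        # receives an acknowledgement for data held in only one place.
        self.secondary.store(lba, data)
        return True

    def fail_over(self):
        """The surviving unit assumes the identity of the failed one."""
        self.primary, self.secondary = self.secondary, self.primary

    def read(self, lba):
        return self.primary.blocks[lba]
```

Because every acknowledged write already exists on both units, the secondary can serve reads immediately after taking over.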
Ben: Believe it or not, I have had a controller fail on me once. And then the HA failed. And chaos ensued. So I’m sensitive to it.
James: I’ve been down that road too. Prior to founding WhipTail, I spent 10 years in high-end Fortune 500 consulting. I’ve found that it’s not “if” it will fail, it’s “when.” We’re not dealing with perfect machines here. With that in mind, we always provide for redundancy and think about contingency situations.
In the next blog entry in this series, I will continue my discussion with Candelaria looking at how WhipTail deals with variances between each SSD manufacturer’s hardware and firmware.
In Part I of this interview series, James explained the SSD garbage collection problem and how WhipTail handles it.
In the fourth installment in this series, Candelaria explains how and why WhipTail uses software RAID in its SSD appliance.
In Part V of this series, James and I discuss the hardware and software supported by WhipTail and why FCoE and iSCSI trump Infiniband in today’s SSD deployments.