Today, I am continuing my discussion with WhipTail Technologies Chief Technology Officer, James Candelaria, whose company specializes in Solid State Drive (SSD) storage solutions. In Part III of this interview series, we looked at how WhipTail deals with the variances between solid state drive manufacturers' hardware and firmware and how this emerging technology handles those differences. In this installment, we'll look at how and why WhipTail implements software RAID in its appliance.
Ben: Since hardware RAID can’t keep up with your requirements, you’re using software RAID. To that end, does WhipTail run its own operating system, or are you just performing your processing on a field-programmable gate array (FPGA)?
James: I would never do this in an FPGA. While a hardware approach is extremely attractive from a performance standpoint, the problem is that you just can't keep up with the pace that Intel's moving at these days. Intel has a tick just about every six months, and a tock every 12 to 18 months. And we're about to see the Sandy Bridge EP/EN platforms hit, where we will see a 25 percent increase in clock performance. To be competitive, I'd never be able to do this in an FPGA. So we're a software-based, Linux-based appliance.
There's no reason to reinvent the wheel. The open source community has done a fantastic job in a lot of areas. Our intellectual property is all in closed-source modules, and we follow the GPL: we're not taking advantage of any GPL code, like work queues or anything like that, which ensures our proprietary layer remains that way.
Ben: Because WhipTail runs on an open source platform, when you encounter problems, you can then send an immediate update to your customers, right?
James: Yes, and our turnaround time is pretty quick. This is one of the great things about not being an FPGA or an application-specific integrated circuit (ASIC). If we ever get something wrong, we can fix it. I am not so proud as to claim that we have never gotten anything wrong or that we have never had to ship out a fix.
What I can say is that I have never changed our on-disk format and forced customers to do a full forklift migration. We have always been able to get our customers fixes where they need them in a reasonable amount of time.
Ben: WhipTail has a good-sized, multi-layer buffer that collects data until you have a full erasure block. By "spoon feeding" the disks, WhipTail ensures the SSD array is not performing erasures at unpredictable times and messing up I/O performance. Is my understanding correct?
James: Yes. One of the problems you frequently hear about with SSDs is the “write cliff.” You get the disks full and all of a sudden the performance falls off. That is because they are running garbage collection. They have to.
So what we do is feed it a new block, overwriting an old block in its entirety. That means WhipTail is always freeing a block. So when a write is coming in, it is freeing that block, which means that the pool of empties is always kept consistent. So we do not see the media walk into a corner, as we like to say. We have seen a lot of types of media over the years that go unmanaged; you walk them into a corner and they can never recover.
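The buffering scheme James describes can be sketched in a few lines: small random writes accumulate until a full erase block is ready, and only whole blocks ever reach the media, so each write frees exactly one old block. This is a minimal illustration, not WhipTail's implementation; the block size, class name, and flush callback are all assumptions for the sake of the example.

```python
ERASE_BLOCK = 512 * 1024  # bytes; a common NAND erase-block size (illustrative)

class EraseBlockBuffer:
    """Accumulates small writes and emits only full erase blocks."""

    def __init__(self, flush_fn, block_size=ERASE_BLOCK):
        self.block_size = block_size
        self.flush_fn = flush_fn   # writes one complete block to the media
        self.pending = bytearray() # writes waiting for a full block

    def write(self, data: bytes):
        self.pending.extend(data)
        # Emit whole blocks only; each overwrites an old block in its
        # entirety, so the media frees one block per block written and
        # the pool of empties stays constant.
        while len(self.pending) >= self.block_size:
            block = bytes(self.pending[:self.block_size])
            del self.pending[:self.block_size]
            self.flush_fn(block)

# Toy usage with an 8-byte "erase block" so the behavior is visible:
flushed = []
buf = EraseBlockBuffer(flushed.append, block_size=8)
for chunk in (b"abc", b"defgh", b"ij"):
    buf.write(chunk)
# flushed now holds one full block; 2 bytes remain pending
```

Because erasures happen only on this fixed, block-aligned cadence, the drive never has to run emergency garbage collection mid-workload, which is what causes the "write cliff."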
What we are finding interesting about this business right now is that it is the "wild, wild west." SSDs have really only been with us in their current form for a few years. Manufacturers unfortunately have not taken to heart the lessons that the hard drive industry has lived by for the past 40 years. Companies like ours really have an opportunity to determine the behavior and manipulate the data patterns so we can find the best case through the deficiencies that are present.
Ben: It sounds like you have enough cache for one erasure block. If my write is just really pounding in, you must have additional cache or RAM available to be queuing things up for performance reasons. Is WhipTail able to keep up with a full pipeline of writes right now?
James: Absolutely. We've tested pretty deep into the number of outstanding I/Os. With queue depths deeper than 256, we still perform rather well. Obviously latency goes up because we actually have to flush the media at some point. But we have between four and eight overlapping outbound buffers that flush media. To give you an example, my latest benchmark on a single unit here has been 1.8 gigabytes per second of writes, all random 64 kilobyte I/Os, and that has been a flat line.
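The numbers James quotes can be sanity-checked with Little's law (average latency ≈ outstanding I/Os ÷ IOPS), which also explains why latency rises at deeper queue depths even while throughput stays flat. The unit conventions below (decimal gigabytes, binary kilobytes) are my assumptions, not figures from the interview.

```python
# Back-of-the-envelope check of the benchmark figures quoted above.
throughput = 1.8e9           # bytes/sec of sustained random writes
io_size = 64 * 1024          # 64 KiB per I/O
iops = throughput / io_size  # ~27,500 I/Os per second

# Little's law: with 256 I/Os outstanding at that service rate,
# average per-I/O latency is queue depth divided by IOPS.
queue_depth = 256
latency_ms = queue_depth / iops * 1000  # ~9.3 ms average latency
```

So at a queue depth of 256, latency in the high single-digit milliseconds is exactly what a flat 1.8 GB/s line implies; deeper queues raise latency without adding throughput once the media is saturated.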
Ben: The class of flash WhipTail uses is pretty widely available. Do you face any potential supply-chain problems?
James: You would be surprised. Every time Apple announces a new product, the NAND industry goes into a tailspin because they buy up everything they get their hands on.
However, it has gotten a lot better. I will give you just a quick example. Two years ago, before Samsung and Toshiba brought their new plants online, Apple announced the iPad, and supply got very, very, very tight. We really had a hard time fulfilling orders. But Samsung and Toshiba have brought on new plants, and now the supply is pretty predictable.
Since we use MLC flash, we are able to take advantage of the market forces and get the best price, and a product that is much more available than single-level cell (SLC) or enterprise multi-level cell (eMLC). So we're in the best possible supply situation. I can tell you that for the past 12 to 15 months, I have not had to worry about supply.
Ben: I guess that could be an issue with being locked into one provider. If that one provider is experiencing supply chain problems, you are tied to them.
James: We definitely have not been exclusive to Intel. I have qualified other manufacturers and other NAND as well. Most of what we are currently shipping is Intel. But should Intel have a supply chain problem, I have qualified, tested, and quality-controlled Samsung and Toshiba NAND to a very specific controller and firmware level that we like. So we are prepared for that as well.
Ben: With Sandy Bridge coming, do you feel that you have got the headroom to use additional computing power?
James: Absolutely. When Intel gives me more processor, I get faster. That is a given. We are eagerly awaiting the release of Romley. I cannot speak to specific performance numbers, but I can say that Intel is not kidding when they say that chip is faster. My non-disclosure agreement has not expired yet, so I cannot say how much. But it is faster. I do not think they would object to that statement.
As far as futures for us, I always hesitate to talk about futures, except that we are going to continue to do what we have always done: innovate in the data manipulation space, mitigating the concerns that MLC has today, namely endurance and random write performance, as well as provide enhanced manageability solutions. Today we ship in one-half, three, six, and 12 terabyte capacities. In the future you can anticipate larger densities and unified panes of management.
We have a tremendous pool of engineers here who are really thinking about the next thing. Flash will be with us for a certain number of years. As lithographies shrink ever tighter (we are at 25 nanometers now), the game will change once we get below 20 nanometers: the physics at that scale means error rates are going to rise sharply. The software layer is the best place to manage large-scale error correction outside of hardware.
In our next and final installment in our series with WhipTail CTO James Candelaria, we’ll look at the types of storage hardware and networks supported by WhipTail and why FCoE and iSCSI are trumping Infiniband in today’s SSD deployments.
In Part I of this interview series, James explained the SSD garbage collection problem and how WhipTail handles it.
In Part II of the series, James discussed how WhipTail is optimizing SSD performance while minimizing the deficiencies of MLC flash.
In Part III, Candelaria and I discussed how WhipTail deals with manufacturer variations in SSD drives.