Magnetic Drives, You’re Fired!; Interview with GreenBytes CEO Bob Petrocelli Part I

Inline deduplication data storage solutions provider GreenBytes, Inc. recently released a new high-availability (HA), globally optimized solid-state drive (SSD) storage array solution called Solidarity that is garnering a lot of attention. Solidarity offers inline real-time deduplication and compression via a dual-controller unit outfitted entirely with SSD storage. The buzz over Solidarity is in large part because of its 200,000-plus IOPS performance–with deduplication and compression enabled.

We caught up with GreenBytes to talk about Solidarity. In this first part of a multipart series, GreenBytes CEO Bob Petrocelli gives us some background on how forays into SSD and the replacement of magnetic drives led to the development of Solidarity, a solution that’s got people talking.

Ben: Thank you, Bob, for joining us today. If you could, please first give us an overview of what you guys have done previously with SSD, and then we can dive into the new product release.

Bob: Thanks for having me, and I’m happy to do that. Think about our use of SSD in basically three ways. … The three ways we’ve used solid-state storage up until now have been to:

(1) Provide a metadata storage repository for all of the data that describes where the data is laid out in our disk arrays.  So everything about the data–where the blocks are, the access controls, everything that actually causes a lot of overhead in a traditional disk array when you’re trying to seek block chains on disk–we have relocated to solid-state drives in a persistent yet very reliable way in our disk arrays. We’ve removed those from the magnetic spinning drives so that the only time we actually have to access magnetic drives in our magnetic disk products is to actually read or write user data.

(2) The other thing is we are using solid-state storage from the get-go as a very scalable repository for the metadata that results from the deduplication. What a lot of people have come to realize with deduplication, especially for primary data, is that it creates a lot of metadata, almost like a database, that describes what the duplicates are all about, where they are located, how many of them there are. You have to keep track of that deduplication database in order to be efficient.
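To make the "deduplication database" idea concrete, here is a minimal sketch of the kind of record-keeping Bob describes: which blocks are duplicates, where they live, and how many references exist. It is an illustration only, not GreenBytes code. The names `DedupIndex`, `DedupEntry`, and `write_block`, the SHA-256 fingerprint, and the integer block addresses are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class DedupEntry:
    location: int   # hypothetical physical block address on the data drives
    refcount: int   # how many logical blocks point at this physical block


class DedupIndex:
    """Toy in-memory stand-in for a deduplication metadata store."""

    def __init__(self):
        self.entries = {}        # fingerprint -> DedupEntry
        self.next_location = 0   # next free physical block address

    def write_block(self, data: bytes) -> int:
        """Return the physical location for data, storing it only if it is new."""
        fingerprint = hashlib.sha256(data).digest()
        entry = self.entries.get(fingerprint)
        if entry is not None:
            entry.refcount += 1          # duplicate: metadata-only update
            return entry.location
        entry = DedupEntry(location=self.next_location, refcount=1)
        self.entries[fingerprint] = entry
        self.next_location += 1          # unique: one new physical block
        return entry.location


index = DedupIndex()
locations = [index.write_block(b) for b in (b"alpha", b"beta", b"alpha")]
print(locations)            # [0, 1, 0]
print(len(index.entries))   # 2
```

Every incoming write triggers a lookup in this table, which is why where the table lives (RAM, NVRAM, the main pool, or SSD) matters so much for performance.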

Ben: How has that metadata typically been handled?

Bob: Most systems have tried to do this in basically two ways: One, they either used very exotic and expensive nonvolatile RAM to do that–that’s sort of how the Data Domains of the world have done it for their backup applications. And that basically causes a lot of expense. It also limits how much you can scale.

Or, two, they have taken the sort of approach that the ZFS guys did, and they store it right in the main pool with the rest of the data. What that leads to is a lot of heartbreak and disappointment on the part of the users as your data scales, because you start chasing this metadata all over the disks and suddenly things that seemed to work well when there wasn’t much data deteriorate very rapidly when there is a lot of data.

And that affects your storage device: you suddenly find that nice little RAM cache you had that was holding all of that deduplication data becomes overwhelmed, and suddenly things tend to get pretty messy. So the vanilla behavior of processes like ZFS and deduplication becomes pretty hard, especially in simple cases like writing a couple terabytes to your drive.
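A rough back-of-the-envelope calculation shows why a RAM cache can be overwhelmed by "a couple terabytes." The numbers below are assumptions chosen for illustration, not figures from the interview: an 8 KiB block size, no duplicate blocks (the worst case), and about 320 bytes per table entry, a figure often quoted for in-memory ZFS dedup table entries.

```python
def dedup_table_bytes(data_bytes: int, block_bytes: int, entry_bytes: int) -> int:
    """Worst-case dedup table size: one entry per block, no duplicates found."""
    blocks = data_bytes // block_bytes
    return blocks * entry_bytes


TIB = 2**40
GIB = 2**30

# Assumed inputs: 2 TiB written, 8 KiB blocks, 320 bytes per entry.
table_size = dedup_table_bytes(2 * TIB, 8 * 1024, 320)
print(table_size / GIB)   # 80.0
```

Under these assumptions the table alone needs about 80 GiB, far more than a modest RAM cache can hold, so lookups spill to wherever the rest of the table is stored.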

What we did is we created a storage system in which all of that metadata–the deduplication metadata, file system metadata–is all carefully preserved in persistent form on solid state drives.

Ben: You mentioned a third thing you did with SSD.

Bob: The final thing we did with solid-state drives is we also used them in our high-availability architecture. In a rather unique, patent-pending approach, we’ve combined the function of storing metadata with the function of synchronizing the storage pool between two controllers.

So, unlike a network appliance where they have to synchronize RAM over a memory bus–that’s quite expensive and, again, limited in size–we can have any number of SSDs in our storage backplane that are dual-ported SAS drives, so all controllers can see them. And the synchronization blocks that are used to protect the writes that are incoming are stored along with the metadata that we protected transactionally.

So we have this great process in which writes are coming in during one phase of a transaction and then metadata is updated during another phase. So we keep the drives busy, if you will.

If we have a controller failure, we can very rapidly have the second controller bring that pool right back online by just synchronizing all the data on the shared drives, and then do IP forwarding between the 10 Gb Ethernet on the controller, and make that completely host transparent. So that’s how we enable transparent high availability.
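The general pattern Bob describes, protecting incoming writes on drives both controllers can see and then updating metadata in a second phase, can be sketched as a shared intent log. This is a generic illustration, not GreenBytes’ patent-pending design. `SharedSSD`, `Controller`, `accept_write`, `commit`, and `take_over` are hypothetical names, and the IP forwarding step is left out.

```python
class SharedSSD:
    """Stand-in for a dual-ported SSD that both controllers can read and write."""

    def __init__(self):
        self.intent_log = []   # phase 1: incoming writes, protected but not yet applied
        self.block_map = {}    # phase 2: committed metadata (block id -> data)


class Controller:
    def __init__(self, name, shared):
        self.name = name
        self.shared = shared

    def accept_write(self, block_id, data):
        # Phase 1: persist the incoming write on the shared drive.
        self.shared.intent_log.append((block_id, data))

    def commit(self):
        # Phase 2: fold logged writes into the metadata, then clear the log.
        for block_id, data in self.shared.intent_log:
            self.shared.block_map[block_id] = data
        self.shared.intent_log.clear()

    def take_over(self):
        # Failover: replay whatever the failed peer logged but never committed.
        self.commit()


shared = SharedSSD()
primary = Controller("A", shared)
standby = Controller("B", shared)

primary.accept_write(1, b"first")
primary.commit()
primary.accept_write(2, b"second")   # controller A fails before committing this write
standby.take_over()
print(shared.block_map)              # {1: b'first', 2: b'second'}
```

Because the log lives on the shared drives rather than in one controller’s RAM, the surviving controller needs no memory mirroring to recover the in-flight write.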

So we’ve got three missions that the same sets of solid-state drives achieve in our current storage architecture: general metadata storage for all of the blocks in the storage system; specialized deduplication metadata storage, which has got to go really fast; and then this unique blocking feature, which requires a special type of flash that has to have high endurance. So that’s the existing use.

Ben: So now we get to Solidarity.

Bob: Yes, and when I talk about this new product, it will make a lot of sense now when I go on to explain how that’s a natural evolution of the technology. Because what Solidarity does is it throws the magnetic drives out the window.

It replaces the user-data drives with low-cost but high-quality MLC [multi-level cell]–not consumer drives. Enterprise-quality controllers. Just lower-duty-cycle flash.

The reason why we do that is we are effectively coalescing all those writes with another tier of much higher quality solid-state storage. What I mean by “higher quality” is much higher write endurance. That allows us to do what we call write de-amplification.

Because we’re transactional, that de-amplification is quite dramatic. We combine with that our already established very high-performance compression and deduplication to provide very high levels of effective storage. And we do it in the context of a highly available environment.
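One plausible reading of "write de-amplification" is that repeated writes to the same block are absorbed in the high-endurance tier, so only the final version reaches the MLC flash. The sketch below shows that coalescing idea only. It is an assumption about the mechanism, not a description of Solidarity’s internals, and `CoalescingBuffer` and its fields are hypothetical names.

```python
class CoalescingBuffer:
    """Absorbs overwrites in a high-endurance tier before flushing to MLC."""

    def __init__(self):
        self.pending = {}          # block id -> latest data held in the fast tier
        self.mlc = {}              # block id -> data stored on the MLC tier
        self.incoming_writes = 0   # writes received from hosts
        self.mlc_writes = 0        # writes that actually reached MLC

    def write(self, block_id, data):
        self.incoming_writes += 1
        self.pending[block_id] = data   # a later write replaces an earlier one

    def flush(self):
        # End of a transaction: each dirty block is written to MLC once.
        for block_id, data in self.pending.items():
            self.mlc[block_id] = data
            self.mlc_writes += 1
        self.pending.clear()


buf = CoalescingBuffer()
for version in range(3):
    buf.write(10, f"a{version}".encode())
    buf.write(11, f"b{version}".encode())
buf.flush()
print(buf.incoming_writes, buf.mlc_writes)   # 6 2
```

In this toy run, six host writes turn into two MLC writes. The real ratio would depend on the workload and the length of each transaction.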

In Part II of this interview series, GreenBytes CEO Bob Petrocelli discusses the architecture of Solidarity and what differentiates it from competitive SSD solutions.

In Part III of this interview series, GreenBytes CEO Bob Petrocelli shares what he refers to as some of the “dirty little secrets” about some hardware that was cleverly repurposed to give Solidarity an edge in compression.

In Part IV of this interview series, GreenBytes CEO Bob Petrocelli talks about Solidarity’s failover response, including a failover response time of merely three seconds between canisters measured during testing. 
