A Working Definition of Hybrid Storage Arrays and Why This Definition Matters

As we were researching arrays for inclusion in the DCIG 2013 Flash Memory Storage Array Buyer’s Guide, we kept encountering an intriguing group of companies that had designed, or were developing, storage arrays from the ground up to realize the performance benefits of an all-flash array, but with storage capacities and price points that would bring the benefits of flash memory storage to a broader range of businesses. The resulting hybrid storage arrays achieve this balance of performance, capacity, and cost by intelligently combining flash memory with large-capacity disk drives in a single storage system.
After completing our research on “all flash” arrays, we turned our attention to these hybrid storage arrays. In the process we had to wrestle with a question: “What distinguishes a Hybrid Storage Array from a standard storage array that just mixes flash/SSDs and HDDs in the same device?” After all, our research had revealed that nearly 75% of enterprise midrange arrays support the inclusion of flash memory in some form or fashion.
Defining a Hybrid Storage Array
In his excellent article, “Hybrid Storage Poised to Disrupt Traditional Disk Arrays”, David Floyer of Wikibon suggested that part of the hybrid storage definition is “The initial writing of all data is to the flash layer.” There is much to commend this definition, but some of the vendors in the hybrid storage space, including Nimble Storage and Oracle, make sensible arguments that sequential IO is better sent directly to hard drives while random IOs are coalesced in NVRAM or SSD-backed DRAM, and then delivered to the SSDs and hard drives as sequential IO. 
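The coalescing idea can be illustrated with a minimal sketch. It is not any vendor's implementation: the class name, the flush threshold, and the choice to order each batch by block address are all assumptions made for illustration, and a real array may instead lay coalesced writes out in a log-structured fashion.

```python
from dataclasses import dataclass, field


@dataclass
class CoalescingBuffer:
    """Toy write buffer: absorbs random block writes, flushes them in order."""
    capacity: int = 4                             # hypothetical flush threshold, in blocks
    pending: dict = field(default_factory=dict)   # block address -> data
    flushed: list = field(default_factory=list)   # batches sent to backing media

    def write(self, address: int, data: bytes) -> None:
        # A later write to the same address replaces the buffered copy.
        self.pending[address] = data
        if len(self.pending) >= self.capacity:
            self.flush()

    def flush(self) -> None:
        if not self.pending:
            return
        # Sort by address so the backing media sees one ascending pass.
        batch = sorted(self.pending.items())
        self.flushed.append(batch)
        self.pending.clear()


buf = CoalescingBuffer(capacity=4)
for addr in (907, 12, 455, 12, 3):   # random-order writes, one address repeated
    buf.write(addr, b"x")
```

Five random writes, one of them a rewrite of the same block, leave the buffer as a single ordered batch of four blocks. That is the sense in which random IO is turned into sequential IO before it reaches the SSDs and hard drives.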
Others suggested to us that the key differentiator is inline deduplication or compression. Their rationale is that these storage efficiency technologies are necessary if the array is going to deliver the performance that hybrid storage arrays need to offer at a cost that businesses can afford. They argue that just mixing and matching SSDs and HDDs won’t do (too expensive) nor will doing post-process deduplication and/or compression (too slow).
An argument could also be made that the key differentiator is storing metadata separately from the actual data, on separate flash memory devices. Since as much as 80% of all storage system IO consists of metadata operations, it makes sense to store metadata in flash memory. One example is a snapshot or differential backup operation. A lot of IO is generated comparing the timestamp of the last change to a block of data or file with the current time to determine whether or not the block/file needs to be included in this snapshot or backup. Although storing metadata separately is a smart way to enhance storage system performance, it isn’t enough to distinguish hybrid arrays from standard arrays.
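As a rough sketch of why this kind of operation is metadata-bound, the selection step below reads only a table of last-change timestamps and never touches the data blocks themselves. The function name, the table layout, and the assumption that the comparison is made against the time of the previous snapshot are all hypothetical.

```python
def changed_since(block_mtimes: dict, last_snapshot_time: int) -> list:
    """Return the ids of blocks modified after the previous snapshot."""
    # Only the metadata table is scanned; no data block is read here.
    return sorted(
        block for block, mtime in block_mtimes.items()
        if mtime > last_snapshot_time
    )


# Hypothetical metadata: block id -> time of last change (arbitrary units).
block_mtimes = {"blk0": 100, "blk1": 250, "blk2": 180, "blk3": 90}
to_copy = changed_since(block_mtimes, last_snapshot_time=150)
```

Every block's metadata is consulted, but only two of the four blocks end up in the snapshot, so the faster the metadata lookups, the faster the whole operation.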
Dynamic Data Placement in Pools of Flash and HDD Storage
DCIG believes that ultimately, the key differentiator between a hybrid storage array and a standard array is dynamic data placement in a storage pool that combines flash memory and HDDs. Some vendors refer to this as automatic storage tiering or dynamic storage tiering. Others talk more in terms of sophisticated caching and data placement algorithms. This dynamic data placement may be based on preset system policies/algorithms, user-defined policies, user-defined performance targets, or some combination of the three.
Dynamic data placement is key to hybrid storage system performance. Those arrays that do the best job of dynamically placing data on the appropriate performance/capacity tier should deliver the best price/performance ratio, and deliver both the performance and the capacity that businesses need from their storage systems.
Dynamic data placement relies on continuous intelligent monitoring of storage. This enables a “set it and forget it” approach that prevents performance problems rather than correcting them after the fact, which is essentially what scheduled storage tiering accomplishes.
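One simple way to picture dynamic data placement is a monitor that counts accesses per block and keeps the hottest blocks on flash. The sketch below is only an illustration under stated assumptions: the class, the flash capacity expressed in blocks, and the decay factor are invented here, and real arrays use far more sophisticated caching and tiering algorithms.

```python
class PlacementMonitor:
    """Toy heat tracker: the hottest blocks are assigned to the flash tier."""

    def __init__(self, flash_slots: int, decay: float = 0.5):
        self.flash_slots = flash_slots   # hypothetical flash capacity, in blocks
        self.decay = decay               # hypothetical aging factor per rebalance
        self.heat = {}                   # block id -> access heat

    def record_access(self, block: str) -> None:
        self.heat[block] = self.heat.get(block, 0.0) + 1.0

    def rebalance(self):
        # Rank by heat; ties are broken by block id to stay deterministic.
        ranked = sorted(self.heat, key=lambda b: (-self.heat[b], b))
        flash = set(ranked[:self.flash_slots])
        hdd = set(ranked[self.flash_slots:])
        # Age the counters so old activity fades and cooled data is demoted.
        for block in self.heat:
            self.heat[block] *= self.decay
        return flash, hdd


monitor = PlacementMonitor(flash_slots=2)
for block in ["a", "a", "a", "b", "c", "c"]:
    monitor.record_access(block)
first = monitor.rebalance()

for block in ["b", "b", "b"]:            # block "b" becomes hot later on
    monitor.record_access(block)
second = monitor.rebalance()
```

In the first pass blocks "a" and "c" occupy flash. Once "b" heats up and the older counts have decayed, "b" is promoted and "c" is demoted to disk with no administrator involvement, which is the behavior the “set it and forget it” description points to.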
Vendors Delivering Hybrid Storage Arrays Today 
The vendors delivering hybrid storage arrays today are an interesting mix of at least a dozen startups and established storage system vendors. The startups include Avere, Fusion-io, Nimble Storage, Starboard Storage, Tegile and Tintri. The established storage vendors have entered this space primarily by acquiring startups, but now bring their enterprise sales and support organizations to the platforms they acquired. These vendors include EMC, Hitachi Data Systems, HP, IBM, Imation, and Oracle.
Why It Matters
Data storage requirements continue to escalate across the board. A doubling of storage within an organization every two years is now common. After an initial “hot” period that is typically less than 90 days, most of that data will be accessed only on rare occasions. Through dynamic data placement, hybrid storage arrays achieve a balance of performance, capacity, and cost that makes sense in addressing a broad range of business requirements.
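A back-of-the-envelope calculation shows what these two figures imply when taken together. It assumes smooth exponential growth with a doubling period of 730 days, that nothing is ever deleted, and that data is “hot” only during its first 90 days. These are simplifying assumptions made for this sketch, not measurements.

```python
DOUBLING_PERIOD_DAYS = 730   # "every two years", taken as 2 x 365 days


def growth_factor(days: float) -> float:
    """Factor by which total stored data grows over the given number of days."""
    return 2 ** (days / DOUBLING_PERIOD_DAYS)


def hot_fraction(hot_days: float) -> float:
    """Share of all stored data that is newer than hot_days."""
    # Data older than hot_days equals the total as it stood hot_days ago.
    return 1 - 1 / growth_factor(hot_days)


annual_growth = growth_factor(365) - 1   # yearly growth rate
hot_share = hot_fraction(90)             # share of data still in its hot period
```

Under these assumptions storage grows by about 41% per year, yet only roughly 8% of the stored data is still within its hot period at any given time. A small flash tier in front of a much larger disk tier is therefore a plausible fit for this workload shape.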
DCIG is in the final stages of preparing a number of Hybrid Storage Array Buyer’s Guides that will give organizations the opportunity to do “at-a-glance” comparisons between many different hybrid storage arrays that meet this definition. The Guides will enable them to more quickly sort through the many available hybrid storage arrays to identify a short list of arrays that meet their specific needs. From that point they can focus their product evaluation energies on those selected arrays and move to the competitive bid process more quickly.
