In late January 2010, someone posted a comment in response to a blog entry I wrote back in October 2009. The commenter questioned the value of an inline deduplication solution like the Exar Hifn BitWackr DR1605, arguing that the performance bottleneck in deduplication lies primarily in disk I/O.
Here is the exact comment that the individual posted:
I don’t understand the appeal of hardware assisted data deduplication solutions using something like Hifn’s Bitwackr. The deduplication performance bottleneck is mostly in disk IO. Having special hardware to compute MD5/SHA1 doesn’t help since the disk IO still dominates the performance load.
The WhipTail solution illustrates the problem. It’s using SSD disk since traditional disks are not fast enough. That means a hardware-dedup + traditional disks are too slow, or the hardware-dedup doesn’t help.
CPU is so fast and with multi-core these days that all the computation can be done by CPU. Also Hifn is not be able to compete with Intel in ramming up the CPU performance, i.e. commodity CPU works just fine.
Disk I/O can certainly become a bottleneck if the hash table associated with the deduplication data store grows so large that it no longer fits in the system's RAM. In that case, performance can take a hit (and a marked one at that) as the system continually pages portions of the hash table between memory and disk and back again.
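To put that RAM pressure in perspective, here is a rough back-of-the-envelope sketch. The block size and per-entry overhead are my own illustrative assumptions, not figures from any vendor: with 4 KB blocks and a 20-byte SHA-1 fingerprint plus a few bytes of location metadata per entry, the index quickly outgrows the RAM found in a typical server.

```python
# Back-of-the-envelope sizing for a deduplication hash index.
# Illustrative assumptions (not vendor figures): 4 KB blocks,
# 20-byte SHA-1 digest + 12 bytes of block-location metadata per entry.

BLOCK_SIZE = 4 * 1024   # bytes per deduplicated block
ENTRY_SIZE = 20 + 12    # SHA-1 digest + location metadata, in bytes

def index_size_gb(data_tb):
    """Approximate hash-index size (GB) for a given store of unique data (TB)."""
    blocks = data_tb * (1024 ** 4) / BLOCK_SIZE
    return blocks * ENTRY_SIZE / (1024 ** 3)

for tb in (1, 10, 50):
    print(f"{tb:3d} TB of unique data -> ~{index_size_gb(tb):.0f} GB of index")
```

Under these assumptions, each terabyte of unique data adds roughly 8 GB of index, so a few tens of terabytes is all it takes to exhaust the RAM of a 2010-era server.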
That said, workarounds to this hash table bottleneck already exist, especially in storage systems used primarily for the deduplication of data backup. Quantum already includes a small quantity of SSD on its newest DXi Series. Rather than relying upon disk to store the deduplication index when it expands beyond the size of its RAM, Quantum keeps the hash table on SSD so large quantities of data can be more efficiently and effectively deduplicated. It is only logical to assume that other providers of inline data deduplication are looking to do something similar.
In regards to the implementation of the Hifn DR1605 PCIe card (the hardware component of the BitWackr technology) in the WhipTail Racerunner, Exar takes a very similar approach: it uses a fast SSD to store the hash table and expedite deduplication processing. In an email response I received from Bob Farkaly, Exar's Director of Marketing for Storage System Products, he explains:
BitWackr is a combination of software and hardware that performs inline data deduplication, data compression and (at the user’s option) data encryption. The key to data deduplication system write performance is latency, not the speed at which the SHA-1 algorithm processes. Indeed, the biggest hit to latency comes from compression, not hashing. (Encryption also adds to latency, but since its use is optional, I won’t dwell on it.)
The Hifn DR1605 PCIe card – the hardware component of the BitWackr – performs SHA-1 hashing, eLZS compression and if selected, AES-256 CBC encryption simultaneously in a single operation. The SHA-1 hash is used to determine whether the block being processed is unique or is a duplicate. If the block is determined to be unique, the compressed (and optionally encrypted) block is stored in the BitWackr Data Store. If the block is a duplicate, appropriate counters are updated and the next block of data is processed.
Without the DR1605, compression, hash generation and encryption are performed serially in three time consuming steps (which translates into latency). With the DR1605, these operations are performed simultaneously in a single pass, hence reducing latency.
By the way, compression is an important element of advanced data reduction. Our research (covering many terabytes of typical end-user generated data) shows that roughly 2/3 of BitWackr's data reduction is the result of data deduplication and the remaining 1/3 is the result of data compression. By performing hashing and data transformation (compression and encryption) block operations simultaneously, the BitWackr reduces latency in the deduplication process. This is absolutely critical because latency is the enemy of deduplication system performance.
Latency (particularly hash table processing latency) is the overwhelming factor affecting performance in any hash-based deduplication system. Every I/O to the data store requires one or more hash table accesses, so every millisecond or microsecond added to latency reduces performance.
Hence, our two primary recommendations about achieving high rates of deduplication throughput are to use a fast SSD (one that delivers the best possible 512-byte random read/write I/O) and to use the Hifn DR1605 card for hashing, compression and encryption.
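The write path Farkaly describes above — hash the block, look it up, and store compressed data only when the block is unique — can be sketched in software. This is strictly my own minimal illustration, using Python's SHA-1 and zlib as stand-ins for the DR1605's hardware SHA-1 and eLZS engines; all names are hypothetical.

```python
import hashlib
import zlib

# Minimal software sketch of the inline dedup write path described above.
# zlib stands in for hardware eLZS compression; all names are illustrative.

data_store = {}    # digest -> compressed block (the "Data Store" analogue)
dup_counters = {}  # digest -> number of duplicate references seen

def write_block(block: bytes) -> str:
    digest = hashlib.sha1(block).hexdigest()       # fingerprint the block
    if digest in data_store:
        dup_counters[digest] += 1                  # duplicate: bump a counter
    else:
        data_store[digest] = zlib.compress(block)  # unique: compress and store
        dup_counters[digest] = 0
    return digest

def read_block(digest: str) -> bytes:
    return zlib.decompress(data_store[digest])

# Writing the same block twice stores nothing new:
d1 = write_block(b"A" * 4096)
d2 = write_block(b"A" * 4096)
assert d1 == d2 and len(data_store) == 1
assert read_block(d1) == b"A" * 4096
```

Note that every write performs an index lookup (`digest in data_store`), which is exactly why the latency of whatever holds that index dominates overall throughput.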
In the case of the implementation of the Hifn DR1605 on the WhipTail Racerunner, WhipTail also adds its own algorithms to further optimize data deduplication on the SSDs on which the DR1605 hash table is stored.
So to finish answering this individual's question: throwing more CPUs and multi-core processors at the data deduplication process only helps to a point. The deduplication algorithm still needs somewhere to store its hash table, and if there is not enough room in memory to hold it, those big, powerful CPUs sit starved, waiting for hash table entries to be loaded from disk. This is why having SSDs serve as the data store for the hash table becomes so powerful: while SSD is slower than RAM, it is faster than HDD by a factor of up to 100x and cheaper than DRAM, which makes it possible to deliver inline data deduplication even on primary storage.
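A rough latency calculation makes the point concrete. The figures below are my own order-of-magnitude estimates, not vendor specs: if every 4 KB block written requires one index lookup, the lookup medium caps deduplicated write throughput.

```python
# Illustrative access latencies (order-of-magnitude figures, not vendor specs).
LATENCY_S = {
    "RAM": 100e-9,  # ~100 ns
    "SSD": 50e-6,   # ~50 us random read
    "HDD": 5e-3,    # ~5 ms seek + rotation
}

BLOCK_SIZE = 4 * 1024  # assume one index lookup per 4 KB block written

for medium, lat in LATENCY_S.items():
    lookups_per_sec = 1 / lat
    mb_per_sec = lookups_per_sec * BLOCK_SIZE / (1024 ** 2)
    print(f"index on {medium}: ~{lookups_per_sec:,.0f} lookups/s "
          f"-> at most ~{mb_per_sec:,.0f} MB/s of deduplicated writes")
```

With these numbers, an index on HDD throttles writes to under 1 MB/s, while the same index on SSD sustains roughly 100x that — which is exactly the gap the BitWackr and Quantum designs are exploiting.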
Those are my thoughts for this week.
Thanks for the question and have a good weekend everyone!