SSD’s Hidden Data Integrity Flaw and How Fusion-io Mitigates the Cost and Complexity of Fixing It

Next-generation networked storage systems are adding solid state drives
(SSDs) at an accelerating pace to deliver dramatic performance gains for
mission-critical, performance-sensitive applications. To ease adoption,
SSDs are being built to look and act like hard disk drives (HDDs). While
this seems sensible, it creates the possibility of data integrity
issues.

Enterprise SSD providers take a number of steps to account for these issues. Fusion-io
is unique in that it has come up with a method that mitigates these
problems while lowering the cost and complexity of fixing them.

It Looks Like an HDD But …

On the surface, the process of making an SSD look like an HDD appears
straightforward. To accelerate adoption, SSDs are made to look and act
like HDDs so they are easily recognized by existing server operating
systems and/or storage system firmware.
 
In this respect, the emulation is going pretty well. Users and/or
manufacturers can plug an SSD device into a slot on a PC, server or
storage system, and it looks just like an HDD to the OS and is managed
as such.
 
But to make an SSD look and act like an HDD, data must be translated and
remapped, which requires embedded CPUs and DRAM on the SSD. These
additional components increase the cost of SSDs and add failure points
that could compromise data integrity.
 
Specifically, these embedded components create the possibility that soft errors can occur without detection and correction. This can result in questions regarding the integrity of data stored or retrieved from the SSD.

Where Soft Errors Occur

Soft errors can occur at either of two points within SSD devices.

  • First, during writes, data sent to the SSD device is in a format that
    is suitable for storing on an HDD, not an SSD. To store it, the SSD
    uses its embedded CPU and DRAM to convert the data to a format that
    flash recognizes and then maps and stores the data to a location on
    the SSD device.
  • Second, on reads, the opposite must occur. Since the application
    expects the data back in a recognizable format, the SSD device must
    again use its embedded CPU and DRAM to look up the data on the SSD
    so that it can be remapped back to a format that the application
    recognizes.

This process of remapping the data from FC and/or SCSI format to flash
and then back again is where the possibility for soft errors is
introduced. Because the process relies on DRAM, the SSD’s mapping
calculations are susceptible to errors. While infrequent, soft errors
can and do occur because they are caused by external forces such as
cosmic radiation. Importantly, neither a flaw in the hardware nor an
error in the firmware is needed for these soft errors to occur.
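
To make the risk concrete, here is a minimal sketch of what a single flipped bit in an unprotected mapping entry could do. The table layout, the addresses and the `flip_bit` helper are all hypothetical and chosen only for illustration; no real SSD firmware is being reproduced here.

```python
# Hypothetical sketch: a logical-to-physical map held in unprotected memory.
# All names and values are illustrative assumptions, not vendor firmware.

def flip_bit(value: int, bit: int) -> int:
    """Return value with one bit inverted, modelling a soft error."""
    return value ^ (1 << bit)


# Logical block address -> physical flash page.
l2p = {0: 100, 1: 101, 2: 102}

# Physical flash page -> stored contents.
flash = {100: b"block-0", 101: b"block-1", 102: b"block-2"}


def read(lba: int) -> bytes:
    """Look up the physical page for lba and return its contents."""
    return flash[l2p[lba]]


before = read(0)

# A soft error flips bit 0 of the entry for LBA 0, so 100 becomes 101.
l2p[0] = flip_bit(l2p[0], 0)

# The read still succeeds, but it now returns another block's data.
after = read(0)
```

Nothing in this path raises an error. The lookup simply follows the corrupted entry, which is why an undetected soft error in the map is a data integrity problem rather than a visible failure.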

Cost is Why Soft Errors are Left Undetected

The possibility of cosmic radiation impacting data integrity is often
overlooked because today’s computer systems generally have built-in
error correction code (ECC) memory that can detect and correct soft
errors caused by this type of anomaly. However, most DRAM used in SSDs
does not support ECC, and most does not even have parity. As such, it
can neither detect nor correct soft errors when data is remapped. The
danger is that the SSD cannot detect when a soft error occurs.
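
The difference between parity and ECC can be sketched with a textbook Hamming(7,4) code. This is an illustration of the principle only: the function names are made up for this example, and the ECC used in actual server memory operates on much wider words than four bits.

```python
def even_parity(bits):
    """Return the even-parity bit for a list of 0/1 values."""
    return sum(bits) % 2


def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit Hamming codeword.

    Layout by 1-indexed position: p1 p2 d1 p3 d2 d3 d4.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, error_position); position 0 means no error seen."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # the syndrome names the flipped position
    if position:
        c[position - 1] ^= 1  # correct the single-bit error
    return [c[2], c[4], c[5], c[6]], position


data = [1, 0, 1, 1]

# Parity only: a flipped bit is noticed, but its location is unknown.
stored_parity = even_parity(data)
corrupted = data.copy()
corrupted[1] ^= 1
parity_detects = even_parity(corrupted) != stored_parity

# ECC: the same kind of flip is both located and repaired.
codeword = hamming74_encode(data)
codeword[4] ^= 1  # soft error at position 5 (the d2 bit)
recovered, position = hamming74_decode(codeword)
```

Memory with neither mechanism has no equivalent of `parity_detects` or `position`, so the corrupted value is simply used as if it were correct.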
 
Unfortunately, most consumer-grade SSDs do not use DRAM that supports ECC or parity for one simple reason: ECC DRAM is more expensive. As a result, most SSD devices that have an FC, SAS or SATA interface do not provide this needed layer of data protection.
 
This explains why enterprises, for which data integrity is a necessity,
must deploy SSDs that detect and correct these soft errors. But to do
so, they must pay extra.
 
However, this technique is only one way to avoid soft errors. There is
another way that costs less, ensures that the SSD appears as an HDD to
the operating system, and can still detect and correct soft errors.

Solid-State Data Integrity with Speed but without Cost

Fusion-io
has developed a method to accomplish all three of these objectives:
reducing costs, avoiding soft errors and appearing as an HDD. To do
this, Fusion-io places its ioDrive directly on a server’s PCI-Express
bus.
 
In this configuration, the server’s CPU can directly interface with the ioDrive
using direct memory access (DMA) via the server’s PCI-Express bus. This
design eliminates the need for a storage controller in the server as
well as embedded CPUs on the ioDrive, since the HDD translation does not
occur on the ioDrive.
 
However, for the ioDrive to appear as an HDD, Fusion-io installs a
driver on the host that does the remapping. This driver serves two
important purposes.
 
First, with the translation layer moved to the host, the driver
functions much like a page table does in today’s operating systems,
where the page table translates virtual memory addresses to physical
memory addresses.

Fusion-io’s host software driver works in a similar manner. When the
CPU asks for a logical block address, the request goes through
Fusion-io’s software driver, which translates that address to the
physical location of the data on the NAND. Since the NAND is addressed
directly as memory, this contributes to Fusion-io’s high speeds.
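
The page table analogy can be sketched in a few lines. The `HostTranslationLayer` class below is a hypothetical illustration of a host-resident map from logical block addresses to NAND pages; it assumes a naive append-only allocator and is not a description of Fusion-io’s actual driver.

```python
class HostTranslationLayer:
    """Hypothetical host-side map from logical block address to NAND page."""

    def __init__(self, num_pages: int):
        self.nand = [None] * num_pages  # stand-in for physical flash pages
        self.l2p = {}                   # logical block address -> page index
        self.next_free = 0              # naive append-only allocator

    def write(self, lba: int, data: bytes) -> int:
        """Store data in the next free page and point lba at it."""
        if self.next_free >= len(self.nand):
            raise RuntimeError("no free pages (garbage collection not modelled)")
        page = self.next_free
        self.next_free += 1
        self.nand[page] = data
        self.l2p[lba] = page  # remap; any older page is simply left stale
        return page

    def read(self, lba: int) -> bytes:
        """Translate lba to its physical page, as a page table lookup would."""
        return self.nand[self.l2p[lba]]


layer = HostTranslationLayer(num_pages=4)
first_page = layer.write(7, b"old")
second_page = layer.write(7, b"new")  # same logical block, new physical page
current = layer.read(7)
```

Because the `l2p` map in this sketch lives in host memory, it would sit behind whatever memory protection the server already provides, which is the second purpose of the driver.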

The other important purpose of putting this software driver on the host
is that the translation is protected, since the driver runs in the
server’s memory. This allows it to take advantage of the native ECC and
parity protection found on these servers. Then, as an added layer of
protection, Fusion-io validates its results by double checking the data
after it is remapped to ensure it is all properly labeled once it is on
the media.
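
One way such a label check could work is sketched below. It assumes each stored page carries the logical block address it belongs to plus a CRC of its contents; the page layout, `make_page` and `read_verified` are hypothetical, and the details of Fusion-io’s own validation are not given in this article.

```python
import zlib


class IntegrityError(Exception):
    """Raised when a page read from media fails validation."""


def make_page(lba: int, data: bytes) -> dict:
    """Build a stored page that carries its own label and checksum."""
    return {"lba": lba, "crc": zlib.crc32(data), "data": data}


def read_verified(media: dict, page_index: int, expected_lba: int) -> bytes:
    """Return page data only if its label and checksum both match."""
    page = media[page_index]
    if page["lba"] != expected_lba:
        raise IntegrityError(
            f"page {page_index} is labeled {page['lba']}, expected {expected_lba}"
        )
    if zlib.crc32(page["data"]) != page["crc"]:
        raise IntegrityError(f"checksum mismatch on page {page_index}")
    return page["data"]


media = {0: make_page(10, b"alpha"), 1: make_page(11, b"beta")}

good = read_verified(media, 0, expected_lba=10)

# A corrupted map entry that points LBA 10 at page 1 is now caught.
try:
    read_verified(media, 1, expected_lba=10)
    misdirect_caught = False
except IntegrityError:
    misdirect_caught = True
```

With a label on the media, a bad translation turns into a reported error instead of silently returning the wrong block.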

The Fusion ioDrive: A Fundamentally Better Approach

Forcing SSDs to look like HDDs is done because it seems like the simplest and easiest way to accelerate the adoption of SSDs. But SSDs are not HDDs! They
are flash, and making flash look and act like an HDD cost-effectively
and safely, while still preserving the integrity of the data, is not
easily done.

To do this, other SSDs take six steps to perform the data translation
and remapping. This adds cost, slows the SSD and introduces risk.
Fusion-io’s approach of putting its ioDrive directly into the server
enables organizations to more safely and cost-effectively harness the
many advantages that SSDs provide.

But in the process of placing its ioDrive inside the server, Fusion-io
does more than just accelerate application performance, lower SSD costs
and preserve the integrity of data. Fusion-io also puts organizations
on a path of rethinking not just how they should leverage SSDs in their
infrastructure but also the appropriate role and placement of SSDs and
networked storage within their data center environment going forward.
