Automated storage tiering (AST) seems to be getting ever more attention as more organizations move from physical to virtualized environments and look to use networked storage systems with AST to support them. But AST carries its own set of baggage and can potentially create as many problems as it solves, not the least of which is that it may not be as automated for all application workloads as some vendors may lead you to believe.
Two natural outcomes of most organizations’ server virtualization initiatives are server and storage consolidation and the deployment of networked storage systems. Server and storage consolidation occurs as multiple physical servers are virtualized and consolidated onto a single physical host. Networked storage comes into play as companies look to implement more advanced features such as VMware Fault Tolerance (FT) and High Availability (HA), which require external networked storage.
However, organizations are also finding that consolidating the data of all of these applications on networked storage arrays is not without its downsides. Two of the most bothersome are that consolidating applications does not stop their data growth, and that each application has different performance requirements.
This puts organizations in a bit of a quandary. Buying more disk to keep up with storage growth is not prohibitively expensive. However, buying more disk does not necessarily improve the performance of the applications that store data on it.
Store too much data on too few disks and those disks quickly become performance bottlenecks for those applications. Store too little data on disks in order to meet application performance requirements and the costs associated with external storage quickly escalate.
It is for this reason that AST on networked storage arrays is garnering so much attention. The premise upon which AST is built is that it will automatically put the right data on the right tier of storage at the right time, in such a way that both the performance and storage capacity of the storage system are optimized without the applications taking a performance hit or storage costs spiraling out of control.
In theory, this rationale sounds good, but there are a few holes in it:
- Under what conditions should data be moved from one tier to another to optimize performance and/or storage capacity? Only when the application accesses the data frequently? What constitutes “frequently”? Should the system move data while other applications are actively accessing their data?
- What constitutes “automated”? Does “automated” mean moving data at any time? Only at night? Only when policies dictate data should be moved? If data is only moved during “scheduled” or “preset” windows, does that really qualify as “automated”?
- Will the overhead associated with moving data offset whatever benefits AST provides? Moving all of this data around incurs a performance penalty on the storage array as the data is copied and written to other disks within the array. Further, as the array dedicates resources to monitoring and moving this data, does it pull resources away from servicing application reads and writes? (The simple sketch below illustrates how many of these choices, and how much extra I/O, even a basic implementation involves.)
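Here is that sketch: a minimal, frequency-based promotion policy in Python, with every name and number made up purely for illustration. Nothing here is modeled on any vendor's actual implementation; the point is that each knob is an arbitrary answer to the questions above, and every move the policy decides on is extra I/O inside the array.

```python
# A hypothetical sketch of a frequency-based tiering policy. All values are invented.
from collections import defaultdict

PROMOTE_THRESHOLD = 5   # accesses per window before a block counts as "hot"; the number is arbitrary
# Equally arbitrary, and unanswered by the word "automated": how long is the window,
# and do the resulting moves happen immediately, nightly, or on some other schedule?

class TieringPolicy:
    def __init__(self):
        self.access_counts = defaultdict(int)     # block_id -> accesses this window
        self.tier = defaultdict(lambda: "sata")   # block_id -> current tier

    def record_access(self, block_id):
        self.access_counts[block_id] += 1

    def end_of_window(self):
        """Decide promotions and demotions when the current window closes."""
        moves = []
        for block_id in set(self.access_counts) | set(self.tier):
            count = self.access_counts.get(block_id, 0)
            if count >= PROMOTE_THRESHOLD and self.tier[block_id] == "sata":
                moves.append((block_id, "ssd"))   # promote "hot" data
            elif count < PROMOTE_THRESHOLD and self.tier[block_id] == "ssd":
                moves.append((block_id, "sata"))  # demote data that has cooled off
        # Each move is an extra read plus an extra write inside the array,
        # competing with application I/O (the overhead question raised above).
        for block_id, destination in moves:
            self.tier[block_id] = destination
        self.access_counts.clear()
        return moves
```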
So it was with some interest that I listened in on a NetApp conference call earlier this week where NetApp discussed its new Virtual Storage Tier that is intended to overcome some of AST’s challenges by doing the following:
- NetApp systems initially put data on the lowest tier of storage on the array. NetApp refers to this as the “physical tier,” and it is where NetApp also applies features such as thin provisioning and deduplication to optimize storage utilization. This lowest tier can consist of FC, SAS or SATA drives, though long term NetApp sees all companies adopting SATA as the initial resting place for all types of data, as it is not finding a significant performance differential between higher speed FC/SAS drives and SATA drives.
This claim is consistent with what I am hearing elsewhere: FC/SAS drives may provide a 10 to 15 percent performance gain over SATA, though the difference may not be detectable outside of high performance applications.
- As data is accessed from this physical tier, it is put into the Virtual Storage Tier. Data that is being accessed is, by NetApp’s definition, “active,” and as it is read it is placed into cache on the NetApp storage system. However, rather than simply discarding that data from cache as it ages (a copy still remains on disk), the system destages it onto PCIe-attached flash. When future requests for that data come in, the NetApp system first checks whether a copy resides on the PCIe-attached flash before going to physical disk to retrieve it. A simplified sketch of this read path follows this list.
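In that sketch, which is based strictly on NetApp's description above, the class, the dictionary-based "tiers" and the placeholder eviction are my own constructs, not NetApp's actual code or data structures. It simply models write-to-disk-first, promote-to-flash-on-read behavior.

```python
# A simplified model of the read path described above; names and structures are placeholders.
class VirtualStorageTierModel:
    def __init__(self, flash_capacity_blocks):
        self.disk = {}      # block_id -> data: the "physical tier" (FC/SAS/SATA)
        self.flash = {}     # block_id -> data: the PCIe-attached flash cache
        self.flash_capacity = flash_capacity_blocks

    def write(self, block_id, data):
        # New writes land on the physical tier first, never directly on flash.
        self.disk[block_id] = data

    def read(self, block_id):
        # 1. Check flash first; this is the small extra lookup discussed below.
        if block_id in self.flash:
            return self.flash[block_id]
        # 2. On a miss, read from disk, then destage a copy into flash so that
        #    future reads of this now-"active" block are served from SSD.
        data = self.disk[block_id]
        if len(self.flash) >= self.flash_capacity:
            self.flash.pop(next(iter(self.flash)))   # placeholder eviction policy
        self.flash[block_id] = data
        return data
```

Note that in this model the copy on disk stays where it is when a block is promoted to flash, which is why there is no separate copy-and-write penalty of the kind traditional AST incurs.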
So the benefits of this two-step approach that NetApp takes to AST are:
- Real-time data promotion of active data
- No need to wrangle with the definition of automated
- No extra overhead on the system to copy and manage the movement of data to different tiers of disk since the data has already been read and is now merely being destaged to SSD for faster access going forward
- The extra step that NetApp systems take to check whether data is on SSD before going to physical disk should carry a minuscule performance cost, one that is more than offset by the increase in performance that SSD provides.
So in looking at this approach to AST that NetApp is introducing, is it really the best way to implement AST? On the surface, it appears to eliminate many of the concerns that exist now about AST while delivering on many of AST’s promised benefits. For instance, it eliminates the copy-and-write penalty, it gets active data on the fastest tier of disk and it takes advantage of existing techniques that storage arrays already use to access, store and manage the data under their control.
But to say that NetApp’s implementation of AST with its Virtual Storage Tier technology is “the best” form of AST is still a difficult case to make. In this design, NetApp starts with the assumption that all data should be written to physical disk first and only moved to SSD after it is accessed.
The problem with this approach, at least in NetApp’s case, is that NetApp does not write data across all of the disk drives in its systems the way some storage systems, such as HP 3PAR, do. Instead, it writes data to a single RAID group.
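To see why that matters, here is a back-of-the-envelope sketch comparing the same write workload spread across every drive in a hypothetical 48-drive array versus confined to a single 8-drive RAID group. The drive counts, chunk placement and write volume are invented for illustration only, not a model of either vendor's system.

```python
# A rough illustration of per-spindle load under wide striping versus a single RAID group.
TOTAL_DRIVES = 48        # hypothetical array-wide drive count
RAID_GROUP_DRIVES = 8    # hypothetical drives in a single RAID group
WRITES = 100_000

def busiest_drive_load(drive_count):
    """Place WRITES chunks round-robin across drive_count drives; return the heaviest per-drive load."""
    load = [0] * drive_count
    for i in range(WRITES):
        load[i % drive_count] += 1
    return max(load)

print(f"busiest drive, striped across all drives: {busiest_drive_load(TOTAL_DRIVES)} writes")
print(f"busiest drive, single RAID group:         {busiest_drive_load(RAID_GROUP_DRIVES)} writes")
# Same workload, roughly 6x the I/O per spindle when it is confined to one RAID group:
# fewer spindles sharing the work means more contention per drive.
```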
So how much does this matter? It will really depend on the type of application workloads on NetApp systems. For instance, I can see NetApp’s Virtual Storage Tier making a lot of sense in virtual desktop infrastructure (VDI) environments where desktop images are very similar and caching a set of deduplicated data in SSD can dramatically accelerate performance for all virtual desktops.
But in environments where the same set of data is not continually accessed, whether NetApp’s approach to AST is the right one is debatable. Destaging “active data,” or data that has just been read, to SSD may provide little or no benefit in environments with a great deal of randomly accessed or sequentially read data. If anything, first storing this type of application data on SATA disk may actually degrade an application’s performance rather than improve it.
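A rough way to illustrate this is to replay two synthetic read patterns against the same small LRU read cache and compare hit rates: one pattern rereads a small working set (think VDI golden images), the other reads blocks uniformly at random. The cache size, dataset size, 90/10 reuse split and eviction policy are all assumptions chosen to make the contrast visible, not a model of any particular array.

```python
# Compare read-cache hit rates for a reuse-heavy workload versus a uniformly random one.
import random
from collections import OrderedDict

CACHE_BLOCKS = 1_000
DATASET_BLOCKS = 100_000
READS = 50_000

def hit_rate(block_stream):
    cache, hits = OrderedDict(), 0
    for block in block_stream:
        if block in cache:
            hits += 1
            cache.move_to_end(block)        # refresh LRU position
        else:
            cache[block] = True             # destage the block on a miss
            if len(cache) > CACHE_BLOCKS:
                cache.popitem(last=False)   # evict the least recently used block
    return hits / READS

# Reuse-heavy: 90% of reads hit a small 500-block working set (think VDI images).
reuse_heavy = (random.randrange(500) if random.random() < 0.9
               else random.randrange(DATASET_BLOCKS) for _ in range(READS))
# Random: every read is equally likely to touch any block in the dataset.
uniform = (random.randrange(DATASET_BLOCKS) for _ in range(READS))

print(f"reuse-heavy workload hit rate: {hit_rate(reuse_heavy):.0%}")   # high
print(f"uniformly random hit rate:     {hit_rate(uniform):.0%}")       # near zero
```

When little data is re-read, nearly every request still goes to the physical tier, so the destage step adds cache-management work without delivering many hits.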
NetApp’s new Virtual Storage Tier technology puts a new twist on the AST story by arguing that first storing all data on disk (any disk) and then moving data to SSD as it is accessed is the “best” way to implement AST. However, I am not convinced this approach works well in all environments. It may make sense in VDI and file serving environments. But I remain skeptical that this technique is really the optimal way to implement AST for application workloads with a lot of random data access or sequential reads.
Additional Note (March 16, 2011)
I did receive some feedback from NetApp stating that its systems do write data across all of their disks. After some back and forth with NetApp, I will say that what NetApp claims is possible, but I do not entirely agree with NetApp, so I am not revising the main text of my blog entry. While NetApp does offer the ability to stripe data across multiple disks in its systems, it does not deliver this functionality in the same way that HP 3PAR does, at least not in the context I was describing in this blog entry.
NetApp systems can be configured in such a way that RAID groups can be combined into what NetApp refers to as an aggregate. From this aggregate, volumes (or LUNs) can be created that are then presented to servers. Data written to those volumes is then striped across all the RAID groups in that aggregate.
However, this process is more manual than what HP 3PAR automatically provides, which was the point of this blog entry: automated storage tiering is not yet as automated as it should be. In NetApp’s case, getting to automated storage tiering requires a very manual setup process.
Jerome
This blog entry was updated on March 2, 2011, and March 16, 2011, to correct some inaccuracies about NetApp’s implementation of flash in the initial posting on February 25.