The Maturity of High Availability and Reliability in NFS v4 Brings NAS and Mission Critical Application Processing Together

When organizations think of running their mission critical applications on “highly available, highly reliable” storage systems, they almost always think of FC SANs.  It is time for that mindset to change. The now mature and proven availability and reliability features of NFS v4, coupled with organizations that can testify to using NFS v4 in mission critical environments, are creating new use cases for deploying more cost effective, easier to manage NAS storage solutions.

NFS’s High Availability and Reliability Struggles

NFS’s prior struggles with delivering enterprise class availability and reliability are in part attributable to how previous versions of NFS (pre-NFS v4) managed the “state” of files.  While NFS’s file management is suitable for single servers, new issues emerge when applications are deployed on clustered pairs of servers.

One issue is an NFS imposed limitation that prevents the second server in a two server cluster from accessing a file that the first server was using if that first server should fail. In pre-v4 versions of NFS, the file server places a lock on the file that the first server is accessing to protect the integrity of the file.  This works well until one gets into a clustered server configuration.

At that point the file lock becomes a problem when/if the first server fails. Once it fails, NFS continues to maintain its lock on the file since NFS is unaware that the first server has become unavailable. So when the application starts processing on the second server, it cannot access the file until the existing NFS file lock is released. Unlocking this file generally requires intervention on the part of a system administrator.

Failovers are not the only point where NFS’s file management limitations rear their heads. They also surface during routine application operation.

For instance, if an application client moves from the first server to the second, the application client may no longer be able to access the file since the first server still owns the rights to the file.

Pre-v4 versions of NFS do permit the introduction of third party software that can manage this switchover of an application from the first to the second server. However, this additional software makes highly available and reliable NFS configurations more costly and complex to implement and manage.

NFS v4 Brings High Availability and Reliability to NAS

NFS v4 provides for improved options for stable client fail-over that did not exist in prior versions.

Arguably one of NFS v4’s biggest improvements is its client-server lease management feature, which changes how the protocol handles file locks. Whereas pre-v4 versions of NFS kept a lock on a file indefinitely while it was being accessed, the default setting in NFS v4 requires clients to “check in” every 45 seconds or else the file lock is released.

The primary advantage this provides is ease of application fail over in NFS environments as it eliminates the need for administrators to get involved with releasing file locks should the first server fail.  Now the application can fail over to the second server and access the files that it needs in 45 seconds since NFS v4 automatically releases the file locks in that time.
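The lease behavior described above can be sketched as a small model. This is an illustration only, not NFS client or server code: the class name LeaseLockServer, its methods, the node names, and the file path are all hypothetical, and the 45 second lease is simply the figure quoted in this article (in practice the lease period is a server-side setting). Time is passed in explicitly so that the example can be followed step by step.

```python
from dataclasses import dataclass, field
from typing import Dict

LEASE_SECONDS = 45.0  # figure quoted in the article; an assumption, not a protocol constant


@dataclass
class LeaseLockServer:
    """Toy model: a lock is honored only while its owner's lease is current."""
    lease_seconds: float = LEASE_SECONDS
    last_renewal: Dict[str, float] = field(default_factory=dict)  # client -> last check-in
    lock_owner: Dict[str, str] = field(default_factory=dict)      # path -> client

    def renew(self, client: str, now: float) -> None:
        """Record a check-in from the client."""
        self.last_renewal[client] = now

    def lease_expired(self, client: str, now: float) -> bool:
        last = self.last_renewal.get(client)
        return last is None or now - last > self.lease_seconds

    def lock(self, client: str, path: str, now: float) -> bool:
        """Try to lock path for client; any request also counts as a check-in."""
        self.renew(client, now)
        owner = self.lock_owner.get(path)
        if owner is not None and owner != client and not self.lease_expired(owner, now):
            return False  # the current owner is still checking in, so the lock stands
        # No owner, same owner, or the owner's lease lapsed: the lock is (re)assigned.
        self.lock_owner[path] = client
        return True


if __name__ == "__main__":
    server = LeaseLockServer()
    print(server.lock("node-a", "/data/queue.db", now=0.0))   # node-a takes the lock
    # node-a fails here and never checks in again.
    print(server.lock("node-b", "/data/queue.db", now=10.0))  # refused: lease still current
    print(server.lock("node-b", "/data/queue.db", now=50.0))  # granted: lease has lapsed
```

In this model no administrator action is needed after node-a fails. The second server simply retries until the failed server's lease has run out, which is the behavior the article attributes to NFS v4.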

The client-server lease management in NFS v4 also comes into play when an application moves from the first active server to another that is also active.  NFS v4 supports several file sharing modes that can control and manage file access by other application clients.

Leveraging these different modes, application clients can specify the mode in which specific files are accessed.  So if an application client does move to another server, prior to moving it can use NFS v4 to set file permissions so it can continue to have uninterrupted access to the files without waiting for the default NFS v4 time out that would release the first server’s file lock.
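One way to picture these sharing modes is through the access and deny bits that an NFS v4 client supplies when it opens a file. The sketch below models only the conflict check between two opens. The numeric bit values follow the NFS v4 specification (RFC 3530), but the Open class, the conflicts and try_open helpers, and the node names are hypothetical and exist only for illustration. It is not a description of how any particular NAS product implements file sharing.

```python
from dataclasses import dataclass
from typing import List

# Share access and deny bits as defined for the NFS v4 OPEN operation (RFC 3530).
ACCESS_READ, ACCESS_WRITE, ACCESS_BOTH = 1, 2, 3
DENY_NONE, DENY_READ, DENY_WRITE, DENY_BOTH = 0, 1, 2, 3


@dataclass(frozen=True)
class Open:
    """Hypothetical record of one client's open of a file."""
    client: str
    access: int  # what this client wants to do
    deny: int    # what this client wants to stop others from doing


def conflicts(existing: Open, requested: Open) -> bool:
    """True if the requested open clashes with an existing one, in either direction."""
    return bool(requested.access & existing.deny) or bool(requested.deny & existing.access)


def try_open(current: List[Open], requested: Open) -> bool:
    """Grant the open only if it conflicts with none of the opens already held."""
    if any(conflicts(held, requested) for held in current):
        return False
    current.append(requested)
    return True


if __name__ == "__main__":
    opens: List[Open] = []
    # node-a reads and writes the file and denies writes by anyone else.
    print(try_open(opens, Open("node-a", ACCESS_BOTH, DENY_WRITE)))
    # node-b asks for write access, which node-a has denied.
    print(try_open(opens, Open("node-b", ACCESS_WRITE, DENY_NONE)))
    # node-b asks for read-only access, which nothing forbids.
    print(try_open(opens, Open("node-b", ACCESS_READ, DENY_NONE)))
```

The design point is that each client states up front both what it needs and what it wants to exclude, so the file server can decide on access without relying on a lock held indefinitely by one server.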

These functional improvements to NFS v4 permit organizations to deploy NAS storage solutions in highly available and reliable environments or use these applications in conjunction with NAS storage solutions that are already implemented.

However, a big part of the reason that organizations have not yet deployed NAS solutions with NFS v4 is that users remain unconvinced of its maturity. Convincing users to move their mission critical applications to NAS is nearly impossible if proponents of NFS v4 cannot point to other customers who can testify that NFS v4 works as promised when deployed in this type of environment.

NetApp Puts Its Money Where Its Mouth Is

It is this specific customer concern that NetApp sought to address. NetApp made the decision to deploy its own Unified Storage Architecture with NFS v4 to support its internal mission critical TIBCO Enterprise Service Bus (ESB) application.

NetApp uses the TIBCO ESB to broker messages between its different billing, shipping and order taking applications. During the day, these applications generate millions of messages that the TIBCO ESB application server processes and then stores to a file.

NetApp had already tried once to run TIBCO ESB on NFS v3. However, the feature set in NFS v3 did not meet TIBCO ESB’s availability and performance requirements. NetApp therefore opted to assign shared LUNs to the TIBCO ESB servers and install a host-based clustered file system on those servers to meet TIBCO ESB’s availability and reliability requirements.

This all changed in December 2007. It was at that point that NetApp determined that its implementation of NFS v4 on its NAS storage solutions had sufficiently matured and was ready for use with TIBCO ESB.

Now, over two years later, NetApp’s move of TIBCO ESB onto its NFS v4 supported platform has paid off. NetApp is still running TIBCO ESB on its Unified Storage Architecture and, because NetApp put its money where its mouth is, more of NetApp’s clients are evaluating NFS v4 for their own environments.

In this case, NetApp is acting as a reference as to why prospective customers should use an NFS v4 enabled NAS solution. Among the benefits NetApp cites, it no longer needs to buy, install and manage host-based clustered file systems on the TIBCO ESB servers. Further, NetApp is achieving greater storage management efficiency as it can now manage more of its storage in the same way.

Deploying NAS with NFS v4 is an Act of Reason not a Leap of Faith

The NFS v4 specifications were released in 2003, so they are not new standards by any stretch, and they have been widely adopted and put in place by many NAS providers. But the lag between the initial announcement of the NFS v4 specifications and the implementation of its features has been substantial.

This lag time surely contributes to why users might feel like NFS v4 is a “new” specification that cannot yet be “trusted” in mission critical environments.

However, NetApp recognized that it could not ask end users to take a leap of faith that it was not first willing to take itself. By running the TIBCO ESB application on an NFS v4 enabled NAS solution for over two years, NetApp helps to break the logjam on two fronts: it supplies a reference customer account where few existed, and it demonstrates the benefits that NFS v4 can provide.

NetApp’s own example demonstrates that NFS v4 is ready for prime time and that prospective customers do not need to take a leap of faith in order to run mission critical applications on NFS v4. Rather, this example should give other companies that currently use host-based clustered file systems to support any of their mission critical applications ample reason to take a more careful look at the “new” high availability and reliability features found within NFS v4 and at how they can leverage them to achieve the same reduced costs and improved operational efficiency that NetApp has already achieved.
