Cloud providers, healthcare, media and entertainment, government, and other enterprise organizations increasingly generate and store multiple petabytes (PBs) of data. This prompts many of them to seek scalable storage solutions that can host this much data. As they do so, they should follow new rules when acquiring modern scalable storage solutions.
Diverse Workload Demands
Organizations understandably look first at cost when considering scalable storage systems to host their modern data lakes. However, they may quickly and unexpectedly find that this data is accessed frequently, either from the outset or over time. The frequency of data access and the rapid application response times required may catch them off guard.
These demands put enterprises on notice to look for scalable storage solutions that accomplish three objectives:
- Easily and economically grow to manage multiple petabytes of data
- Meet changing application capacity and performance demands
- Dynamically detect and adapt to changes in the environment with minimal intervention
Three New Rules
To select the storage solution that best scales to meet these business and technical objectives, organizations should follow these three rules.
Rule #1: Select scalable storage solutions that offer scale-out architectures
Scale-up storage systems (storage arrays with one or two controllers and backend storage shelves) have been industry standards for decades. They are dependable, economical, well-understood, and relatively simple to implement.
Scalable storage systems that use scale-out architectures with sophisticated, yet increasingly simple to manage, clustering software have changed the conversation. This software runs on industry-standard x86 server hardware, can start small (three servers), and can scale to hundreds of servers.
Scale-out systems also address the nagging issues of scale-up storage, namely data migrations and hardware refreshes. Using these scale-out solutions, organizations may introduce a new server into an existing cluster. The storage software then automatically migrates data to the server from other existing servers in the cluster.
Decommissioning an existing server becomes just as easy. Once a server is selected for decommissioning, the storage software evacuates its data and moves it to other servers in the cluster. Once the evacuation completes, one may non-disruptively remove the server from the cluster.
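The add-and-decommission behavior described above can be illustrated with a small sketch. This is a toy model, not any vendor's implementation: the `Cluster` class, its method names, and the "move from the fullest server, fill the least-loaded server" policy are all assumptions made for illustration. Real clustering software also has to account for replication, erasure coding, failure domains, and network throttling.

```python
class Cluster:
    """Toy scale-out cluster (hypothetical): server name -> set of object IDs."""

    def __init__(self, server_names):
        self.servers = {name: set() for name in server_names}

    def _least_loaded(self, exclude=()):
        # Pick the server holding the fewest objects; ties break by name.
        candidates = [s for s in self.servers if s not in exclude]
        return min(candidates, key=lambda s: (len(self.servers[s]), s))

    def _fullest(self):
        return max(self.servers, key=lambda s: (len(self.servers[s]), s))

    def put(self, obj_id):
        # New data lands on the least-loaded server.
        self.servers[self._least_loaded()].add(obj_id)

    def add_server(self, name):
        # Introduce an empty server, then migrate objects onto it from the
        # fullest servers until the new server is within one object of them.
        self.servers[name] = set()
        while True:
            fullest = self._fullest()
            if len(self.servers[fullest]) - len(self.servers[name]) <= 1:
                break
            self.servers[name].add(self.servers[fullest].pop())

    def decommission(self, name):
        # Evacuate every object to the remaining servers before removal.
        if len(self.servers) < 2:
            raise ValueError("cannot decommission the last server")
        draining = self.servers[name]
        while draining:
            target = self._least_loaded(exclude=(name,))
            self.servers[target].add(draining.pop())
        del self.servers[name]


cluster = Cluster(["a", "b", "c"])
for i in range(90):
    cluster.put(f"obj-{i}")

cluster.add_server("d")       # data migrates onto the new server
cluster.decommission("a")     # data is evacuated, then the server is removed
print({name: len(objs) for name, objs in cluster.servers.items()})
```

In both operations every object stays in the cluster throughout, which is the property that allows a hardware refresh to proceed server by server without a separate data migration project.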
Rule #2: Select scalable storage solutions that offer both NAS and S3 object storage interfaces
Organizations must re-think how they store and access their increasingly large data stores. As multi-petabyte data stores become more common, they may also remain active and frequently accessed. This requires organizations to select storage solutions based on their capacity, performance, and the networked storage protocols they support.
Thanks to hardware innovations and software architectural advancements, modern scale-out storage solutions deliver on these requirements. They scale out to hundreds of petabytes, deliver high levels of performance, and support network attached storage (NAS) and object storage protocols.
Support for object storage interfaces such as S3 represents a significant enhancement for many of these solutions. Previously, the overhead associated with managing all the S3 metadata tended to relegate these storage solutions to archive-class use.
However, more of these storage solutions now store their S3 metadata on SSDs. As a result, they provide performance and throughput that rival or even exceed those of traditional NAS-only systems. This also explains why more providers of NAS systems have added S3 support to their scale-out storage solutions. These developments make scale-out storage solutions supporting both NAS and S3 appropriate for hosting large, active application data stores.
Rule #3: Select scalable storage solutions that natively offer connectivity to third-party storage clouds
The ability of these storage systems to scale to many petabytes amplifies their need for connectivity to third-party storage cloud providers. Enterprises that store petabytes of data often have an increasing need to store some or all of this data off-site.
They may store it with these providers for archival, backup, business continuity, compliance, or economic reasons. Regardless of their motivation, more scalable storage solutions natively equip them to seamlessly store data with third-party cloud storage providers.
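One way to picture how a storage system might decide which data to send to a third-party cloud is an idle-time policy. The sketch below is purely illustrative: the `StoredObject` type, the `select_for_cloud_tier` function, and the 90-day threshold are assumptions, not features of any particular product. Actual solutions expose their own policy engines and may base decisions on compliance rules, capacity thresholds, or cost instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StoredObject:
    """Hypothetical catalog entry for one stored object."""
    key: str
    size_bytes: int
    last_accessed: datetime


def select_for_cloud_tier(objects, now, max_idle_days=90):
    """Return objects idle longer than max_idle_days (an assumed policy)."""
    cutoff = now - timedelta(days=max_idle_days)
    return [obj for obj in objects if obj.last_accessed < cutoff]


now = datetime(2024, 6, 1)
catalog = [
    StoredObject("scans/2019/a.dcm", 4_000_000, datetime(2019, 3, 1)),
    StoredObject("video/raw/b.mov", 9_000_000_000, datetime(2024, 5, 20)),
    StoredObject("logs/2023/c.gz", 120_000, datetime(2023, 12, 31)),
]

candidates = select_for_cloud_tier(catalog, now)
for obj in candidates:
    print(f"tier to cloud: {obj.key} ({obj.size_bytes} bytes)")
```

With this sample catalog, the two objects last touched before early March 2024 are selected, while the recently accessed video file stays on the local cluster.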
The Rules Have Changed
Organizations that currently store, or anticipate storing, a petabyte or more of data should re-examine the storage systems they use. New scale-out storage solutions have matured to the point where they offer the ease of deployment and management of scale-up storage systems. They also better address the long-standing data migration and hardware refresh issues associated with scale-up storage solutions.
More importantly, scale-out storage solutions offer the additional new features that organizations need to manage petabytes of data. They now support multiple networked storage protocols, including S3, with the underlying technologies to deliver high levels of performance. Further, they recognize and support third-party storage clouds to meet other business needs.
This combination of factors puts organizations on notice. When it comes to supporting data stores of over a petabyte, the rules for selecting scalable storage solutions have changed.