Many organizations primarily view object storage systems as cost-effective solutions to host their archival and backup data. This mindset makes sense as it represents how many organizations introduced object storage solutions into their environment. However, the use cases for object storage continue to expand.
Organizations generate and capture expanding amounts of data and deploy new applications that process that data to generate new business value. To meet these new demands, object storage solutions must deliver both economical capacity and high performance.
Performance: Object Storage’s New Love
Multiple data sources now push object storage solutions to function as more than archival and backup data stores. These sources include applications that generate log files, machine sensors that capture environmental and performance data, and video surveillance systems.
Organizations still want economical storage solutions on which to store these types of data. However, organizations also want to study and analyze this data, sometimes in real time, to support decisions and take actions.
Waits of multiple seconds or even minutes for object reads to complete represent the norm for many object storage solutions designed to host archival data. Sub-second read response times do not happen accidentally, so expecting these solutions to suddenly deliver them is unrealistic. They cannot meet these demands because they were never designed to do so.
All object storage solutions face this growing challenge of delivering on the competing demands of economical capacity and sub-second performance at scale. In response, more providers have taken steps to deliver both. The ones best equipped to deliver on these new requirements offer the following three features.
- Stores object metadata on flash media
- Scales performance independently of capacity
- Stores large objects in chunks and processes them in parallel
Feature #1: Object storage metadata hosted on flash media.
Organizations should first verify the solution offers the option to store object metadata on flash media (NVMe or SAS SSDs). Each object stored will have metadata associated with it. Due to the millions or billions of objects stored on a solution, object metadata databases will grow large.
These systems can and do host metadata in memory. However, the size of these metadata databases makes this technique impractical at scale. Storing all metadata on flash media accelerates access to the metadata and improves the odds of achieving sub-second read response times.
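To make the scale argument concrete, the short Python sketch below estimates how large a metadata database might grow. The figure of 1 KiB of metadata per object and the 256 GiB memory budget are illustrative assumptions, not numbers from any vendor; actual metadata sizes vary by solution.

```python
BYTES_PER_GIB = 2**30


def metadata_footprint_gib(object_count, bytes_per_object):
    """Approximate size of the metadata database in GiB."""
    return object_count * bytes_per_object / BYTES_PER_GIB


def fits_in_memory(object_count, bytes_per_object, ram_gib):
    """True when all metadata would fit in the given amount of RAM."""
    return metadata_footprint_gib(object_count, bytes_per_object) <= ram_gib


if __name__ == "__main__":
    ASSUMED_BYTES_PER_OBJECT = 1024  # assumption: 1 KiB of metadata per object
    ASSUMED_RAM_GIB = 256            # assumption: memory available for metadata
    for count in (10**6, 10**9, 10**10):
        size = metadata_footprint_gib(count, ASSUMED_BYTES_PER_OBJECT)
        fits = fits_in_memory(count, ASSUMED_BYTES_PER_OBJECT, ASSUMED_RAM_GIB)
        print(f"{count:>14,} objects: {size:,.1f} GiB of metadata, fits in RAM: {fits}")
```

Under these assumptions, a million objects need roughly 1 GiB of metadata, while a billion objects need close to 1 TiB, which is where flash becomes the more practical home for it.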
This need for sub-second response times explains why Cloudian, Dell EMC, and others recently introduced flash media into their solutions. Others such as Scality have offered the option to store metadata on flash storage for some time.
Feature #2: Scales performance independently of capacity.
Storing object metadata on flash media represents only the first part of the key to delivering performance at scale. Object storage solutions typically scale out capacity and performance simultaneously by introducing new server nodes into the cluster. Each server node may contain both flash media and HDDs with fixed amounts of both media types. Unfortunately, the available performance in the storage solution cluster may not meet application or user expectations.
Two ways exist to increase performance.
- Add more nodes to the cluster. Each node adds more capacity and performance to the cluster. This may improve the situation, though organizations may end up buying capacity they do not need.
- Select a solution that frees organizations to scale performance independently of capacity. Using this architectural approach, an organization may install new flash media in existing nodes or introduce performance-centric nodes that primarily contain flash media and few or no HDDs. This provides the targeted performance boost they need without paying for unneeded capacity.
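The trade-off between these two options can be sketched with a small calculation. The node types and their IOPS and capacity figures below are hypothetical values chosen only to illustrate the comparison; they do not describe any real product.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    name: str
    iops: int         # read operations per second one node contributes
    capacity_tb: int  # usable capacity one node contributes


def nodes_needed(iops_gap, node):
    """Number of nodes of this type required to close a performance gap."""
    return math.ceil(iops_gap / node.iops)


def added_capacity_tb(iops_gap, node):
    """Capacity that comes along with those nodes, whether needed or not."""
    return nodes_needed(iops_gap, node) * node.capacity_tb


# Hypothetical node profiles (assumptions for illustration only).
STANDARD = NodeType("standard", iops=20_000, capacity_tb=200)
PERFORMANCE = NodeType("performance", iops=100_000, capacity_tb=10)

if __name__ == "__main__":
    gap = 150_000  # assumed shortfall in read IOPS
    for node in (STANDARD, PERFORMANCE):
        print(f"{node.name}: {nodes_needed(gap, node)} nodes, "
              f"{added_capacity_tb(gap, node)} TB of added capacity")
```

With these assumed figures, closing the same gap takes eight standard nodes and 1,600 TB of extra capacity, or two performance-centric nodes and 20 TB.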
Feature #3: Stores large objects in chunks and processes them in parallel.
While organizations may one day store their object data on flash media, that day has not yet arrived. In the meantime, organizations will continue to store their object data on HDDs. This may present a performance challenge, especially when storing and reading large objects from HDDs.
Individual objects may grow into the hundreds of GBs if not TBs in size. Using a single process to read object data from HDDs on cluster nodes will take significant time to complete.
To improve response times, identify solutions that perform two tasks.
- First, they should break large objects into smaller chunks before writing them to multiple nodes and disks.
- Second, they should use multiple parallel processes to read back the object data.
Together these techniques serve two purposes. Spreading large objects across multiple nodes and disks lets the solution write objects more quickly, and reading the chunks back in parallel lets it return them more quickly. Both accelerate performance at scale.
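The chunk-and-parallelize pattern can be sketched in a few lines of Python. This is a simplified illustration, not how any particular product implements it: the nodes are plain in-memory dictionaries, placement is simple round-robin, and the function names `write_object` and `read_object` are hypothetical. Real solutions add data protection such as replication or erasure coding, which this sketch omits.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, chunk_size):
    """Break an object's bytes into fixed-size chunks (the last may be shorter)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def write_object(name, data, nodes, chunk_size):
    """Spread chunks round-robin across nodes; return a manifest of their locations."""
    manifest = []
    for index, chunk in enumerate(split_into_chunks(data, chunk_size)):
        node_id = index % len(nodes)
        key = f"{name}/{index}"
        nodes[node_id][key] = chunk
        manifest.append((node_id, key))
    return manifest


def read_object(manifest, nodes, workers=4):
    """Fetch all chunks with parallel workers and reassemble them in order."""
    def fetch(entry):
        node_id, key = entry
        return nodes[node_id][key]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in manifest order, so the bytes line up.
        return b"".join(pool.map(fetch, manifest))


if __name__ == "__main__":
    cluster = [{} for _ in range(3)]  # three simulated nodes
    payload = bytes(range(256)) * 10
    locations = write_object("video.bin", payload, cluster, chunk_size=100)
    assert read_object(locations, cluster) == payload
    print(f"{len(locations)} chunks spread across {len(cluster)} nodes")
```

The manifest plays the role of object metadata here: it records where each chunk lives so that readers can fetch them concurrently rather than streaming the whole object from one disk.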
Newer Object Storage Solutions Better Account for Flash Media
Object storage solutions that deliver both economical capacity and high performance at scale do exist. However, DCIG knows of only a few solutions that leverage all three features mentioned here to deliver on these enterprise expectations.
Enterprises should be wary of any solutions that have existed for over 10 years. Many have added flash media to their systems to host metadata and help improve performance. It certainly helps, but how well?
Unfortunately, it remains unclear to what extent taking this step alone helps at scale. The early evidence seems to suggest the improvement does not translate very well.
Those organizations scaling into the petabytes will be better served by identifying and choosing object storage solutions with more modern designs. These newer solutions better account for flash media, scale performance and capacity independently, and parallelize I/O to deliver performance even as data stores scale to multiple petabytes.