DCIG is pleased to announce the immediate availability of the inaugural 2022-23 DCIG TOP 5 Rising Vendors in Storage for Life Sciences report.
DCIG recently conducted research into Storage for Life Sciences solutions. This 2022-23 DCIG TOP 5 Rising Vendors in Storage for Life Sciences report focuses on the top solutions from vendors of size less than $200M that we consider to be “rising” in the market. The use cases are the same, but rising companies tend to be younger companies bringing newer technologies and disruptive solutions to market. Organizations looking for modern approaches and a potential competitive advantage may want to consider working with a rising vendor.
Storage for Life Sciences Challenges
Life sciences organizations depend on some of the most compute- and data-intensive applications in the world for primary research, online analysis, global collaboration and product development. Many of these applications are High Performance Computing (HPC) class workloads that include genomics sequencing, molecular simulations, protein folding, AI/ML optimization, and intensive media processing. These applications can push even cutting-edge IT data storage implementations to their performance and capacity limits.
Increasingly, upstream life sciences workloads also feed critical data into downstream workflows that might, for example, manage real-time medical-grade production lines or oversee global distribution.
The stakes are high for IT in life sciences, given competitive global pressures, the search for life-saving solutions, expensive data “source” equipment, elite research staff and ever-increasing data volumes and performance demands. For example, voracious genomics sequencing equipment can easily generate terabytes of raw data in a few hours, overwhelming the local legacy file storage often configured with the default operator workstation. This data must then be offloaded into downstream research-feeding storage while that equipment sits idle, wasting opportunity, time, and resources.
Other workloads present challenges too. Molecular simulations running on scale-out HPC clusters impose HPC data consumption patterns, which require local high-speed parallel file IO of very large or very many files to many client nodes at once.
Many life sciences workloads lean heavily on critical GPU resources, and maximizing the utilization of large numbers of GPUs may require specific GPU-related storage features. Online data capacity demands also mount with each type of research conducted, each form of analysis required, and every AI/ML model built.
Traditionally, data requirements for these highly demanding applications have been served with parallel file system solutions. Parallel file systems can deliver massive IO volumes to data-hungry applications (like those running in large HPC or HPC-like compute clusters), but historically, implementing, operating and tuning massively parallel file systems for top performance at scale has been the province of dedicated PhD-level academics.
Today, huge volumes of data are not only created and processed by primary life science applications; that data must then be shared with multiple consumers and global collaborators. It is expensive (and often prohibitively slow) to make and maintain multiple copies of very large data sets. It can also be quite expensive to maintain large histories of data online for downstream research activities. Traditionally, HPC data sets are archived into object storage for downstream use. However, this secondary storage dataflow means that data lives in multiple locations and in different forms, adding friction and delay to sharing, complicating data access and preventing optimal data value extraction.
In addition, “academic” storage solutions have tended to be light on enterprise storage management features like data protection, security, and backup/disaster recovery functionality. While primary HPC storage must first meet the HPC application requirements, research data increasingly stays online for longer periods so that it can be recalled and leveraged on demand and accessed by a wider set of applications. Properly maintaining this key data over time then becomes more critical to the ongoing success of the organization.
Benefits of an Effective Storage for Life Sciences Solution
At the top level, scalable storage performance is critical. A life science organization should try to fully utilize its high-end research and laboratory equipment, HPC clusters, and GPU-intensive analytical servers.
The objective of life science storage then is to store and flood massive amounts of scientific data into all the expensive data pipelines serving the organization’s goals. An effective and equally scaled storage solution will ingest and store data faster than source equipment can produce it and then deliver it as fast as the sum of consuming workloads calls for it.
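The sizing rule in the paragraph above can be sketched as simple arithmetic. The following Python is only an illustration under stated assumptions: the names `Source`, `source_rate_gbps`, `required_throughput` and `is_adequate` are hypothetical, all figures passed in are placeholders rather than measurements from the report, and average rates are used where a real deployment would also need to account for bursts.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Source:
    """A data-producing instrument (hypothetical model, not from the report)."""
    name: str
    tb_per_run: float  # assumed raw output per run, in terabytes
    run_hours: float   # assumed duration of one run, in hours


def source_rate_gbps(src: Source) -> float:
    """Average rate at which one instrument produces data, in GB/s (1 TB = 1000 GB)."""
    return src.tb_per_run * 1000 / (src.run_hours * 3600)


def required_throughput(
    sources: Iterable[Source], consumer_read_gbps: Iterable[float]
) -> Tuple[float, float]:
    """Return (ingest, deliver): the summed write and read rates in GB/s."""
    ingest = sum(source_rate_gbps(s) for s in sources)
    deliver = sum(consumer_read_gbps)
    return ingest, deliver


def is_adequate(
    storage_write_gbps: float,
    storage_read_gbps: float,
    sources: Iterable[Source],
    consumer_read_gbps: Iterable[float],
) -> bool:
    """True if storage ingests at least as fast as sources produce
    and delivers at least as fast as consumers read."""
    ingest, deliver = required_throughput(sources, consumer_read_gbps)
    return storage_write_gbps >= ingest and storage_read_gbps >= deliver
```

For example, calling `is_adequate` with a list of sequencers and the read rates of the downstream analysis clusters gives a first-pass check on whether a candidate storage system keeps the source equipment from idling.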
Scalable storage will scale out almost indefinitely as needs and requirements grow or expand. The best solutions can offer a single “namespace” for all files even as the data storage under management grows into exabytes, eliminating multiple arrays, needless replication and extraneous data copies. Storage solutions that truly support massive scale-out to thousands of devices also offer resilient designs and non-disruptive upgrade/repair features to avoid single points of failure and downtime.
Perhaps even more important than maximizing resource utilization is fully empowering scientific staff. The top storage solutions work transparently to keep data flowing to all users, in real time and on demand, driving their applications and analytical workflows without lag or downtime. The best scalable storage solutions also present simple (and automated) data storage interfaces, freeing staff from onerous storage management concerns and enabling them to focus more on their productive research.
There is plenty of competition in the global race to life science insight and discovery, with massive opportunity for organizations that can most efficiently leverage resources, empower researchers and minimize IT risk and distraction. Life sciences storage can make a significant difference in organizational outcomes by delivering world-class performance, scaling readily to handle the largest of online data requirements and significantly increasing organizational efficiency.
Finally, we are seeing top-end storage solutions increasingly support downstream and collaborative workflows through wider multiprotocol support, native storage tiering, inherent data protection, multi-tenancy and data security features. Overall, top life sciences storage solutions, even though often accelerated on high-end appliances or custom hardware, are becoming more cloud-like in utility, economics, and management.
DCIG TOP 5 Rising Vendors in Storage for Life Sciences
The features evaluated for these offerings fell under the following general categories:
- Deployment Capabilities
- Data Protection
- Product and Performance Management
- Documentation Support
- Technical Support
- Licensing and Pricing
Based on these criteria, DCIG awarded the following offerings a TOP 5 ranking (in alphabetical order):
- Pavilion HyperParallel Data Platform
- Qumulo File Data Platform
- VAST Data Universal Storage
The full 2022-23 DCIG TOP 5 Rising Vendors in Storage for Life Sciences report is available for download immediately at the link below with registration. The full report contains additional details such as:
- A listing of all offerings evaluated
- Distinguishing features of storage for life sciences
- Similarities among the TOP 5 offerings
- Differences between the TOP 5 offerings
- A profile of each offering that highlights the key features that earned it a spot among the DCIG TOP 5 Rising Vendors in Storage for Life Sciences
Get the 2022-23 DCIG TOP 5 Rising Vendors in Storage for Life Sciences report here.
CONFIDENTIALITY AND COPYRIGHT
Other than excerpts from this report published in this announcement, the contents of the 2022-23 DCIG TOP 5 Rising Vendors in Storage for Life Sciences report are copyrighted and may not be republished without permission.
KEEP UP TO DATE WITH DCIG
DCIG will release more TOP 5 results in the weeks and months to come. Please sign up for the weekly DCIG Newsletter to receive notice of their availability.
Technology providers interested in licensing DCIG blog articles as executive white papers should contact DCIG for more information.