- Jim Whalen, Taneja Group
If you view hyper-converged platforms as an evolutionary stream within the overall IT space, one next logical step is to extend hyper-convergence to also include secondary storage. In many ways, this is a bigger, more complex problem than the original one hyper-convergence was intended to solve.
Hyper-convergence emerged in response to the growing complexity of IT systems. With the takeoff of virtualization in the early 2000s, virtual machine-centric vendors introduced hyper-converged systems where compute, storage, and networking were all tightly integrated in a single box below the hypervisor. Over time, these hyper-converged platforms evolved to support web-scale distributed file systems, creating an even more seamless environment. Customers merely had to buy an appliance, plug it into the network and spin up the desired VMs. If an enterprise required more compute or storage, it simply added another appliance (i.e., node) to the mix for more IOPS and storage capacity.
The appliances self-managed many configuration and housekeeping activities, significantly reducing the amount of administrative overhead required. So, for the first time, IT could focus the bulk of its attention on the VMs running critical business applications rather than on the nuts and bolts of the underlying hardware and software components.
Although hyper-convergence marked a tremendous leap ahead in productivity and efficiency, it did not address some nagging storage issues.
Secondary storage -- hyper-convergence's orphaned child
The shift in focus from discrete hardware storage elements to commoditized, clustered and virtualized storage architectures almost exclusively benefited primary storage. This left legacy secondary storage alone to deal with its own endemic issues:
- Siloed use cases such as data protection and DevOps that required separate point products.
- Huge amounts of little-used, poorly understood and rapidly growing dark data.
- Copy data proliferation that wasted storage space and added to the dark data problem.
As much as 90% of an organization's data resides in secondary storage and, given its nature as a catchall, a lot of that data is not heavily used or well understood -- i.e., the dark data. So the underlying physical storage needs to be abstracted and scaled, and there is also a huge volume of opaque, redundant data that must be organized for use by all the non-tier-1 workloads residing in secondary storage, while that same storage simultaneously provides effective data protection for primary storage.
Addressing the secondary-storage problem
To take the next step in hyper-convergence, you would start with the clustered hardware technology residing beneath a global, extensible file system that current hyper-converged platforms provide, and you would enhance it to support the concurrent mixed workloads driven by secondary storage use cases. With this as a base, you would then build in global, inline data deduplication, file indexing and search services, integrating the key secondary storage workflows on top of it all. The finishing touch would be to add a cloud interface that allows bulk data to be archived there, to enable automated policy-based cloud tiering and provide a facility for off-site replication for business continuity and disaster recovery.
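As a rough illustration of the global, inline deduplication step described above, the following Python sketch stores each unique chunk once, keyed by its hash, and keeps a per-file list of chunk hashes. It is a minimal sketch and not any vendor's implementation: the `DedupStore` class, its method names, the fixed-size chunking and the choice of SHA-256 are all assumptions made for illustration, and real products typically add variable-size chunking, persistence and distribution across nodes.

```python
import hashlib


class DedupStore:
    """Hypothetical in-memory store that deduplicates data inline, at write time."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # chunk digest -> chunk bytes (each unique chunk stored once)
        self.files = {}   # file name -> ordered list of chunk digests

    def write(self, name, data):
        """Split data into fixed-size chunks and store only chunks not seen before."""
        recipe = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # setdefault keeps the existing copy if this chunk is already stored
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        self.files[name] = recipe

    def read(self, name):
        """Reassemble a file from its list of chunk digests."""
        return b"".join(self.chunks[digest] for digest in self.files[name])

    def logical_bytes(self):
        """Bytes the files would occupy without deduplication."""
        return sum(len(self.chunks[d]) for recipe in self.files.values() for d in recipe)

    def physical_bytes(self):
        """Bytes actually held after deduplication."""
        return sum(len(chunk) for chunk in self.chunks.values())


if __name__ == "__main__":
    store = DedupStore(chunk_size=4)
    store.write("backup-monday", b"AAAABBBBCCCC")
    store.write("backup-tuesday", b"AAAABBBBDDDD")  # shares two chunks with Monday
    print(store.logical_bytes(), store.physical_bytes())
```

Because the deduplication index is shared by every file in the store, two backups that differ in only one chunk consume the space of one backup plus the changed chunk, which is the effect a global (rather than per-volume) dedupe scope is meant to achieve.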
Sensing a market opportunity, a few vendors -- most notably Cohesity and Rubrik -- have introduced products to do just that, thereby creating a new category of storage: hyper-converged secondary storage. These new hyper-converged secondary storage products directly address the problems mentioned above and, to varying degrees, integrate data protection, DevOps and analytics:
- Data protection for data in primary storage is the most important secondary storage use case, so it must be present in any hyper-converged secondary storage platform.
- DevOps enables developers to use virtual, zero-space copies instead of fully duplicated data for the rapid and efficient deployment of new test and development environments, maximizing storage efficiency and minimizing copy proliferation.
- Analytics provide a mechanism to light up dark data and increase its business value. For example, in addition to providing basic analytics as part of its base product, Cohesity offers a software development kit and an open integration platform that allow customers and third-party developers to add their own custom analytics modules.
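The "zero-space copy" idea in the DevOps item can be sketched with a simple copy-on-write clone: the clone starts with no blocks of its own and reads through to its source until a block is overwritten. This is a minimal sketch under stated assumptions, not how Cohesity, Rubrik or any other vendor implements it; the `Volume` class and its methods are hypothetical, and the source volume is assumed to be treated as a read-only snapshot once it has been cloned.

```python
class Volume:
    """Hypothetical block volume that supports copy-on-write clones."""

    def __init__(self, parent=None):
        self.parent = parent   # source volume this clone reads through to, if any
        self.own_blocks = {}   # block index -> bytes written directly to this volume

    def write_block(self, index, data):
        """Writes land only in this volume; the parent is never modified."""
        self.own_blocks[index] = data

    def read_block(self, index):
        """Return the nearest copy of a block, walking up the clone chain."""
        volume = self
        while volume is not None:
            if index in volume.own_blocks:
                return volume.own_blocks[index]
            volume = volume.parent
        raise KeyError(f"block {index} has never been written")

    def clone(self):
        """Create a clone that initially consumes no block storage of its own."""
        return Volume(parent=self)

    def private_block_count(self):
        """Number of blocks this volume stores itself (excludes shared blocks)."""
        return len(self.own_blocks)


if __name__ == "__main__":
    backup = Volume()
    backup.write_block(0, b"customers")
    backup.write_block(1, b"orders")

    dev_copy = backup.clone()              # test/dev environment, no data copied
    dev_copy.write_block(1, b"test-orders")  # only the changed block is stored

    print(dev_copy.private_block_count())
    print(dev_copy.read_block(0), backup.read_block(1))
```

A test or development environment built this way costs only the space of the blocks it changes, which is why such clones reduce copy proliferation compared with full duplicates.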
The net result is that, with only one system to manage, customers no longer have to deal with separate point hardware/software products for the various secondary storage use cases. They also gain built-in copy data management efficiencies, significant improvements in storage capacity, less duplicate data proliferation and much better insight into the bulk of their data.
Other vendors enter hyper-converged secondary space
Besides Cohesity and Rubrik, the two purest plays in this new category, a few other vendors are encroaching on the hyper-converged secondary storage space. Actifio started out as a copy data management company, but has expanded its offerings to what it calls copy data virtualization, which pulls in the secondary storage use cases of data protection and DevOps. Another vendor, SimpliVity, is positioned as a player in hyper-converged primary storage platforms, but it has tightly integrated inline deduplication, zero-space copies and data protection into its offering, blurring the line between primary and secondary storage to a degree.
In summary, the secondary storage space, which has been a fragmented, mostly neglected IT backwater for a long time, is beginning to get some overdue attention. Leveraging existing technologies developed for the original hyper-converged platforms, a handful of vendors are producing some interesting products that promise to make life easier for storage administrators.