- Jon Toigo, Toigo Partners International
Many IT managers look at data growth rates with trepidation. For years, larger shops have hosted files and objects on dedicated NAS platforms, using the latest technologies from vendors such as NetApp and Dell EMC Isilon to handle the growth. However, traditional NAS appliances aren't scalable enough to consolidate proliferating secondary storage workloads, including file and object, backup and archive, testing and development, and analytics. These appliances are fast becoming silos that complicate data sharing and make infrastructure management a Herculean undertaking.
We need a replacement architecture that delivers scalability, reduces the management burden, bends the capacity-demand curve to slow the need for capacity acquisitions and improves workload and data productivity. This is the thinking behind Cohesity, Mohit Aron's latest storage venture, and its hyper-converged secondary storage platform.
The problem with NAS
Data has grown at an accelerated rate over the last few years and is expected to keep doing so for the foreseeable future. Unstructured data, both files and objects, is a significant contributor to this growth and usually lands in NAS appliances or other file server silos. The resulting problems center mostly on complexity and cost.
With NAS platforms, vertical scalability is limited: you can fit only so many hard drives or SSDs in the box. A NAS appliance is essentially a thin server bolted to an array, usually with an expensive memory buffer on a feature card that acknowledges writes before they reach disk, so applications don't see write latency. Once all the slots are filled, your only choice is to deploy more appliances and scale horizontally.
In addition to hardware, horizontal scaling requires more thin server software licenses. To build a coherent, expanded silo, you may also need software that lets the individual appliances work as a group, plus specialty software to establish and maintain a global namespace over the pooled hardware so you can see all files from a single point of access. This strategy has many shortcomings and several ways to break the storage, including corrupted namespaces, pooled hardware failures and failover errors. Its costs also grow unpredictably as storage architectures, topologies and protocols change over time.
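To make the namespace problem concrete, here is a minimal Python sketch of a global namespace layered over pooled appliances. It assumes a simple longest-prefix mount table; the `GlobalNamespace` class, the mount paths and the appliance names (`nas-01` and so on) are hypothetical, and commercial namespace software is considerably more involved. The last step shows how a single lost or corrupted entry silently sends lookups to the wrong appliance.

```python
class GlobalNamespace:
    """One directory tree presented over several NAS appliances (hypothetical)."""

    def __init__(self) -> None:
        self.mounts: dict[str, str] = {}  # path prefix -> appliance name

    def mount(self, prefix: str, appliance: str) -> None:
        # Normalize to a trailing slash so "/eng" does not match "/engineering".
        self.mounts[prefix.rstrip("/") + "/"] = appliance

    def resolve(self, path: str) -> str:
        """Return the appliance holding `path`, using the longest matching prefix."""
        matches = [p for p in self.mounts if path.startswith(p)]
        if not matches:
            raise FileNotFoundError(path)
        return self.mounts[max(matches, key=len)]


ns = GlobalNamespace()
ns.mount("/", "nas-01")
ns.mount("/eng", "nas-02")
ns.mount("/eng/archive", "nas-03")
before = ns.resolve("/eng/archive/2017/q3.csv")

# A lost or corrupted mapping silently redirects lookups to the wrong appliance.
del ns.mounts["/eng/archive/"]
after = ns.resolve("/eng/archive/2017/q3.csv")
print(before, "->", after)
```

Because every lookup depends on this mapping, the namespace becomes one more component that must be protected, replicated and kept in sync as appliances are added.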
An alternative architecture
Cohesity started as an effort to solve some of the woes of backup, said Aron, the company's CEO. The goal was to build a secondary storage platform that could support unlimited web-scale capacity, both on premises and in clouds, without much administrative or managerial intervention. A fan of deduplication, Aron wanted to make it and other functions available across the entire secondary storage platform rather than as value-add software on individual appliances or arrays.
Aron is a veteran of Nutanix, which did something similar with primary storage, using software-defined storage and hyper-converged infrastructure technology and architecture to provide a platform for mission-critical workloads. As a developer of the Google File System, Aron had the right stuff to create Cohesity's hyper-converged secondary storage platform.
The company touts the benefits of hyper-converged secondary storage for various applications, including file and object storage. And the platform does seem to be an improvement over traditional NAS in the following ways:
- Hyper-convergence. Cohesity's platform is hyper-converged: its storage nodes are clustered and managed as a single set of resources that hosts data from all the secondary storage workloads that traditional NAS models fragment into silos. This simplifies both infrastructure management and access to the data stored there.
- Web-scale. Cohesity's hyper-converged secondary storage offers unlimited scalability with universal deduplication. It's among the first platforms with global, variable-length, block-level data reduction. You can start with as few as three nodes and scale out by adding more without a lot of hassle.
- Productivity. Workloads can share data directly instead of each working from its own copy. Global indexing and search help you find the data you need, and in-place analytics support queries and trend analysis while making it easier for different workloads to use the data.
- Multi-cloud. Cohesity's hyper-converged secondary storage can span from the edge of the data center to multiple public and private clouds, capitalizing on cloud elasticity and economics while keeping data usable wherever it resides in the infrastructure.
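The idea behind variable-length, block-level deduplication can be sketched in Python with content-defined chunking. This is a generic illustration under stated assumptions, not Cohesity's implementation: the gear rolling hash, the chunk-size limits and the `DedupStore` class are choices made for the example. Chunk boundaries are set by the data itself, so inserting a single byte near the start of a file changes only the first chunk or so, and the rest still match chunks already held in the single, global index.

```python
import hashlib
import random

# Illustrative parameters (assumed), not Cohesity's actual settings.
MIN_CHUNK = 256
MAX_CHUNK = 4096
BOUNDARY_MASK = (1 << 10) - 1  # a boundary roughly every 1 KiB past MIN_CHUNK

# Gear table: a fixed pseudo-random 32-bit value for each possible byte.
_rng = random.Random(42)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def chunk(data: bytes) -> list[bytes]:
    """Split data where a rolling hash of the last few bytes hits a boundary pattern."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF  # gear rolling hash
        size = i - start + 1
        if (size >= MIN_CHUNK and (h & BOUNDARY_MASK) == 0) or size >= MAX_CHUNK:
            chunks.append(data[start : i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


class DedupStore:
    """A single, global chunk index: each unique chunk is stored once."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}

    def write(self, data: bytes) -> list[str]:
        """Store data; return its recipe, the ordered list of chunk digests."""
        recipe = []
        for piece in chunk(data):
            digest = hashlib.sha256(piece).hexdigest()
            self.chunks.setdefault(digest, piece)  # skip chunks already stored
            recipe.append(digest)
        return recipe

    def read(self, recipe: list[str]) -> bytes:
        return b"".join(self.chunks[d] for d in recipe)


original = random.Random(7).randbytes(64 * 1024)
edited = b"X" + original  # same content with one byte inserted at the front
store = DedupStore()
r1, r2 = store.write(original), store.write(edited)
shared = len(set(r1) & set(r2))
print(f"{len(r1)} chunks; {shared} reused after the insert; {len(store.chunks)} stored")
```

With fixed-size blocks, the same one-byte insert would shift every block boundary and defeat deduplication for the whole file, which is why variable-length chunking matters.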
Checking all the right boxes
Cohesity is proud of its SpanFS distributed file system, the foundation of the secondary storage platform's advanced functionality. Particularly noteworthy is the company's consistency guarantee: the platform writes data to multiple nodes before acknowledging a write. This differs from NAS products that deliver eventually consistent writes, acknowledging data once it lands in a cache and only later writing it to back-end storage, which can result in data loss if an interruption occurs before that write completes.
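The difference between the two acknowledgment models can be modeled in a few lines of Python. This is a simplified sketch, not SpanFS code: the `Node`, `ReplicatedWriter` and `WriteBackCache` classes are hypothetical, and the cache is assumed to be volatile, matching the failure case described above. The replicated writer acknowledges only after two nodes hold the data, so losing one node afterward still leaves a readable copy; the cached writer acknowledges immediately and loses the write if it is interrupted before flushing.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A storage node; `up` models whether it is reachable and accepting writes."""
    name: str
    up: bool = True
    durable: dict[str, bytes] = field(default_factory=dict)

    def persist(self, key: str, value: bytes) -> bool:
        """Write to this node's durable store; fail if the node is down."""
        if not self.up:
            return False
        self.durable[key] = value
        return True


class ReplicatedWriter:
    """Acknowledges a write only after `replicas` nodes have persisted it."""

    def __init__(self, nodes: list[Node], replicas: int = 2) -> None:
        self.nodes = nodes
        self.replicas = replicas

    def write(self, key: str, value: bytes) -> bool:
        copies = 0
        for node in self.nodes:
            if copies == self.replicas:
                break
            if node.persist(key, value):
                copies += 1
        # A real system would also repair or roll back a partial write; omitted here.
        return copies == self.replicas


class WriteBackCache:
    """Acknowledges once data is in a volatile cache and writes it to disk later."""

    def __init__(self, backend: Node) -> None:
        self.backend = backend
        self.cache: dict[str, bytes] = {}

    def write(self, key: str, value: bytes) -> bool:
        self.cache[key] = value
        return True  # acknowledged before the data is durable

    def flush(self) -> None:
        for key, value in self.cache.items():
            self.backend.persist(key, value)
        self.cache.clear()

    def crash(self) -> None:
        self.cache.clear()  # an interruption wipes anything not yet flushed


nodes = [Node("a"), Node("b"), Node("c")]
replicated = ReplicatedWriter(nodes, replicas=2)
replicated_ack = replicated.write("f1", b"payload")
nodes[0].up = False  # lose node "a" after the write was acknowledged
survivors = [n.name for n in nodes if n.up and "f1" in n.durable]
print("replicated ack:", replicated_ack, "still readable on:", survivors)

cached = WriteBackCache(Node("array"))
cache_ack = cached.write("f1", b"payload")
cached.crash()  # interruption before flush()
print("cached ack:", cache_ack, "on back end:", "f1" in cached.backend.durable)
```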
Add fully linear performance scaling and a range of data protection features, such as erasure coding at 2-to-1, 4-to-2 and 5-to-2 ratios and replication factor settings, and it appears Cohesity's file and object storage platform checks all the right tech boxes. Plus, the product is sold on a pay-as-you-grow basis, eliminating the forced forklift upgrade and warranty renewal models of NAS vendors.
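The protection levels above come down to simple arithmetic. The sketch below assumes each ratio means data blocks to parity blocks in a stripe, using a code such as Reed-Solomon that can rebuild from any `data` surviving blocks; the function names are illustrative. It reports the raw capacity consumed per unit of usable data and the number of simultaneous failures each scheme tolerates, alongside full replication.

```python
from fractions import Fraction


def erasure_coding(data: int, parity: int) -> tuple[Fraction, int]:
    """Raw-to-usable capacity ratio and failures tolerated for a data:parity stripe."""
    return Fraction(data + parity, data), parity


def replication(copies: int) -> tuple[Fraction, int]:
    """Raw-to-usable capacity ratio and failures tolerated for N full copies."""
    return Fraction(copies), copies - 1


for k, m in [(2, 1), (4, 2), (5, 2)]:
    ratio, tolerated = erasure_coding(k, m)
    print(f"EC {k}:{m}  {float(ratio):.2f}x raw capacity, tolerates {tolerated} failure(s)")

for n in (2, 3):
    ratio, tolerated = replication(n)
    print(f"RF{n}     {float(ratio):.2f}x raw capacity, tolerates {tolerated} failure(s)")
```

Under these assumptions, 4:2 costs the same 1.5x raw capacity as 2:1 but survives two failures, and 5:2 trims overhead to 1.4x; surviving two failures with replication alone takes three full copies, or 3x.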
Clearly, out-of-the-box thinking is required to cope with the looming unstructured data deluge. Cohesity's approach, which avoids simply bolting additional features or functions onto an appliance model, is worth a look.