Dragon Slayer Consulting
Published: 06 Jul 2017
The proliferation of hyper-converged storage vendors, products and claims has made it difficult for IT shops to discern a proper fit for their organization. Some vendors gloss over performance and potentially increase the risk of storage chokepoints, while others myopically focus on performance and ignore server resource-intensive services, such as data reduction or protection.
It's important to address performance, storage networking, data reduction, data protection, security, scalability, hardware requirements and other issues before committing to hyper-converged infrastructure (HCI). And make no mistake, it is a big commitment.
Hyper-converged infrastructure appliances deliver easy-to-use, turnkey systems built on commercial, off-the-shelf white box servers and software-defined storage, compute and networking. The hardware consists mainly of x86 CPUs with hard disk or solid-state drives for storage and network interface cards (NICs). Networking for hyper-converged infrastructure appliances is usually a virtual top-of-rack or leaf switch. Virtualization is a key part of all HCI, including hypervisors and built-in containers.
In this article, we focus on hyper-converged storage -- the software-defined storage inside HCI -- and how that works with other pieces of the infrastructure.
Hyper-converged storage origins
Software-defined storage (SDS) has no standard technical specification, so just about any storage software or system can justify SDS claims. To differentiate between SDS on storage systems or appliances and SDS running in a hyper-converged infrastructure, vendors adopted the term hyper-converged storage. Hyper-converged storage software can run in the hypervisor kernel, as a virtual machine (VM) or even as a container, and typically runs on every physical host. It virtualizes server-embedded SSDs and HDDs to create virtual storage pools.
Embedded storage media in HCI server nodes appear as an easy-to-use shared storage system to hypervisors, VMs, containers and applications. The storage pools can then be carved up as performance, capacity, read cache, write cache, hybrid, secondary or archive pools, and more. And since hyper-converged storage software implementations aren't all the same, vendor architecture choices have serious implications.
The next few sections take a deep look at some of the more important pieces of the hyper-converged storage puzzle. (Several storage systems that use the term hyper-converged storage are variations of scale-out storage. This article focuses on hyper-converged storage software, not hardware.)
1. Hyper-converged storage efficiency
Hyper-converged storage efficiency has four crucial focal points: capacity, software, data protection and performance. Most people assume the primary efficiency factor is capacity. It isn't.
Capacity efficiency: Hyper-converged storage minimizes storage consumption. It includes data reduction technologies, such as inline deduplication, compression, zero capacity snapshots or clones and thin provisioning. As a result, you need less storage capacity and supporting infrastructure. That infrastructure includes nodes, network ports, switches, cables, rack units, transceivers, floor space, power, cooling and management. These reductions lower capital and operating expenditures.
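To make the data reduction idea concrete, here is a minimal sketch of deduplication followed by compression. It is not any vendor's implementation: the fixed 4 KB chunk size, the SHA-256 fingerprint and the function names (reduce_data, restore_data, stored_bytes) are all assumptions chosen for illustration.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumed fixed chunk size in bytes (illustrative only)


def reduce_data(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Deduplicate fixed-size chunks, then compress each unique chunk.

    Returns (store, recipe): store maps a chunk fingerprint to its compressed
    bytes; recipe is the ordered list of fingerprints needed to rebuild data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        if fingerprint not in store:
            # Only chunks not seen before consume physical capacity.
            store[fingerprint] = zlib.compress(chunk)
        recipe.append(fingerprint)
    return store, recipe


def restore_data(store, recipe) -> bytes:
    """Rebuild the original bytes from the chunk store and the recipe."""
    return b"".join(zlib.decompress(store[fp]) for fp in recipe)


def stored_bytes(store) -> int:
    """Physical bytes consumed by the unique, compressed chunks."""
    return sum(len(blob) for blob in store.values())
```

Feeding this sketch 150 chunks that contain only two distinct patterns leaves two entries in the store, which is the effect the capacity savings above depend on. Real products also have to manage fingerprint metadata, which is part of the host resource cost discussed next.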
Software efficiency: This is the most critical issue for hyper-converged storage efficiency. In this context, software efficiency is measured by how many host resources the storage software consumes. Host resources, such as CPU cores, memory and I/O, are precious and primarily reserved for application workloads. Hyper-converged storage software is rarely designed with host resource efficiency as a priority, and much of it suffers from feature bloat.
Deduplication, compression, tiering, snapshots, thin provisioning, metadata management, RAID and so on are great storage management features, but they are compute, memory and I/O intensive. The underlying assumption has always been that Moore's Law would bail vendors out. Unfortunately, quantum physics limitations have shown that Moore's Law is not really a "law" and can no longer be counted on to overcome hyper-converged storage bloatware.
With the shift from HDDs to fast SSDs, the chokepoint of hyper-converged storage software has become a major problem. Some of this software is so inefficient that it must run on its own hardware because it consumes too many resources for co-resident application workloads to run effectively.
Data protection efficiency: This parameter guards against media, node and site failure; human error; maliciousness; and malware while minimizing storage consumption. Traditional storage media protection relies on mirrored or parity RAID levels such as 1, 5, 6, 50 and 60, which are inadequate to insulate against hyper-converged storage node failures today. Most hyper-converged storage now forgoes RAID in favor of multicopy mirroring between nodes (triple mirroring is a common form), maintaining one additional copy of data for each concurrent node or drive failure to be tolerated. For example, protection against three concurrent node or drive failures would require the placement of three copies of the original data on different nodes and drives.
Although multicopy mirroring provides continuous access to data even when there are multiple concurrent failures, it's quite inefficient. Each additional copy consumes 100% more capacity, so two copies plus the original consume three times the storage capacity. This has led some hyper-converged storage vendors to implement object storage-type erasure coding for both primary and secondary storage. Erasure coding traditionally was relegated to object storage because of the high latencies associated with processing writes and reads. When hyper-converged storage software is written in a highly efficient manner, it can deliver object storage-like erasure coding without adding noticeable latency. This approach enables exceptional data durability of as much as 11 nines, with far more efficiency than multicopy mirroring, at a nominal increase in storage capacity of 30% to 60%.
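The capacity arithmetic behind that comparison can be sketched as follows. The function names and the 10+3 and 10+6 fragment layouts are assumptions picked to land on the 30% to 60% range; actual layouts vary by product.

```python
def mirroring_raw_multiplier(extra_copies: int) -> float:
    """Raw capacity per unit of usable data: the original plus each extra copy."""
    if extra_copies < 0:
        raise ValueError("extra_copies must be zero or greater")
    return 1.0 + extra_copies


def erasure_coding_raw_multiplier(data_fragments: int, parity_fragments: int) -> float:
    """Raw capacity per unit of usable data for a data+parity fragment layout."""
    if data_fragments <= 0 or parity_fragments < 0:
        raise ValueError("need at least one data fragment and no negative parity")
    return (data_fragments + parity_fragments) / data_fragments


def overhead_percent(raw_multiplier: float) -> float:
    """Extra raw capacity beyond the usable data, as a rounded percentage."""
    return round((raw_multiplier - 1.0) * 100, 1)
```

Under these assumptions, three extra mirror copies consume four times the raw capacity (300% overhead), while a 10+3 erasure-coded layout that also survives three lost fragments consumes 1.3 times (30% overhead) and a 10+6 layout consumes 1.6 times (60% overhead).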
Hyper-converged storage uses snapshots and clones to protect against human error, maliciousness and malware. To minimize capacity consumption, snapshots are a variation of redirect on write or thin-provisioned copy on write. Efficient hyper-converged storage data protection has no limit on the number of snapshots or clones. Limits on snapshot counts affect recovery point objectives (RPOs), the amount of data that can be lost in an outage. Fewer snapshots mean more time between snapshots and a higher RPO.
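The link between a snapshot limit and RPO is simple arithmetic. The sketch below assumes evenly spaced snapshots that must all be retained for a fixed window; the function name and the example figures are hypothetical.

```python
def worst_case_rpo_minutes(retention_hours: float, max_snapshots: int) -> float:
    """Shortest snapshot interval, in minutes, that covers the retention window.

    With evenly spaced snapshots, the interval between them is also the most
    data (measured in time) that can be lost, which is the worst-case RPO.
    """
    if max_snapshots <= 0:
        raise ValueError("max_snapshots must be positive")
    if retention_hours <= 0:
        raise ValueError("retention_hours must be positive")
    return retention_hours * 60 / max_snapshots
```

For a seven-day (168-hour) retention window, a hypothetical limit of 256 snapshots forces an interval of about 39 minutes, while a limit of 4,096 allows an interval under three minutes.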
Hyper-converged storage data protection at the site level commonly uses asynchronous replication; less frequently, isochronous replication or write-on asynchronous replication; and, least frequently, synchronous replication or synchronous mirroring. Here, efficiency is measured by minimizing the amount of bandwidth required between geographic locations. Replicating only the incremental data changes, or combining that with deduplication and compression, can reduce bandwidth requirements. The most efficient hyper-converged storage software also does some WAN optimization, such as UDP-based Data Transfer Protocol over User Datagram Protocol (see "What is UDT over UDP?").
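A rough estimate of the bandwidth effect can be worked out as below. This is a back-of-the-envelope sketch, not a sizing tool: the function name, the decimal gigabyte convention and the example figures are assumptions, and protocol overhead is ignored.

```python
def required_bandwidth_mbps(changed_gb: float, reduction_ratio: float,
                            window_seconds: float) -> float:
    """Average WAN bandwidth (megabits per second) needed to replicate changes.

    changed_gb      -- incremental data changed during the window, in decimal GB
    reduction_ratio -- combined deduplication and compression ratio (4.0 means 4:1)
    window_seconds  -- time allowed to ship the changes
    """
    if reduction_ratio <= 0 or window_seconds <= 0:
        raise ValueError("reduction_ratio and window_seconds must be positive")
    bytes_to_send = changed_gb * 1e9 / reduction_ratio
    return bytes_to_send * 8 / window_seconds / 1e6
```

With a hypothetical 50 GB of changes per hour, replication with no data reduction needs roughly 111 Mbps on average, while a 4:1 reduction ratio brings that down to roughly 28 Mbps.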
Performance efficiency: The amount of usable IOPS and throughput from each SSD or HDD measures performance efficiency. The more IOPS and throughput, the better -- an increasing concern as faster interfaces such as nonvolatile memory express (NVMe) become standard and individual SSD capacities increase. Efficient hyper-converged storage software gets more performance on average from every SSD, so fewer SSDs are required, reducing the need for nodes, storage shelves, network ports, switches, cables, transceivers, rack units, floor space, power, cooling and more.
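The effect of software efficiency on drive count can be illustrated with a short calculation. The efficiency fraction here is a hypothetical modeling parameter (the share of a drive's raw IOPS that the storage software actually delivers), and the figures are illustrative, not measurements.

```python
import math


def drives_needed(target_iops: float, raw_iops_per_drive: float,
                  software_efficiency: float) -> int:
    """Number of drives required to reach target_iops.

    software_efficiency is the assumed fraction (0 to 1) of each drive's raw
    IOPS that remains usable after storage software overhead.
    """
    if not 0 < software_efficiency <= 1:
        raise ValueError("software_efficiency must be in (0, 1]")
    if raw_iops_per_drive <= 0:
        raise ValueError("raw_iops_per_drive must be positive")
    usable_iops_per_drive = raw_iops_per_drive * software_efficiency
    return math.ceil(target_iops / usable_iops_per_drive)
```

For a target of 1 million IOPS using drives rated at 200,000 raw IOPS each, software that delivers 40% of raw performance needs 13 drives, while software that delivers 80% needs seven.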
By speeding writes to the storage media, highly efficient hyper-converged storage software may not require dynamic RAM (DRAM) caching for SSDs or flash caching for HDDs. Delivering similar performance without DRAM caching removes the need to maintain cache coherency between hyper-converged storage nodes. It also eliminates complicated and expensive life support -- such as nonvolatile DIMMs, uninterruptible power supplies, batteries or supercapacitors -- for that DRAM. Eliminating DRAM or flash caching saves capital and operating expenditures without sacrificing performance.
What is UDT over UDP?
UDP-based Data Transfer Protocol (UDT) is a reliable, application-level data transport protocol built on User Datagram Protocol (UDP) for distributed, data-intensive applications running over wide area high-speed networks. UDT uses UDP to transfer bulk data with its own reliability control and congestion control mechanisms. It can transfer data at much higher speeds than TCP on such networks and is a highly configurable framework that can accommodate various congestion control algorithms as well. It's frequently used as part of WAN optimization.
2. Hyper-converged storage accessibility
Hyper-converged storage accessibility is often limited to hosts or nodes, VMs, applications and containers in a hyper-converged storage cluster. These resources are in a closed loop and cannot be shared by non-hyper-converged storage cluster workloads. There is considerable value in opening up hyper-converged storage to those workloads. It can eliminate the need for shared storage system silos, resulting in time, hardware, supporting infrastructure, management, operations, troubleshooting and cost savings.
Storage protocol support is another important aspect of hyper-converged storage accessibility. Some hyper-converged infrastructure appliances are iSCSI only. Others are based on file protocols such as NFS, SMB or the Hadoop Distributed File System. Several offer object storage support with RESTful interfaces. A handful of products provide all or most of the block, file and object storage protocols. A select few are focused principally on NVMe over Fabrics, leveraging Remote Direct Memory Access (RDMA). Storage protocols have major implications for infrastructure, bandwidth, networks, switches, adapters or network interface cards, cables and transceivers. For example: NVMe over Fabrics may require InfiniBand; Fibre Channel (FC); Data Center Bridging Exchange (DCBx) Ethernet; or RDMA over Converged Ethernet v2, which runs over UDP/IP, on 10 Gb, 40 Gb or 25/50/100 Gb Ethernet. InfiniBand, FC and DCBx Ethernet require unique switched networks.
When selecting hyper-converged storage, be sure it supports the organization's ecosystem fabrics, storage protocols and current or planned infrastructure, and stays within budget constraints.
3. Hyper-converged storage performance
Performance is the most manipulated, miscommunicated and misunderstood parameter. When a hyper-converged infrastructure appliance vendor specifies IOPS or throughput, what does that mean? It's obviously based on a specific hardware configuration, but what are the performance parameters? What was the test run to produce that IOPS number? Does it have any resemblance to the workloads that will be running on the hyper-converged storage in production? Odds are, it doesn't. The same thing can be said for throughput. There are standard tests, such as the Storage Performance Council's SPC-1 and SPC-2, and the Standard Performance Evaluation Corp.'s SPEC SFS, but they have limited value for apples-to-apples comparisons because many of the published results are based on configurations that are never implemented or are impractical in production. The only way to effectively compare performance is either to do a bakeoff with production applications and data or to utilize a simulator (see "Simulating HCI storage performance").
Simulating HCI storage performance
It's not rocket science to construct simulators to test hyper-converged storage performance and throughput via scripts. Scripts can take a lot of time and effort to write, document and support. Yet they do have the advantage of no license fees. It's easier and faster -- although costlier -- to use third-party software, such as Load DynamiX from Virtual Instruments, to put hyper-converged storage through its paces.
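As a starting point, a script-based test might look like the minimal sketch below, which times random block-aligned reads against a scratch file. The function names, the 4 KB block size and the read count are assumptions. Note that these reads are largely served from the operating system's page cache, so the result does not reflect device performance; a usable simulator would bypass the cache and replay a block-size mix, read/write ratio and concurrency that resemble production.

```python
import os
import random
import tempfile
import time


def random_read_iops(path: str, block_size: int = 4096, reads: int = 2000) -> float:
    """Issue random block-aligned reads against path; return reads per second."""
    blocks = os.path.getsize(path) // block_size
    if blocks == 0:
        raise ValueError("file is smaller than one block")
    # buffering=0 disables Python-level buffering (the OS page cache still applies).
    with open(path, "rb", buffering=0) as handle:
        start = time.perf_counter()
        for _ in range(reads):
            handle.seek(random.randrange(blocks) * block_size)
            handle.read(block_size)
        elapsed = time.perf_counter() - start
    return reads / elapsed


def run_demo(file_size: int = 8 * 1024 * 1024) -> float:
    """Create a scratch file of random data, measure it and clean up."""
    with tempfile.NamedTemporaryFile(delete=False) as scratch:
        scratch.write(os.urandom(file_size))
        path = scratch.name
    try:
        return random_read_iops(path)
    finally:
        os.remove(path)
```

Pointing random_read_iops at a file that lives on the hyper-converged storage under test, rather than on a local scratch disk, is the obvious next step, followed by adding writes and parallel workers.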
Some hyper-converged storage software is laser-focused on performance. These speed demons use the latest performance storage technologies such as NVMe, RDMA and NVMe over Fabrics. Or they cleverly utilize unused cores in each hyper-converged storage node. One even uses parallel I/O. Another leverages nodal DRAM to accelerate reads and writes. Ultimately, what matters is that the hyper-converged storage meets the organization's current and future application workload requirements. And there may be tradeoffs with that excellent performance. It may forgo resource-intensive services, such as data reduction and protection. It may also consume too many system resources, as discussed previously, and be relegated to its own hardware or require additional highly specialized and expensive hardware.
Additional hyper-converged storage considerations
Other often-overlooked considerations when evaluating hyper-converged storage software are the supported hardware requirements, analytics, management and cost.
Hardware requirements: All software requires hardware to run, and each has different hardware requirements. Some products require more processing or more cores. Others require a minimum amount of DRAM, NVMe flash, high-speed adapters and switches, and batteries or other life support for volatile memory. All have supported reference designs.
Analytics: A few offerings now include storage and predictive analytics similar to those in shared storage systems. Storage analytics track and report on consumption, performance and potential hot spots, and assist with troubleshooting. Predictive analytics uses trend analysis to raise alerts and supply information about problems before they become bigger issues.
Software management: While most traditional storage systems likely have antiquated user interfaces, the most common hyper-converged storage interface is an intuitive HTML5 web-based GUI that requires little to no training. Not all products are great in the management department, however, and several lack a command-line interface.
Cost: Calculating hyper-converged storage software costs, or total cost of ownership, starts with the software fees: either a subscription or a perpetual license plus maintenance, for a calculation period of three, four or five years. The next piece is the hardware required to run and support the software. Because that hardware is shared with hypervisors, VMs, containers and applications, a percentage of its cost must be allocated based on hyper-converged storage overhead or resource consumption. There are also hardware components that may be dedicated to hyper-converged storage; supporting infrastructure, such as adapters, NICs, switches and cables; plus maintenance and operating costs of that hardware -- including power, cooling, and allocation of personnel and fixed overhead -- for the calculation period.
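The cost elements above can be combined in a simple model. This is a sketch under stated assumptions: the function and parameter names are hypothetical, fees and operating costs are treated as flat annual amounts, and the operating cost passed in is assumed to be only the share attributable to storage.

```python
def storage_tco(annual_software_fees: float, shared_hardware_cost: float,
                storage_overhead_fraction: float, dedicated_hardware_cost: float,
                annual_operating_cost: float, years: int) -> float:
    """Total cost of ownership for hyper-converged storage over the period.

    storage_overhead_fraction is the assumed share (0 to 1) of the shared
    hardware consumed by the storage software rather than by workloads.
    """
    if not 0 <= storage_overhead_fraction <= 1:
        raise ValueError("storage_overhead_fraction must be between 0 and 1")
    if years <= 0:
        raise ValueError("years must be positive")
    software = annual_software_fees * years
    shared_hardware = shared_hardware_cost * storage_overhead_fraction
    operating = annual_operating_cost * years
    return software + shared_hardware + dedicated_hardware_cost + operating
```

As an illustration with made-up figures, $20,000 per year in software fees, $300,000 of shared hardware with 25% storage overhead, $15,000 of dedicated hardware and $10,000 per year in operating costs come to $180,000 over three years. Note how the overhead fraction, which reflects software efficiency, moves the total.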
Buying storage for hyper-converged infrastructure appliances is no trivial task and not for the faint of heart. Sifting through vendor claims and counter-claims is a laborious, tedious process that takes thorough research and testing. Following the guidelines described in this article and using the hyper-converged storage checklist provided (see "Rate hyper-converged storage software options") is a good place to start.