The evolution of data center storage architecture
Storage services are a key part of any infrastructure, especially a converged architecture. Many of the bundled solutions use a traditional shared storage system for the storage part of the architecture. That requires inclusion of a storage network, but the complexity of a storage network is greatly reduced because of the pre-integration work done by the vendor.
Most of the integrated and all of the software-only converged systems run the storage services as part of the compute tier. The storage software aggregates the capacity within each node, which has the advantage of eliminating the cost and complexity of an additional storage controller. And these systems can use server-class storage media instead of enterprise-class hard disks and flash storage. Those two features combine to greatly reduce costs.
There are some issues to be aware of when storage services are run within the compute tier. These services typically run within a virtual machine (VM) construct, which means their level of activity could adversely impact other VMs within the cluster. A spike in I/O demand on a virtualized SQL Server application, for example, could cause a spike on the VMs running the storage software, which could result in contention on the I/O bus. Because each node provides storage, compute and storage I/O, some of the I/O issues should be mitigated, but there's still a legitimate concern about predictable performance.
This concern may be exacerbated by the fact that most integrated or software-only converged systems cannot leverage shared storage at all. In other words, data centers that are concerned about this lack of predictability have limited ability to address it by standing up a dedicated storage silo. If predictable performance is a concern, IT planners would be well advised to look for a solution that can support both a dedicated storage tier and a converged storage tier.
How is data dispersed for sharing and RAID protection?
To support features like live migration, VMs require multiple hosts to have access to the same virtual disks. And, of course, VMs must be protected from a drive failure.
Again, since most bundled solutions use a traditional shared array, there's little concern about data protection. The incorporated arrays are typically enterprise-class and are built on RAID-based data protection.
Integrated and software-only solutions tend to take a different approach. They're tuned for their storage software, which typically runs in a scale-out fashion across the compute tier. Data protection may take one of two forms. The first is a replication model in which each VM's data is replicated in real time to one or two additional nodes. Most IT planners tend to choose three-way replication (the original plus two copies) so that they're still in a protected state in the event of a node failure.
While replication is a simple and efficient technique, IT planners need to be aware that capacity consumption will grow to 3x the size of the data under a three-way replication model. Each write is also multiplied by a factor of three, so it's critical that the network interconnecting these nodes be highly tuned.
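As a rough illustration of that arithmetic, the Python sketch below estimates raw capacity and interconnect traffic for a replication model. The function names and the sample figures (10 TB of VM data, 500 MB/s of writes) are assumptions chosen for illustration, not measurements from any product. It also assumes that one copy of each write lands on the local node and the remaining copies cross the network.

```python
def raw_capacity_needed(usable_tb: float, copies: int = 3) -> float:
    """Raw capacity consumed when every block is stored `copies` times."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return usable_tb * copies


def replication_network_traffic(write_mb_per_s: float, copies: int = 3) -> float:
    """Write traffic that must cross the node interconnect.

    Assumption: the first copy is written locally, so only the
    remaining (copies - 1) copies are sent to peer nodes.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return write_mb_per_s * (copies - 1)


if __name__ == "__main__":
    # Hypothetical workload: 10 TB of VM data, 500 MB/s of sustained writes.
    print(raw_capacity_needed(10.0, copies=3))          # TB of raw capacity
    print(replication_network_traffic(500.0, copies=3)) # MB/s on the interconnect
```

Even under these simple assumptions, every megabyte written by a VM becomes two megabytes on the wire, which is why the node interconnect deserves the same attention as a storage network would.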
Another option is to use a technique like erasure coding to protect data. Erasure coding requires less capacity overhead than replication: typically about 30% additional capacity vs. the 3x total footprint of three-way replication. And because the size of each I/O is so small, it should perform better when writing data and when it's in a rebuild state. The downside is that, usually, every node has to be involved in every I/O operation, both reads and writes.
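To make the capacity comparison concrete, here is a minimal Python sketch. It assumes a hypothetical layout of 10 data fragments plus 3 parity fragments, which is one way to arrive at roughly 30% overhead; actual layouts vary by product. The second half uses single-parity XOR, the simplest possible erasure code, only to show how a lost fragment is rebuilt from the survivors. Production systems generally use stronger codes, such as Reed-Solomon, that tolerate more than one failure. All names here are illustrative.

```python
from functools import reduce
from typing import Sequence


def erasure_overhead(data_fragments: int, parity_fragments: int) -> float:
    """Extra capacity as a fraction of usable data, e.g. 0.3 means 30%."""
    return parity_fragments / data_fragments


def replication_overhead(copies: int) -> float:
    """Extra capacity as a fraction of usable data for N-way replication."""
    return float(copies - 1)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_parity(fragments: Sequence[bytes]) -> bytes:
    """Single parity fragment: the XOR of all data fragments."""
    return reduce(xor_bytes, fragments)


def rebuild_missing(surviving: Sequence[bytes], parity: bytes) -> bytes:
    """Recover the one missing data fragment from the survivors and parity."""
    return reduce(xor_bytes, surviving, parity)


if __name__ == "__main__":
    print(erasure_overhead(10, 3))    # fraction of extra capacity for 10+3
    print(replication_overhead(3))    # fraction of extra capacity for 3 copies

    data = [b"AAAA", b"BBBB", b"CCCC"]   # fragments held on three nodes
    parity = encode_parity(data)          # held on a fourth node
    # Simulate losing the node that held data[1], then rebuild it.
    print(rebuild_missing([data[0], data[2]], parity) == data[1])
```

Note that the rebuild has to read from every surviving node, which mirrors the point above about all nodes participating in I/O.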
A final consideration relates to how the converged architecture delivers performance. For the bundled approaches, performance is delivered via a shared storage device, so it's critical to ensure the storage network is properly configured and tuned.
Integrated or software-based converged infrastructures should have an advantage when it comes to performance. Since these systems run the storage software in the compute tier, storage I/O performance -- particularly for reads -- should be greatly improved. But how that plays out in reality is largely dependent on how the software is architected. If there is intelligence behind the placement of data, the software can be designed to ensure that each VM has a local copy of its data. This should be especially easy for systems that use replication for data protection, but it may not be possible for systems that use erasure coding.
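One way to picture locality-aware placement under replication is the Python sketch below. It is a simplified assumption about how such software might behave, not a description of any vendor's algorithm: the first copy is pinned to the node hosting the VM, and the remaining copies go to the next nodes in a fixed ring order. The node names and function names are hypothetical.

```python
from typing import List, Sequence


def place_replicas(vm_host: str, nodes: Sequence[str], copies: int = 3) -> List[str]:
    """Choose nodes for a VM's data, always starting with its own host.

    The remaining copies are placed on the nodes that follow the host
    in ring order, wrapping around at the end of the list.
    """
    if vm_host not in nodes:
        raise ValueError("VM host is not a member of the cluster")
    if not 1 <= copies <= len(nodes):
        raise ValueError("copies must be between 1 and the cluster size")
    start = nodes.index(vm_host)
    return [nodes[(start + i) % len(nodes)] for i in range(copies)]


def read_is_local(vm_host: str, replica_nodes: Sequence[str]) -> bool:
    """True if the VM can be served reads without crossing the network."""
    return vm_host in replica_nodes


if __name__ == "__main__":
    cluster = ["node-a", "node-b", "node-c", "node-d"]
    replicas = place_replicas("node-c", cluster, copies=3)
    print(replicas)                           # local node first, then ring neighbors
    print(read_is_local("node-c", replicas))  # reads stay on the VM's own node
```

If the VM later migrates to a node that holds no copy, reads become remote until the software moves or re-replicates the data, which is why the placement intelligence matters as much as the protection scheme.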
About the author:
George Crump is president of Storage Switzerland, an IT analyst firm focused on storage and virtualization.