Today, many hyper-converged storage vendors have snapshot capabilities built into their software. But is the combination of data copies on multiple nodes and rich snapshots enough justification to ditch your backup software? Some hyper-converged storage vendors claim it is.
The problem with snapshots is that they depend entirely on the data in the volume they were taken from. A snapshot is not an independent copy, so if a failure exceeds what RAID or node replication can tolerate, the production data and every snapshot of it could be lost together.
Key backup capabilities to look for
Some vendors avoid using snapshots for backup and instead leverage deduplication to make what looks like a full copy of the data. Of course, if you have deduplication and make a copy of the same data on the same volume, nothing is actually copied. Instead, pointers are updated in a metadata table, which is very similar to the way a snapshot works.
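To make the pointer behavior concrete, here is a minimal sketch of a deduplicated volume. It is a toy model, not any vendor's implementation: the `DedupVolume` class, its method names, the 4-byte block size and the file names are all assumptions made for illustration.

```python
import hashlib


class DedupVolume:
    """Toy content-addressed volume: each file is a list of block hashes."""

    def __init__(self):
        self.blocks = {}    # block hash -> block bytes (each block stored once)
        self.metadata = {}  # file name -> list of block hashes (the pointers)

    def write(self, name, data, block_size=4):
        hashes = []
        for i in range(0, len(data), block_size):
            block = data[i:i + block_size]
            digest = hashlib.sha256(block).hexdigest()
            # Store the block only if this content has not been seen before.
            self.blocks.setdefault(digest, block)
            hashes.append(digest)
        self.metadata[name] = hashes

    def copy(self, src, dst):
        # A "copy" duplicates the pointer list only; no block data is written.
        self.metadata[dst] = list(self.metadata[src])

    def read(self, name):
        return b"".join(self.blocks[h] for h in self.metadata[name])

    def stored_bytes(self):
        return sum(len(b) for b in self.blocks.values())


vol = DedupVolume()
vol.write("vm.img", b"AAAABBBBCCCC")
before = vol.stored_bytes()
vol.copy("vm.img", "vm-backup.img")  # hypothetical "backup" copy
after = vol.stored_bytes()
```

In this sketch `before` and `after` are equal, because the copy added only a second entry to the metadata table. It also shows the risk: both names resolve to the same stored blocks, so losing those blocks, or the table itself, loses the original and the "backup" at once.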
If you are going to count on deduplicated copies or snapshots to provide data protection, then you are going to want to verify that your hyper-converged infrastructure can provide three key capabilities:
- Hyper-converged storage vendors should be able to replicate data to another location. The target can be different nodes that are part of the same cluster or a different cluster altogether. Some hyper-converged storage vendors offer a version of their infrastructure hosted at a cloud provider like Amazon or Google. It is critical that an organization be able to get its data out of the building to protect against a worst-case disaster.
- The vendor should go to great lengths to make sure the metadata tracking deduplication or snapshot information is protected. Essentially, if this metadata is lost, there is no way to access the data it describes. Protection should include copies of the metadata on multiple nodes and on the replicated site. There should also be a way to verify the integrity of the metadata table on a regular or continuous basis.
- Finally, there should be no performance penalty for maintaining these deduplicated copies or snapshots for an extended period of time. You don't want your data protection technique to slow down production. This is an area where deduplication probably has an advantage, since it is dependent on the metadata index for its information and not on the original data.
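As an illustration of the metadata integrity check described above, the following sketch compares copies of a metadata table held on several nodes and flags any copy that disagrees with the majority. The table layout, the node names and the function names are hypothetical; real products expose their own verification tools, and this is only one simple way such a check could work.

```python
import hashlib
import json


def table_digest(table):
    """Return a SHA-256 digest of a metadata table (dict of name -> block hashes)."""
    canonical = json.dumps(table, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def find_divergent_replicas(replicas):
    """Return the sorted names of nodes whose table differs from the majority.

    replicas: dict mapping a node name to that node's copy of the table.
    """
    digests = {node: table_digest(table) for node, table in replicas.items()}
    counts = {}
    for digest in digests.values():
        counts[digest] = counts.get(digest, 0) + 1
    # The most common digest is treated as the reference copy.
    majority = max(counts, key=counts.get)
    return sorted(node for node, d in digests.items() if d != majority)


good = {"vm.img": ["h1", "h2"], "vm-backup.img": ["h1", "h2"]}
damaged = {"vm.img": ["h1", "h2"], "vm-backup.img": ["h1"]}  # lost a pointer
replicas = {"node-a": good, "node-b": dict(good), "remote-site": damaged}
suspect = find_divergent_replicas(replicas)
```

Here `suspect` contains only `"remote-site"`. A majority vote needs at least three copies to be meaningful; with two copies that disagree, the check can tell you something is wrong but not which copy to trust.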
Decide on a backup alternative
Assuming snapshots or deduplicated copies don't impact performance, data can be replicated remotely and the metadata table is protected, hyper-converged storage vendors have the potential to eliminate the need for backup. However, there are long-term cost concerns that should be kept in mind.
Both deduplication and snapshots are space-efficient, but they consume more capacity as data changes. Plus, there is likely a point in time when data no longer needs to be stored on the production hyper-converged storage. A hyper-converged infrastructure paired with a strong archiving product hosted on an object storage system would deliver the best of both worlds and could provide a suitable backup alternative.
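The capacity concern can be put into rough numbers. The sketch below is a deliberately pessimistic back-of-the-envelope model: it assumes every day's changed blocks are new and unique, so nothing deduplicates against earlier days. The function name and the sample figures (10 TB of data, a 2% daily change rate) are hypothetical and chosen only to show the shape of the growth.

```python
def retained_capacity_gb(base_gb, daily_change_rate, retention_days):
    """Rough upper-bound estimate of capacity used by live data plus retained copies.

    Assumes each day's changed blocks are all unique (worst case for
    deduplication and snapshots); real systems will usually do better.
    """
    changed_per_day_gb = base_gb * daily_change_rate
    return base_gb + changed_per_day_gb * retention_days


# Hypothetical workload: 10,000 GB of data changing 2% per day.
short_term = retained_capacity_gb(10_000, 0.02, 30)    # 30 days of history
long_term = retained_capacity_gb(10_000, 0.02, 365)    # one year of history
```

Under these assumptions, 30 days of history needs about 16,000 GB while a full year needs about 83,000 GB on production storage. That gap is the argument for moving older copies to a less expensive archive tier.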