The tech world was rocked just days into the new year when Google's Project Zero team revealed the security flaws known as the Spectre and Meltdown vulnerabilities. These flaws in Intel, AMD and ARM chips take advantage of a processor design weakness to let a user process access memory belonging to the kernel or to another process.
Spectre and Meltdown use the speculative execution feature of modern processors to bypass a bounds check, force a branch or load data from another process's cache. Speculative execution predicts how an application will branch and executes code before the processor knows whether that code is actually needed.
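The bounds-check bypass described above can be illustrated with a deliberately simplified toy model. It is a sketch under heavy assumptions: the class ToyCPU and its methods are hypothetical, "memory" is a Python list, and "the cache" is just a set recording which values a speculative load touched. It shows only the core idea: the load runs before the bounds check resolves, the architectural result is thrown away, but the cache footprint remains for an observer to find.

```python
class ToyCPU:
    """Toy model of a bounds-check bypass; not how real hardware is built."""

    def __init__(self, public, secret):
        # "Memory": a public array followed directly by secret data.
        self.memory = list(public) + list(secret)
        self.public_len = len(public)
        # Hypothetical cache: records which probe lines a load has touched.
        self.cache = set()

    def victim_read(self, index):
        # Speculative path: the predictor assumes the bounds check passes,
        # so the load and a dependent access run before the check resolves.
        # (The len(self.memory) guard only keeps this Python model from
        # raising IndexError.)
        if index < len(self.memory):
            speculative_value = self.memory[index]
            self.cache.add(speculative_value)  # dependent access warms a "cache line"
        # Architectural path: the bounds check resolves, and the result of an
        # out-of-bounds read is discarded, so the caller gets nothing...
        if index < self.public_len:
            return self.memory[index]
        return None  # ...but the cache footprint is not rolled back.

    def probe(self, candidates):
        # Stand-in for a timing attack: a "cached" value would load faster.
        return [v for v in candidates if v in self.cache]


if __name__ == "__main__":
    cpu = ToyCPU(public=[1, 2, 3], secret=[42])
    print(cpu.victim_read(3))     # prints None: the out-of-bounds read is discarded
    print(cpu.probe(range(256)))  # prints [42]: the secret left a trace in the toy cache
```

Real attacks infer the cached value by timing memory accesses rather than inspecting a set, but the sequence (speculate, discard, leave a trace, probe) is the same.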
Servers, especially virtual desktop infrastructure servers, need hypervisor, OS and processor microcode patches to close some of these holes and keep data safe from rogue processes. The good news is that everyone on the compute side of the house responded quickly. Linux developers were already working on a fix, kernel page-table isolation (KPTI). Microsoft and VMware also released patches, although Microsoft stopped shipping its original patches to AMD-based systems after the updates reportedly bricked some PCs. Microsoft later released manual patches.
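On Linux hosts, one way to confirm that page-table isolation and related mitigations are active is to read the status files the kernel exposes under /sys/devices/system/cpu/vulnerabilities (present in mainline kernels from 4.15 on and backported to many stable and distribution kernels). The sketch below simply collects those files; the function name read_vulnerability_status is hypothetical, and the exact file names and status strings vary by kernel version. On a patched Intel system the meltdown entry typically reads "Mitigation: PTI".

```python
from pathlib import Path


def read_vulnerability_status(sysfs_dir="/sys/devices/system/cpu/vulnerabilities"):
    """Return {vulnerability_name: status_line} as reported by the kernel.

    Hypothetical helper: it only reads the sysfs files, it does not
    interpret them.
    """
    base = Path(sysfs_dir)
    if not base.is_dir():
        # Older kernels (or non-Linux systems) don't expose this directory.
        return {}
    status = {}
    for entry in sorted(base.iterdir()):
        if entry.is_file():
            # Each file holds one line, e.g. "Vulnerable" or "Mitigation: ...".
            status[entry.name] = entry.read_text().strip()
    return status


if __name__ == "__main__":
    for name, line in read_vulnerability_status().items():
        print(f"{name}: {line}")
```

On a machine without the directory the function returns an empty dictionary instead of failing, which is itself a hint that the kernel may predate this reporting interface.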
Reactions to Spectre and Meltdown patches vary
I was worried about how smaller users, who may run older systems without maintenance contracts, would get the microcode patches. Some vendors publish the BIOS updates I thought would be needed only for their last generation or two of servers. Others, such as Hewlett Packard Enterprise, release BIOS and other firmware patches only to customers with paid maintenance. I was glad that VMware's patches also apply the needed microcode updates.
The problem is that these patches also add latency to every context switch between user and kernel space and, depending on the workload, can slow servers down by 10% to 30%. That latency will hit hardest in applications such as database engines that do a lot of storage I/O, since every I/O request requires at least one switch into the kernel.
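Why I/O-heavy applications suffer most comes down to simple arithmetic. The sketch below assumes a CPU-bound workload in which each operation costs some fixed CPU time plus a penalty for every user/kernel crossing the patches slow down; the function name and all numbers are hypothetical placeholders, not measurements of any real system.

```python
def relative_throughput(base_cpu_us_per_op, crossings_per_op, added_us_per_crossing):
    """Fraction of pre-patch throughput left for a CPU-bound workload.

    Assumes each operation costs base_cpu_us_per_op of CPU time before
    patching, and the patches add added_us_per_crossing for every
    user/kernel crossing the operation makes.
    """
    patched_us = base_cpu_us_per_op + crossings_per_op * added_us_per_crossing
    return base_cpu_us_per_op / patched_us


# Hypothetical numbers: the same per-crossing penalty, two kinds of workload.
compute_heavy = relative_throughput(base_cpu_us_per_op=1000, crossings_per_op=2, added_us_per_crossing=0.5)
io_heavy = relative_throughput(base_cpu_us_per_op=10, crossings_per_op=2, added_us_per_crossing=0.5)
print(f"compute-heavy: {compute_heavy:.3f}, I/O-heavy: {io_heavy:.3f}")
```

With the same half-microsecond penalty per crossing, the workload that does 1 ms of work per operation barely notices, while the one that does only 10 µs of work per I/O loses about 9% of its throughput. Short, syscall-heavy operations are exactly what a busy database engine generates.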
As a storage guy, I find the bigger question is: How will the Spectre and Meltdown vulnerabilities affect my storage systems? Followed quickly by: How will a 10% to 30% loss in available CPU horsepower affect my storage performance?
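Whether that CPU loss shows up as lost storage performance depends on what limits the system in the first place. A simple bottleneck model, assumed here purely for illustration, takes the lower of what the media can deliver and what the controller CPU can process; the function name and IOPS figures are hypothetical.

```python
def iops_after_cpu_loss(media_iops_limit, cpu_iops_limit, cpu_loss):
    """Estimate IOPS after the controller CPU loses cpu_loss of its capacity.

    Simple bottleneck model (an assumption, not a vendor formula): the
    system delivers whichever is lower, what the media can do or what the
    remaining CPU can process.
    """
    if not 0.0 <= cpu_loss < 1.0:
        raise ValueError("cpu_loss must be in [0, 1)")
    return min(media_iops_limit, cpu_iops_limit * (1.0 - cpu_loss))


# Two hypothetical systems, each losing 30% of its CPU to the patches.
print(iops_after_cpu_loss(50_000, 200_000, 0.30))   # media-bound system
print(iops_after_cpu_loss(500_000, 200_000, 0.30))  # CPU-bound system
```

In the media-bound case losing 30% of the CPU costs nothing; in the CPU-bound case IOPS falls in proportion to the lost capacity. Real systems rarely fit a single bottleneck this neatly, so treat this as a first approximation.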
Spectre and Meltdown vulnerabilities can hit hyper-convergence
It seems to me, and to just about everyone else I've talked to, that hyper-converged and other software-defined storage products running on hosts with user-installed application processes must be treated like servers and patched. For hyper-converged systems that run the storage process as a virtual machine, that means patching the host, the hypervisor and the storage or hyper-converged infrastructure VMs to protect them from user code. These systems will probably take a significant performance hit, since every storage I/O passes through the VM on its way to the underlying hardware.
There's more debate about dedicated storage appliances that use x86 or other vulnerable processors. Despite calls to do so, NetApp and Tintri said they wouldn't issue Meltdown and Spectre patches because their systems were unaffected.
The way I look at it, Spectre and Meltdown allow one process on a CPU to access memory belonging to another process on that CPU. If my storage vendor keeps tight control over the processes running on that CPU, do I really need to reinforce the walls between those processes? I don't think so, and I don't think it's worth the performance hit to patch walls that no one has an opportunity to breach.
Of course, this all changes when the storage vendor allows users to run containers or other applications on the storage appliance's processor, as Coho Data, Pure Storage and others do. Once users can run arbitrary code, the system stops being a sealed appliance and becomes a server whose kernel is vulnerable to a rogue process in a user application.
The same analysis applies to other appliances. If an appliance, such as a load balancer or spam filter, runs only code from the appliance vendor, I wouldn't worry about side-channel attacks from rogue processes. But if the appliance can run arbitrary code, as a Synology NAS or a Windows Storage Server box can, then it needs protection against Spectre and Meltdown.
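The rule of thumb for storage arrays and other appliances boils down to one question: can anyone other than the vendor run code on this CPU? A minimal sketch of that test follows; the System class, its code_sources field and the example systems are hypothetical illustrations of the reasoning, not a formal risk assessment.

```python
from dataclasses import dataclass, field


@dataclass
class System:
    name: str
    # Who can run code on this system's CPUs. "vendor" means software
    # shipped by the appliance vendor itself (hypothetical labels).
    code_sources: set = field(default_factory=lambda: {"vendor"})


def needs_patching(system: System) -> bool:
    """Patch if anyone besides the vendor can run code on the CPU.

    A sealed appliance gives a rogue process nowhere to run, so the
    reinforced walls would buy only a slowdown.
    """
    return bool(system.code_sources - {"vendor"})


examples = [
    System("load balancer", {"vendor"}),
    System("storage array running customer containers", {"vendor", "customer"}),
    System("hyper-converged node hosting user VMs", {"vendor", "customer"}),
]
for s in examples:
    verdict = "patch" if needs_patching(s) else "sealed appliance"
    print(f"{s.name}: {verdict}")
```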
Eventually, Intel, AMD and the rest will fix the underlying problem, and we'll all be able to rest easy. Even if the next generation of processors solves the problem in 12 to 18 months, though, it will be another five years or more before our data centers are fully upgraded to secure CPUs. Some have joked, or hinted, that this is all a plot by Intel to slow our systems down and sell new, faster processors. While I don't usually buy into conspiracy theories, the performance dip created by the current round of patches will end up selling more processors.