[For those of you directed here for my Blackhat 2008 presentation of the same name, the slides will be posted with a narrative shortly. This was the post that was the impetus for my preso.
If you'd like to see a "mini-me" video version of the presentation given right after my talk, check it out from Dark Reading here. You'll also notice this is quite different than Ellen Messmer's version of what I presented...]
--
I've written and re-written this post about 10 times, trying to make it simpler and more concise. It stems from my initial post on the matter of performance implications in virtualized security environments here.
After a convo with Ptacek today discussing the same for a related article he's writing, I think I've been able to boil it down to somewhere near its essence. It's still complex and unwieldy, but it's the best I can do for now.
Short of the notions I've discussed previously regarding instantiating the vSwitches into hardware and loading physical servers with accelerators and offloaders for security functions, there aren't a lot of people talking about this impending set of challenges or the solutions in the short or long term.
This should be cause for alarm.
These issues are nasty. Combined with the organizational issues of who actually owns and manages "security" in the virtualized context, this stuff makes me want to curl up in a fetal position.
So here they are, the nasty little surprises awaiting us all carried forth by the four horsemen of the virtualization security apocalypse named conquest, war, famine and death:
- Virtualized Security Screws the Capacity Planning Pooch (Conquest)
- The Network Is the Compu...oh, crap. Never mind, it's broken. (Death)
- Episode 7: Revenge of the UTM. Behold the vUTM! (War)
- Spinning VM straw into budgetary gold (Famine)
In order to ameliorate these shortcomings, we're going to have to see some seriously different approaches and rapid acceleration of solution roadmaps. There are some startups as well as established players all jockeying to solve one or more of these problems, but they're not going to tell you about them because, quite frankly, they are difficult to describe and may cause TPOW syndrome (Temporary Purchase Order Withholding.)
So here they are in all their splendor. The gifts of the four horsemen, just in time to pour salt in your virtualized wounds:
- Virtualized Security Screws the Capacity Planning Pooch (Conquest)
If we look at today's most common implementation methodologies for deploying security in a virtualized environment, we end up recognizing that it comes down to two fundamental approaches: (a) install software/agents from the usual suspects in the VM's or (b) deploy security functions as virtual appliances (VA) within the physical host.
If we look at measuring performance overhead due to option (a) I wager we'd all have a reasonably easy time of measuring and calculating what the performance hit would be. Further, monitoring is accomplished with the tools we have today. This is a per-VM impact that can be modeled across physical hosts and in response to overall system load. No real problem here.
Now, if we look at option (b) which is the choice of almost all emerging solutions in the VirtSec space, the first horseman's steed just took a crap on Main street.
For example, let's say that we have one (or more -- see #2 and #3 below) monolithic security VA whose job it is to secure all traffic to and from external sources to any VM in the physical host as well as all intra-VM traffic.
You see the problem, right? Setting aside the notion of how much memory/CPU to allocate to the VA so as not to drop packets due to overload, capacity planning completely depends upon the traffic levels, the number of VM's on the system (which can be dynamic), the way the virtual and physical networks are configured (also dynamic) as well as the efficiency of the software/OS combo in the VA. And let's not forget access to system buses and hardware, and the tax that comes with virtualizing these solutions.
The very real chance exists of either overrunning the VA and dropping packets which will lead to retransmissions, etc. or simply losing valuable landscape to add VM's because the "extra" CPU/memory you thought you had is now allocated to the security VA...
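To put some rough shape on that trade-off, here's a back-of-the-napkin sketch in Python. Everything in it is an assumption made up for illustration: the `Footprint` class, the `vm_slots` and `va_overrun` helpers, and every number. Real capacity planning also has to account for hypervisor overhead, bus contention and changing traffic mixes, none of which is modeled here.

```python
from dataclasses import dataclass


@dataclass
class Footprint:
    """CPU/memory footprint of a host, a VM or a security VA (hypothetical)."""
    cpu_cores: int
    mem_gb: int


def vm_slots(host, vm, reserved=()):
    """Count how many identical VMs fit on a host once reserved footprints
    (for example a security VA) are subtracted from its resources."""
    cpu_left = host.cpu_cores - sum(r.cpu_cores for r in reserved)
    mem_left = host.mem_gb - sum(r.mem_gb for r in reserved)
    if cpu_left <= 0 or mem_left <= 0:
        return 0
    # The scarcer of the two resources decides how many VMs fit.
    return min(cpu_left // vm.cpu_cores, mem_left // vm.mem_gb)


def va_overrun(vm_count, mbps_per_vm, va_capacity_mbps):
    """True if the traffic offered by all VMs exceeds what the VA can inspect."""
    return vm_count * mbps_per_vm > va_capacity_mbps


# Illustrative numbers only; none of these come from a real product.
host = Footprint(cpu_cores=16, mem_gb=64)
vm = Footprint(cpu_cores=1, mem_gb=4)
security_va = Footprint(cpu_cores=4, mem_gb=8)

slots_without_va = vm_slots(host, vm)
slots_with_va = vm_slots(host, vm, reserved=[security_va])
```

With these made-up numbers the host loses a quarter of its VM slots to the VA, and a VA sized for 1000 Mb/s is overrun as soon as the remaining VMs each push 100 Mb/s through it. Change any one input and the answer moves, which is exactly the planning problem.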
Measuring security VA performance is a crapshoot, too. Sure there's VMMark, but methinks that we already have enough crap floating about in how vendors measure performance of physical appliances whose resources they control. Can you imagine the first marketing campaigns that are sure to be launched on the first 10Gb/s virtual appliance...Oh my.
- The Network Is the Compu...oh, crap. Never mind, it's broken. (Death)
Virtualization offers some fantastic benefits, not the least of which is the capability to provide for resilience and on-demand scalability/high-availability. If a physical server is overloaded, one might automagically allow the VMotion of critical VM's to other lighter-loaded physical hosts. If a process/application/VM fails on one host, spin it back up somewhere else. Great stuff.
Except, we've got a real problem when we try to apply this dynamic portability to security applications running in VA's. Security applications are incredibly topology sensitive. For the most part, they expect the network configuration to remain static - interfaces, VLAN's, MAC addies, routes, IP addresses of protected nodes, etc. If you go moving security VA's around, they may no longer be inline with the assets they protect!
Further, the policies that define the ACL's that govern the disposition of traffic also don't grok.
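Here's a tiny Python sketch of the inline problem, assuming a toy inventory that simply maps names to physical hosts. The `unprotected` helper and the host/VM names are hypothetical; a real deployment would also have to track VLAN's, port groups and the ACL's themselves.

```python
def unprotected(vm_hosts, va_hosts):
    """Return the VMs whose current physical host has no security VA on it.

    vm_hosts: dict mapping VM name -> physical host name
    va_hosts: dict mapping security VA name -> physical host name
    """
    covered = set(va_hosts.values())
    return sorted(vm for vm, host in vm_hosts.items() if host not in covered)


# Toy inventory: two protected VMs and one firewall VA on the same host.
vm_hosts = {"web": "hostA", "db": "hostA"}
va_hosts = {"fw-va": "hostA"}
before_move = unprotected(vm_hosts, va_hosts)

# The firewall VA gets migrated on its own; the VMs it protected stay put.
va_hosts["fw-va"] = "hostB"
after_move = unprotected(vm_hosts, va_hosts)
```

Before the move nothing is exposed; after it, both VM's sit on a host with no VA in the path, and nothing in the VA's static configuration knows that happened.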
But wait, there's more!
Replicating certain operating conditions within a virtualized environment is going to be tricky when the VirtServer admins don't have any idea of what VRRP and multicast MAC addies are (that the security applications depend upon) and how that might affect load balancing firewall cluster members within the same physical host. Multiwhat?
An example might be that you want to implement high availability load balancing for a "cluster" of firewall VA's within a single physical host so that you don't have to VMotion an entire server's worth of VM's over to another if the security VA which is inline fails (we can address HA/LB across two physical hosts later.) It's going to be really interesting trying to replicate in a virtualized construct what we've spent years gluing together in the physical world: vSwitch behavior, port groups, NIC teaming, etc.
Lastly, I'm skipping ahead a little and treading on issue #3 below, but if one were to deploy multiple security VA's within a single physical host to provide the desired functionality across protected VM's, how does one ensure that traffic flow is appropriately delivered to the correct VA's at the correct time with the correct disposition reflected up and downstream?
There are some really difficult challenges to overcome when attempting to "combine" security functions in-line with one another. In fact, this concept is what gave birth to UTM -- combining multiple security functions into a single platform to optimize control effectiveness, simplify management and reduce cost.
Most UTM vendors on the market write their own security stacks and integrate them, take open source code, and/or OEM additional technologies to present what is marketed as a single "engine" against which traffic is cracked once and inspected based upon intelligent classification. Let's just take that at face value...and with a healthy grain of salt.
My last company, Crossbeam, took a different approach. Crossbeam provides a network and (security) application virtualization platform (the X-Series security service switch) and allows an operator to combine a number of discrete third party ISV security solutions in software in specific serialized and parallelized processing order based upon policy. You pick the firewall, IPS, AV, AS, URL filter, WAF, etc. of your choosing and virtualize those combinations of functions across your network as a service layer.
This is the same model I am trying to illustrate in the case of server virtualization with security VA's except that the Crossbeam example utilizes an external proprietary chassis solution.
Here's an overly-simplified illustration of four security applications as deployed within an X-series: an IPS, IDS, firewall, web application firewall (WAF). These applications are instantiated once in the system and virtualized across the network segments connected to them governed by policy:
Note for the purpose of simplicity I'm showing a flow path from ingress to egress that is symmetrical.
Technically, egress flows could actually take a different path through other software stacks which makes the notion of "state" and how you define it (via the "network" or the "application") pretty darn important. I'm also leaving out the complexity of VLAN configurations in this example.
What's interesting here is that each of these applications can often be configured from a network perspective as a layer 2 or layer 3 "device," so how each one's networking is configured, and how it expects to be presented with traffic, act on it, and potentially pass it on, is really important. Ensuring that flows and state are appropriately directed to the correct security function and are presented in the correct "format" with low latency and high throughput is much easier said than done.
Can you imagine trying to do this in a virtualized instance on a server across multiple security VA's? There's really no control plane to effect this, no telemetry, and the vSwitch isn't really designed as a fabric to provide much more than layer 2 connectivity.
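To make "serialized processing order based upon policy" a bit more concrete, here's a minimal Python sketch of a service chain. It is not how Crossbeam or any vSwitch actually does it; `run_chain`, the packet dictionaries and the three toy inspectors are hypothetical stand-ins, and it only covers the serialized case (no parallel/passive taps, no state tracking, no asymmetric return path).

```python
def run_chain(packet, chain):
    """Pass a packet through an ordered list of (name, inspector) pairs.

    Each inspector returns True to pass the packet on or False to drop it.
    Returns the final disposition and the list of functions actually visited.
    """
    path = []
    for name, inspect in chain:
        path.append(name)
        if not inspect(packet):
            return "drop", path
    return "forward", path


# Toy inspectors; real products are obviously far more involved.
def firewall(packet):
    return packet["dst_port"] in (80, 443)


def ips(packet):
    return "attack" not in packet["payload"]


def waf(packet):
    return "<script>" not in packet["payload"]


# Policy: the order in which this traffic class must see each function.
web_policy = [("firewall", firewall), ("ips", ips), ("waf", waf)]

clean = run_chain({"dst_port": 80, "payload": "hello"}, web_policy)
telnet = run_chain({"dst_port": 23, "payload": "hello"}, web_policy)
xss = run_chain({"dst_port": 443, "payload": "<script>alert(1)</script>"}, web_policy)
```

Something has to own that ordered list and steer every flow through it, in order, with the disposition carried along. That "something" is the control plane the vSwitch doesn't give you.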
Fun for the entire family! Kid tested, virtualization approved!
- Episode 7: Revenge of the UTM. Behold the vUTM! (War)
"The farce is strong with this one..." OK, so this is a dandy. The models today that talk about VA installations position the deployment of a single security vendor's VA solution. What that means is combined with the issues raised in points (1) and (2) above, we're sort of expecting to not embrace the best-of-breed approach and instead of deploying a CHKP firewall VA, an ISS IDP VA, a McAfee Anti-malware VA, etc., we'll just deploy a single vendor's monolithic security stack to service the entire physical host?
Does this model sound familiar? See #2 above. Well, either you're going to do that and realize that your security ultimately sucks harder than a Dyson or you're going to do the nasty and start to deploy multiple vendors' security VA's in the same physical host.
See the problem there? Horseman #3 reminds you of the points already raised above. You're going to be adding security VA's which takes away the capacity to add valuable VM's dynamically:
...and then you're going to have to deal with the issues in #2 above. Or, you'll just settle for "good enough" and deploy what amounts to a single UTM VA and be done with it. Until it runs out of steam or you get your butt handed to you on a plate when you're pwned.
You could plumb in a Crossbeam or even less complex single-vendor appliance solutions, but then you're going to find yourself playing ping-pong with traffic in and out of each VM, through the physical NICs, and in/out of the appliances. Latency is going to kill you. Saturation of the pipe is going to kill you. Your virtual server admin is going to kill you, especially since he won't have the foggiest idea of what the hell you're going on about.
Further, if you're thinking VMsafe's going to save you trouble in either #2 or #3, it ain't. VMsafe sets its hooks on a per-VM basis and then redirects to a VA/VM within each physical host. Its settings in the first release are quite coarse and you can't make API calls outside of the physical hosts, so the "redirects" to external appliances won't work. Even if they did, there's no control plane to deal with the "serialization" I demonstrate above.
- Spinning VM straw into budgetary gold (Famine)
By this point you probably recognize that you're going to be deploying the same old security software/agents to each VM and then adding at least one VA to each physical host, and probably more. Also, you're likely not going to do away with the hardware-based versions of these appliances on the physical networks.
That also means you're going to be adding additional monitoring points on the network and who is going to do that? The network team? The security team? The, gulp, virtual server admin team?
What does this mean? With all this consolidation, you're going to end up spending MORE on security in a virtualized world instead of less.
There is lots of effort going on to try to force-fit entire existing markets of solutions in order to squeeze a little more life out of investments made in products, but expect some serious pain in the short term because you're going to be dealing with all of this for the next couple of years for sure.
I hope this has opened your eyes to some of the challenges we're going to face moving forward.
Finally, let us solemnly remember that: