Lately I've been thinking a lot about the problem of software security - "lately" being the last 15 years of my life, give or take. It seems to be a topic that's perennially on the horizon, because only a few cutting-edge software companies take it seriously enough to engage in some kind of secure software development lifecycle. I think that we security practitioners have "screwed the pooch" with regard to software security - 'vulnerability researchers' have done a pretty fair job of convincing most vendors that it's useless to even try; whether you get targeted or not has more to do with whether you're obscure or market-dominating than with whether your software is foundational. Where have we gone wrong? It's simple: we treated it purely as a security problem, when it's also a reliability problem - a quality problem. We asked users to demand security, but what they needed to demand was software that works. Not just sometimes, but all the time - even when someone is deliberately trying to make it crash.
Because security took the detail-oriented, vulnerability-centric approach to software, the industry has had its head down in the weeds fighting brush-fires instead of thinking in terms of strategy. It makes sense that it happened that way, because the community of 'vulnerability researchers' wasn't able to apply pressure at the architectural level, where it was needed. We got a dawning awareness that software needs to be relatively free of obvious bugs, instead of an awareness that it needs to be architected to work and evolve based on sound design principles. What do I mean? For one thing, the software industry keeps conspiring to defeat good architecture whenever it encounters it. Take the operating system, for example: it exists to separate processes from each other, provide a set of common services to applications, and mediate device access for connected peripherals. Its ability to separate processes from each other (and keep them out of the operating system's own protected address space) is what security is. If the operating system can't protect its own address space, it may as well not be there at all (remember DOS?), and the consequences for reliability are guaranteed to be severe (remember DOS?) if one program's mistakes can trample another's memory. That's the mechanism whereby we get both blue screens of death and viruses. But a gaping hole "had to" be there so that dynamically loadable device drivers would work! Why even bother with an operating system that tries to protect process memory when it will absorb any device driver that comes along and needs access to the system bus? It's weird to me, because it looks like "two steps forward, two steps back" on endless repeat. We can sit around and worry about code bugs and vulnerabilities all day, but when our software reliability strategy is so self-defeating (or nonexistent), there is absolutely no point in trying to duct-tape over all the little holes. It's all one big hole!
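To make the separation property concrete, here's a minimal sketch (my illustration, using Python's standard multiprocessing module) of what the operating system is supposed to guarantee: each process gets its own address space, so one process's writes cannot trample another's memory.

```python
# A minimal sketch of process separation: each process gets its own
# address space, so a child's writes are invisible to the parent.
import multiprocessing

counter = 0  # lives in this process's address space


def clobber():
    # Runs in a *separate* process with its own copy of the address
    # space; this write never reaches the parent's memory.
    global counter
    counter = 999


if __name__ == "__main__":
    p = multiprocessing.Process(target=clobber)
    p.start()
    p.join()
    print(counter)  # still 0 - the OS kept the address spaces apart
```

A buggy (or malicious) device driver loaded into the kernel gets no such boundary: it runs in the operating system's own address space, which is exactly the "gaping hole" described above.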
Now let me get to the point: virtualization. I've been on the fence about virtualization for a long time, and now I finally understand why: it's not a strategic step in the right direction - it's an admission of defeat. Why do we need to virtualize machine images when that was the job of the operating system all along? Simple: operating systems have failed. If the process separation and memory protection properties of the operating system have been too easily and consistently bypassed, the fallback is to virtualize the whole hardware platform: thus you have a virtual memory operating system running in a virtual hardware environment hosted on top of another virtual memory operating system. What was that old saying - David Wheeler's, I believe? "All problems in computer science can be solved by another level of indirection," or something like that. Let me predict something: in order to make some aspect of VM administration easier, the separation property that keeps VMs apart will get bypassed (for convenience), and VMs will become a convenient way for malcode to infect hundreds of machines instantly.
In short, I don't think we're on a strategic path that will give us measurable success in the software reliability game, because we're still thinking in terms of building better system images based on 1970s designs with graphical user interfaces stuck on top of them. It is not "progress" that we've graduated from 'vi /etc/hosts' to a mouse-based interface that runs a perl script that edits the hosts file. If we're going to have systems that work better (and are more secure), we need a complete re-invention of how we do system administration, how operating systems guarantee process separation, and how run-time environments are managed and controlled - and all of it should be network-centric. Why do systems still have configuration files instead of querying the information on demand from an administrative hub? For that matter, why do we still "install" software rather than caching disposable executable images which are version/upgrade-checked and drawn on demand from an administrative hub/software repository? I used to ask "it's 2010, why are we still running operating systems that get viruses?" but now I understand that the question should really be "it's 2010, why are we still reinventing MULTICS?"
I believe that eventually someone is going to crack the problems I described above, run the return-on-investment calculations against the cost of system administration + virus cleanup + downtime, and our current operating environments will vanish about as fast as steam power did once the diesel engine arrived. Perhaps it will happen in the environment of commodity "smart phones" (I'm sorry, but having a telephone running a UNIX operating system does not make it a "smart" phone), where the cost of software management for millions of devices finally puts Darwinian pressure on our installed software base. Never mind the cyberwar scare scenarios - I just don't want to have to hear all the whining if the iPhones stop working. The clarifying moment came last week at IANS, when I quipped, "if iPhones all went down, there would be congressional hearings and Steve Jobs would be standing there next to Mr. Toyoda." I think that quip carried an unintentional truth: what we really care about is reliability.
Since I'm a "security guy", I'm still mentally absorbing all the news about "the Great China Cyberwar of 2010 That Didn't Actually Happen" and it seems to me that system administration may become a threat to national security if we don't figure out how to do it right in the next couple of decades.