Containerization has actually been around for years, but it has seemingly burst onto the scene in a massive way over the last few months. Sun shipped an implementation of the technology in Solaris, and Linux has long had LXC. So what's changed, such that you can't seem to talk to anyone in next-gen infrastructure without the topic of containers coming up?
In a word - Docker happened. Docker was an offshoot of a company looking to build a next-generation PaaS - dotCloud. While the PaaS was impressive, it was enabled by a core technology they had developed - Docker. It turns out that Docker has become far more popular than the PaaS ever was. Docker itself is a set of technologies and wrappers around core LXC containerization capabilities that make them DEAD easy to use. By most accounts, LXC container technology was significantly hamstrung by its complexity. Docker, on the other hand, makes LXC dead-simple to use, even for a plebeian such as me.
Companies like Google and Facebook are building MASSIVE systems on container technology, so it seems that others would agree.
What is Containerization?
I'll skip some of the nuances of Docker vs LXC vs SUN implementations and focus on Docker vs more traditional virtualization.
The key drawback with virtualization is that it requires the system to run a full copy of the guest OS for every virtual machine. Boot up Parallels on your desktop Mac and you'll immediately see the performance hit and memory overhead of having to run an entire separate OS.
There are ways around this - for example, using an extremely lightweight host OS, or highly cut-down guest OSes, to lessen the overhead of the system - but there's only so far these optimizations can take you.
Containerization, on the other hand, shares a single host OS and a single kernel, but creates "containers" inside that OS, each holding all of the software, libraries and environment variables for the particular application you would like to run.
Each container not only avoids the overhead of an entire OS, and the memory and CPU footprint that entails, but it is also stored as a sparse diff against a standard base image. So if a new container needs just one library beyond the standard image, the container itself will contain only that library. Hence it's not uncommon to see containers for highly complex application installs weighing in at just a few megabytes.
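To make that layering concrete, here's a minimal, hypothetical Dockerfile (the base image, package and application names are illustrative, not from any real project). Each instruction produces a layer stored as a diff on top of the previous one, so every image built from the same base shares those base layers on disk:

```dockerfile
# Start from a shared base image; its layers are reused by every
# image built on top of it.
FROM ubuntu:14.04

# This layer stores only the diff: the one extra library we installed.
RUN apt-get update && apt-get install -y libxml2

# Copying in the application adds another small layer.
COPY ./myapp /opt/myapp

# Metadata only - no filesystem diff is added for this instruction.
CMD ["/opt/myapp/run.sh"]
```

If a second image starts from the same `FROM` line, the heavyweight base layers are stored and downloaded only once; only the small diffs differ between the two images.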
It's still unclear just how much memory, CPU and storage overhead a Docker system will have vs. virtualization, and I'm sure it depends on the particular workload. I have not yet seen any sizable studies comparing the two technologies on real-world workloads, but I expect them to be done soon.
Pros and Cons?
Aside from the obvious storage, memory and CPU efficiency gains described above (do you really need anything more?), there are a few other key pros that you get for "free" using Docker.
- Portability: That same Docker container will run anywhere - on any Linux system, on any cloud - without any changes. It brings with it all of the requisite libraries, file versions and environment variables needed to set up its environment, so it truly is entirely portable.
- Run Anything: If the software can run on the host, it can run in a container. If the Linux kernel can execute the code, it can run in a container. To the software itself, it looks as if it is running on bare metal - no tweaks for assumptions brought in by running inside a guest OS.
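As a sketch of what the portability claim means in practice, the workflow looks something like this (the image name, registry and port here are hypothetical, purely for illustration):

```shell
# Build the image once, e.g. on a developer laptop
# (example/myapp is a made-up image name).
docker build -t example/myapp:1.0 .

# Push it to a registry...
docker push example/myapp:1.0

# ...then run the identical image on any Linux host with Docker -
# bare-metal server or cloud VM - with no library reinstallation
# and no environment drift between machines.
docker run -d --name myapp example/myapp:1.0
```

The container image, not the host, is the unit that carries the dependencies, which is what makes "build once, run anywhere on Linux" hold.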
Just these two features together may be reason enough to use containers, given that they prevent the absolute nightmare scenario I've seen before: dependency hell. Even when using a tool such as Chef or Puppet, debugging environment variable initialization and library dependencies can become essentially impossible.
BUT there are some key cons to using the system that have to be kept in mind. The primary one is isolation. In a virtualized environment there is a fair assumption of process and data isolation between the virtual machines. There are theoretical attacks that can hop through the hypervisor, but by and large, what is executing in one virtual machine is well isolated from the host OS and the other guest OSes. We even see some large enterprises running PCI and non-PCI workloads on the same physical boxes, given the strong isolation that virtualization brings.
This isolation is not present in containerization. Given that the code in every container shares the same kernel and the same host memory, cross-container communication is still possible, and malware or an attacker in one container should be assumed to have full access to all data and applications in the other containers.
I'm sure that improved isolation is a key item on the Docker roadmap, but for now it is not a technology that should be used in sensitive environments.
Containers offer a tantalizing opportunity for improved hardware utilization and devops efficiency, and hence could deliver massive bottom-line savings. There's still a lot of work to be done to make the system truly enterprise-class, with full security and administration tooling around it, but that will come in time.
For now - this is a technology you should at least be kicking the tires on in your bleeding-edge R&D teams. There's too much potential upside here to miss this wave.