Evolution of Containers

Raj Samuel
6 min read · Feb 24, 2018

This post explains containers from an evolutionary perspective. Without knowing how containers came to be, I personally have trouble taking the definition (which is everywhere) and fitting that piece of the puzzle into the big picture. I don’t write code using containers, so this post is not about the semantics of specific implementations like Docker or Google containers. You’ll read less about containers themselves and more about their primitives and evolution.

“Containers” is a fancy name for operating system virtualization, and they cannot be understood without poking around in the Linux kernel.

The OS traditionally looks at a computer as one machine (which it is), with some number of CPU cores, some amount of memory and disk, some number of file mounts, and some number of ports. Using certain kernel primitives (we will see which ones), the OS can now divvy those resources up into, say, five independent user environments that don’t step on each other.

How is this different from what is by now an old topic, virtualization? VMware, Citrix, and Amazon Web Services have all been thriving on virtualization.

Virtualization

As long as Moore’s Law holds, CPUs get more powerful and memory gets cheaper. Yet the nature of software applications doesn’t require the CPU (and sometimes memory) to be utilized to the full extent all the time. That idle CPU time creates opportunities within a single computer system to lend out processing power; the SETI@home project at UC Berkeley, and bitcoin mining that utilizes (sometimes steals) other people’s computers, are examples. In there lay both a technical intrigue and a business opportunity, and that’s how virtualization was born.

Virtualization adds a layer of software, called a hypervisor, on top of the kernel (or on top of a full operating system). The hypervisor thus sits on what’s known as the host operating system and provides an abstraction of the hardware underneath, virtualizing the hardware into multiple chunks. You can install a separate operating system into each of these hardware-abstracted chunks. Each operating system sitting on top of the hypervisor thinks it has its own machine, but in fact it’s a virtual machine exposed by the hypervisor. Each such operating system is called a guest operating system.

Type 1 and Type 2

Having multiple guest operating systems run on a host operating system (with a hypervisor in between) is a rather heavyweight technique. These hypervisors (known as Type 2 hypervisors) let SysAdmins share one machine’s resources among several guests. That’s changing, because servers are no longer managed on premises.

The world is moving to the cloud, and virtualization is just a commodity that cloud providers use to virtualize their server farms. What’s more often used now is a hypervisor that runs on bare metal without needing a host operating system, known as a Type 1 hypervisor. This is lighter, but harder to implement without a kernel’s system calls to rely on. From a hardware-abstraction perspective, a Type 1 hypervisor is the real kernel (a micro-kernel). Note that each of the guest OSs is still a full OS, with a kernel and everything else around it. VMware’s ESXi is Type 1. So is Amazon EC2 (in late 2017 Amazon said it was replacing its Type 1 Xen hypervisor with something in-house).

Hypervisors and containers are after the same end result: virtualized, independent environments on shared hardware. Containers get there with Linux kernel primitives for resource sharing and isolation rather than with a hardware abstraction layer.

We will shortly see how containers are different from virtualization. As with many other inevitable technologies today (email, time-sharing systems, the Internet), containers evolved from the Unix system. Since Linux is the major flavor of Unix that thrives today, containers are very much tied to the evolution of the Linux kernel’s resource sharing and isolation. It’s fair to say that Google helped push this forward to its current level of maturity.

Evolution of containers is the evolution of Unix resource sharing

Let’s first explore a few things in the Linux kernel that will help us understand containers.

The kernel is an abstraction of hardware that allows applications and users of a computer to use that hardware. The scheduler in the kernel allocates hardware resources to processes, and it does about as well as possible at scheduling without contention, locks, or bias. But what if an application requires bias? For example, an untested or untrusted program may need to be executed in a closed environment, without access to the rest of the larger system. This is why chroot came along in Unix.

chroot

The chroot() system call was introduced in the Unix kernel in 1979, and it is wrapped in a shell command of the same name, chroot. It enables a root user to assign a sub-directory tree on the file system to a process, making that sub-directory the file system root for that process. The process has no access to anything outside its assumed root, and typically no direct access to device files and the like, unless the admin has placed them inside the jail.
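As a concrete sketch, here’s what that looks like in C. The jail path /tmp/jail is a made-up example: it must already exist and contain a shell plus the libraries the shell needs, and the program must run as root.

```c
/* Minimal sketch of jailing a process with chroot(2).
 * Assumes /tmp/jail exists and is populated; run as root. */
#include <stdio.h>
#include <unistd.h>

int main(void) {
    if (chroot("/tmp/jail") != 0) {   /* /tmp/jail becomes "/" for this process */
        perror("chroot");
        return 1;
    }
    if (chdir("/") != 0) {            /* step inside the new root */
        perror("chdir");
        return 1;
    }
    /* From here on, path lookups cannot reach outside /tmp/jail. */
    execl("/bin/sh", "sh", (char *)NULL);  /* /bin/sh is resolved inside the jail */
    perror("execl");
    return 1;
}
```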

jailbreak

In the 2000s there were a number of popular hacking attempts on systems where chroot was used to jail a process for security or isolation, which popularized the term jailbreak.

namespace

Namespaces in the Linux kernel, as in many other settings, are a way to isolate resource primitives between processes. A resource can be processes, users, networks, CPUs, etc., and their primitives are PIDs, UIDs, port numbers, cgroups, etc. (more on cgroups later).

For instance, if a process is assigned a namespace that includes port 80 for HTTP traffic, all processes within that namespace can see the data on port 80, but no other process in the system can. Processes in a second namespace can still use that global resource (port 80), but they are isolated from the processes in the first namespace: neither group steps on the other, and each sees the port as its own.
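Demonstrating port isolation needs a network namespace, which takes some extra plumbing; a hostname (UTS) namespace shows the same isolation idea in a few lines. Below is a minimal sketch, assuming root privileges on Linux; the hostname “container” is arbitrary.

```c
/* Minimal sketch of namespace isolation via a UTS (hostname) namespace.
 * Network, PID, etc. namespaces are created the same way with other
 * CLONE_NEW* flags. Requires root (CAP_SYS_ADMIN). */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    if (unshare(CLONE_NEWUTS) != 0) {        /* move into a fresh UTS namespace */
        perror("unshare");
        return 1;
    }
    if (sethostname("container", 9) != 0) {  /* visible only in this namespace */
        perror("sethostname");
        return 1;
    }
    /* A shell started here reports "container" for `hostname`, while the
     * rest of the system keeps its original hostname. */
    execl("/bin/sh", "sh", (char *)NULL);
    perror("execl");
    return 1;
}
```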

cpusets

cpuset is the classic Unixy way of doling out system resources through a file-system interface. A cpuset confines processes to a set of logical CPUs and memory (RAM) nodes. CPUs on the market today are multi-core and hyperthreaded, so carving out multiple cpusets for different processes is inherently supported.
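Here is a minimal sketch of that file-system interface in C. The mount point assumes a v1 cpuset hierarchy at /sys/fs/cgroup/cpuset (this is distro-dependent), the “demo” cpuset name is made up, and the program must run as root.

```c
/* Minimal sketch of the cpuset file-system interface (cgroup v1 layout).
 * Confines the current process to CPU 0 and memory node 0. Run as root. */
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

static void write_file(const char *path, const char *val) {
    FILE *f = fopen(path, "w");
    if (!f) { perror(path); return; }
    fputs(val, f);
    fclose(f);
}

int main(void) {
    /* Creating a directory under the cpuset mount creates a new cpuset. */
    mkdir("/sys/fs/cgroup/cpuset/demo", 0755);
    write_file("/sys/fs/cgroup/cpuset/demo/cpuset.cpus", "0");
    write_file("/sys/fs/cgroup/cpuset/demo/cpuset.mems", "0");

    /* Writing our PID to "tasks" moves this process into the cpuset. */
    char pid[32];
    snprintf(pid, sizeof pid, "%d", getpid());
    write_file("/sys/fs/cgroup/cpuset/demo/tasks", pid);

    /* Anything exec'd from here inherits the confinement. */
    return 0;
}
```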

cgroups

In the early 2000s Google was trying to virtualize its servers to handle an immense data processing volume (this was around the time Jeff Dean and Sanjay Ghemawat wrote MapReduce, the world’s first big data system, at Google on top of GFS). Paul Menage, another engineer at Google, took cpusets and wrote a few lines of code around them to implement containerization of application code. The idea was to divvy up CPU and memory into isolated resource containers and assign individual processes to each chunk, without any of them stepping on the others. You can then run as many copies of the same application (process) as you have containers.

The team at Google, working alongside the Linux kernel maintainers, eventually renamed what they used to call containers (along with the library that uses cpusets to isolate them): control groups, or cgroups. cgroups made their way into the Linux kernel and went through a few design changes at the hands of the kernel group.

The main redesign was to manage the cgroup hierarchy (implemented as a tree of cgroups) as a single system-wide hierarchy, as opposed to Google’s design of a separate hierarchy per resource controller. The single hierarchy is then managed exclusively by one manager, typically systemd (the userspace init process, not a kernel process). This made certain operations logically clearer, for example moving a process from one cgroup container to another.
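The single-hierarchy design is visible in the file interface: every controller lives under one tree. A minimal sketch in C, assuming the unified (v2) hierarchy is mounted at /sys/fs/cgroup with the memory and cpu controllers enabled for the subtree; the “demo” cgroup name is made up, and the program must run as root.

```c
/* Minimal sketch of the unified (cgroup v2) hierarchy: one tree, with
 * each controller exposed as files in the same directory. Caps the
 * current process at 256 MiB of memory and half of one CPU. */
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

static void write_file(const char *path, const char *val) {
    FILE *f = fopen(path, "w");
    if (!f) { perror(path); return; }
    fputs(val, f);
    fclose(f);
}

int main(void) {
    /* As with cpusets, creating a directory creates a new cgroup. */
    mkdir("/sys/fs/cgroup/demo", 0755);

    write_file("/sys/fs/cgroup/demo/memory.max", "268435456");  /* 256 MiB  */
    write_file("/sys/fs/cgroup/demo/cpu.max", "50000 100000");  /* 50% CPU  */

    /* Writing our PID to cgroup.procs moves this process (and its
     * future children) into the cgroup. */
    char pid[32];
    snprintf(pid, sizeof pid, "%d", getpid());
    write_file("/sys/fs/cgroup/demo/cgroup.procs", pid);
    return 0;
}
```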

All of this work culminated in what is today known as Linux Containers, or LXC.

Docker, and the container revolution

Note that cpusets are resource-confinement primitives, which makes them good candidates for isolation. So are namespaces. cgroups provide a library for orchestrating the confined processes.

Moment of truth. A few years ago, the authors of Docker took the cgroup and namespace libraries then available in the Linux kernel and packaged them into a library that can maintain a process and its state as an image of that process. Now application developers can put a component of their application, likely a RESTful service, into this process container. This container is:

  • reusable,
  • horizontally scalable (scale out by adding cheap commodity computers),
  • able to hold the same service (application code) in as many containers as needed,
  • able to easily replicate application changes across all containers holding a copy.

For a world that goes online for products and services offered by businesses and governments, and that is built on micro-services and SOA-based services, the inevitable advantage of containers is multi-tenancy. You can deploy as many copies of a RESTful service on as many containers as you want. These containers can be deployed on a single physical server, on multiple physical servers, or on virtualized servers. In other words, containers are horizontally scalable.

In 2014, Joe Beda of Google, one of the pioneers of Kubernetes, announced that Google was starting over 2 billion containers every week. Not surprising, considering the scale at which Google operates. Kubernetes, by the way, is a framework to orchestrate containers when you have to scale them to Google-like levels.

From an industry-adoption perspective there seems to be a consensus among cloud providers around Docker and Kubernetes, although other, less popular options exist. Containers and virtualization don’t contradict each other. Containers benefit application developers in deploying truly multi-tenant and horizontally scalable apps. Virtualization benefits IaaS providers and on-prem SysAdmins in divvying up hardware resources.

Raj Samuel

I write because I forget. (PS: if you take what I wrote and post it as your own please try not to edit it and post rubbish. CTRL+C, CTRL+V is your friend.)