Addressing Time Drift in Docker Desktop for Mac
https://www.docker.com/blog/addressing-time-drift-in-docker-desktop-for-mac/ (Mon, 25 Feb 2019)

Docker Desktop for Mac runs the Docker engine and Linux containers in a helper LinuxKit VM since macOS doesn’t have native container support. The helper VM has its own internal clock, separate from the host’s clock. When the two clocks drift apart, commands which rely on the time, or on file timestamps, may suddenly start to behave differently. For example, “make” will stop working properly across shared volumes (“docker run -v”) when the modification times on source files (typically written on the host) are older than the modification times on the binaries (typically written in the VM), even after the source files are changed. Time drift can be very frustrating, as you can see by reading issues such as https://github.com/docker/for-mac/issues/2076.

Wait, doesn’t the VM have a (virtual) hardware Real Time Clock (RTC)?

When the helper VM boots, the clocks are initially synchronised by an explicit invocation of “hwclock -s”, which reads the virtual RTC in HyperKit. Unfortunately, reading the RTC is a slow operation (on both physical and virtual hardware), so the Linux kernel builds its own internal clock on top of other sources of timing information, known as clocksources. The most reliable is usually the CPU Time Stamp Counter (“tsc”) clocksource, which measures time by counting the number of CPU cycles since the last CPU reset. TSC counters are frequently used for benchmarking, where the current TSC value is read (via the rdtsc instruction) at the beginning and then again at the end of a test run. The two values can then be subtracted to yield the time the code took to run in CPU cycles (a short sketch follows the list below). However, there are problems when we try to use these counters long-term as a reliable source of absolute physical time, particularly when running in a VM:

  • There is no reliable way to discover the TSC frequency: without this we don’t know what to divide the counter values by to transform the result into seconds.
  • Some power management technology will change the TSC frequency dynamically.
  • The counter can jump back to 0 when the physical CPU is reset, for example over a host suspend / resume.
  • When a virtual CPU is stopped executing on one physical CPU core and later starts executing on another one, the TSC counter can suddenly jump forwards or backwards.
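
To make the benchmarking use-case concrete, here is a minimal sketch in C (purely illustrative; it assumes an x86-64 machine and a GCC- or Clang-style compiler providing the __rdtsc() intrinsic):

#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>  /* __rdtsc() */

int main(void)
{
    uint64_t start = __rdtsc();   /* read the TSC before the test run */

    volatile uint64_t sink = 0;   /* stand-in for the code under test */
    for (int i = 0; i < 1000000; i++)
        sink += i;

    uint64_t end = __rdtsc();     /* read the TSC again afterwards */

    /* The delta is in CPU cycles; converting it to seconds requires the
       TSC frequency which, as the list above notes, is hard to discover
       reliably. */
    printf("elapsed: %llu cycles\n", (unsigned long long)(end - start));
    return 0;
}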

The unreliability of using TSC counters can be seen on this Docker Desktop for Mac install:

$ docker run --rm --privileged alpine /bin/dmesg | grep clocksource

[    3.486187] clocksource: Switched to clocksource tsc
[ 6963.789123] clocksource: timekeeping watchdog on CPU3: Marking clocksource 'tsc' as unstable because the skew is too large:
[ 6963.792264] clocksource:                       'hpet' wd_now: 388d8fc2 wd_last: 377f3b7c mask: ffffffff
[ 6963.794806] clocksource:                       'tsc' cs_now: 104a0911ec5a cs_last: 10492ccc2aec mask: ffffffffffffffff
[ 6963.797812] clocksource: Switched to clocksource hpet
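
The kernel also exposes the currently selected clocksource through sysfs, so you can check for this switch without grepping dmesg. A tiny sketch (assuming the standard sysfs path; you could equally just read the file from a shell):

#include <stdio.h>

int main(void)
{
    char buf[64];
    FILE *f = fopen("/sys/devices/system/clocksource/clocksource0/"
                    "current_clocksource", "r");
    if (!f || !fgets(buf, sizeof(buf), f)) {
        perror("current_clocksource");
        return 1;
    }
    printf("current clocksource: %s", buf);  /* e.g. "tsc" or "hpet" */
    fclose(f);
    return 0;
}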

Many hypervisors fix these problems by providing an explicit “paravirtualised clock” interface, which gives the VM enough additional information to correctly convert a TSC value to seconds. Unfortunately the Hypervisor.framework on the Mac does not provide enough information (particularly over suspend/resume) to allow the implementation of such a paravirtualised timesource, so we reported the issue to Apple and searched for a workaround.

How bad is this in practice?

I wrote a simple tool to measure the time drift between a VM and the host – the source is here. I created a small LinuxKit test VM without any kind of time synchronisation software installed and measured the “natural” clock drift after the VM boots:

[Figure: time drift 1 – VM clock drift over time, one line per test run]

Each line on the graph shows the clock drift for a different test run. From the graph it appears that the time in the VM loses roughly 2ms for every 3s of host time which passes. Once the total drift gets to about 1s (after approximately 1500s or 25 minutes) it will start to get really annoying.

OK, can we turn on NTP and forget about it?

The Network Time Protocol (NTP) is designed to keep clocks in sync, so it should be ideal. The question then becomes:

  • which client?
  • which server?

How about using the “default” pool.ntp.org like everyone else?

Many machines and devices use the free pool.ntp.org as their NTP server. This is a bad idea for us for several reasons:

  • it’s against their guidelines (http://www.pool.ntp.org/en/vendors.html), although we could register as a vendor
  • there’s no guarantee clocks in the NTP server pool are themselves well-synchronised
  • people don’t like their Mac sending unexpected UDP traffic; they fear it’s a malware infestation
  • anyway… we don’t want the VM to synchronise with atomic clocks in some random physics lab; we want it to synchronise with the host (so the timestamps work). If the host itself has drifted 30 minutes away from “real” time, we want the VM to also be 30 minutes away from “real” time.

Therefore in Docker Desktop we should run our own NTP server on the host, serving the host’s clock.

Which server implementation should we use?

The NTP protocol is designed to be robust and globally scalable. Servers with accurate clock hardware (e.g. an atomic clock or a GPS feed containing a signal from an atomic clock) are relatively rare, so not all other hosts can connect directly to them. NTP servers are arranged in a hierarchy where lower “strata” synchronise with the stratum directly above, and end-users and devices synchronise with the servers at the bottom. Since our use-case only involves one server and one client, this is all completely unnecessary, so we use the Simple Network Time Protocol (SNTP) as described in RFC 2030, which enables clients (in our case the VM) to synchronise immediately with a server (in our case the host).

Which NTP client should we use (and does it even matter)?

Early versions of Docker Desktop included openntpd from the upstream LinuxKit package. The following graph shows the time drift on one VM boot where openntpd runs for the first 10000s and then we switch to the busybox NTP client:

[Figure: time drift 2 – clock drift with openntpd running for the first 10000s, then the busybox NTP client]

The diagram shows the clock still drifting significantly with openntpd running, but it’s “fixed” by running busybox — why is this? To understand this it’s important to first understand how an NTP client adjusts the Linux kernel clock (a sketch of all three calls follows the list):

  • adjtime (3) – this accepts a delta (e.g. -10s) and tells the kernel to gradually adjust the system clock avoiding suddenly moving the clock forward (or backward, which can cause problems with timing loops which aren’t using monotonic clocks)
  • adjtimex (2) – this allows the kernel clock *rate* itself to be adjusted, to cope with systematic drift like we are suffering from
  • settimeofday (2) – this immediately bumps the clock to the given time
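
As a rough illustration (this is not openntpd’s or busybox’s actual code), here is what the three calls look like from C; all of them require CAP_SYS_TIME:

#include <stdio.h>
#include <sys/time.h>   /* adjtime(3), gettimeofday(2), settimeofday(2) */
#include <sys/timex.h>  /* adjtimex(2) */

int main(void)
{
    /* adjtime: ask the kernel to gradually slew the clock by a delta,
       here -10s, without ever stepping it. */
    struct timeval delta = { .tv_sec = -10, .tv_usec = 0 };
    if (adjtime(&delta, NULL) != 0)
        perror("adjtime");

    /* adjtimex: change the clock rate itself. freq is in units of
       2^-16 ppm, so 65536 means 1ppm; this slows the clock by 100ppm. */
    struct timex tx = { .modes = ADJ_FREQUENCY, .freq = -100 * 65536 };
    if (adjtimex(&tx) == -1)
        perror("adjtimex");

    /* settimeofday: immediately bump the clock to an absolute time,
       here 1s into the future. */
    struct timeval now;
    gettimeofday(&now, NULL);
    now.tv_sec += 1;
    if (settimeofday(&now, NULL) != 0)
        perror("settimeofday");
    return 0;
}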

If we look at line 433 of openntpd ntp.c (sorry no direct link in cvsweb) then we can see that openntpd is using adjtime to periodically add a delta to the clock, to try to correct the drift. This could also be seen in the openntpd logs. So why wasn’t this effective?

The following graph shows how the natural clock drift is affected by a call to adjtime(+10s) and adjtime(-10s):

[Figure: time drift 3 – natural clock drift around calls to adjtime(+10s) and adjtime(-10s)]

It seems the “natural” drift we’re experiencing is so large it can’t be compensated for solely by using “adjtime”. The reason busybox performs better for us is that it adjusts the clock rate itself using “adjtimex”.

The following graph shows the change in kernel clock frequency (timex.freq) viewed using adjtimex. For the first 10000s we used openntpd (and hence the adjustment is 0, since it doesn’t use the API) and for the rest of the graph we used busybox:

[Figure: rate 1 – kernel clock frequency adjustment (timex.freq) over time]

Note how the adjustment value remains flat and very negative after an initial spike upwards. I have to admit when I first saw this graph I was disappointed — I was hoping to see something zig-zag up and down, as the clock rate was constantly micro-managed to remain stable.

Is there anything special about the final value of the kernel frequency offset?

Unfortunately it is special. From the adjtimex (2) manpage:

ADJ_FREQUENCY
             Set frequency offset from buf.freq.  Since Linux 2.6.26, the
             supplied value is clamped to the range (-32768000, +32768000).

So it looks like busybox slowed the clock by the maximum amount (-32768000) to correct the systematic drift. According to the adjtimex(8) manpage a value of 65536 corresponds to 1ppm, so 32768000 corresponds to 500ppm. Recall that the original estimate of the systematic drift was 2ms every 3s, which is about 666ppm. This isn’t good: this means that we’re right at the limit of what adjtimex can do to compensate for it and are probably also relying on adjtime to provide additional adjustments. Unfortunately all our tests have been on one single machine and it’s easy to imagine a different system (perhaps with different powersaving behaviour) where even adjtimex + adjtime would be unable to cope with the drift.
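
Spelling out the arithmetic behind those two numbers:

\[
\frac{2\,\text{ms}}{3\,\text{s}} = \frac{0.002}{3} \approx 667\,\text{ppm},
\qquad
\frac{32768000}{65536} = 500\,\text{ppm}
\]

so the frequency clamp can only absorb roughly three quarters of the measured drift, leaving adjtime slews to cover the rest.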

So what should we do?

The main reason why NTP clients use APIs like adjtime and adjtimex is that they want

  • monotonicity: i.e. to ensure time never goes backwards, because this can cause bugs in programs which aren’t using monotonic clocks for timing (see, for example, “How and why the leap second affected Cloudflare DNS”); and
  • smoothness: i.e. no sudden jumping forwards, triggering lots of timing loops, cron jobs, etc. at once.

Docker Desktop is used by developers to build and test their code on their laptops and desktops. Developers routinely edit files on their host with an IDE and then build them in a container using docker run -v. This requires the clock in the VM to be synchronised with the clock in the host, otherwise tools like make will fail to rebuild changed source files correctly.

Option 1: adjust the kernel “tick”

According to adjtimex(8) it’s possible to adjust the kernel “tick”:

Set the number of microseconds that should be added to the system time for each kernel tick interrupt. For a kernel with USER_HZ=100, there are supposed to be 100 ticks per second, so val should be close to 10000. Increasing val by 1 speeds up the system clock by about 100 ppm,

If we knew (or could measure) the systematic drift, we could make a coarse-grained adjustment with the “tick” and then let busybox NTP manage the remaining drift.
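
A sketch of what that coarse-grained adjustment could look like (illustrative only; this is not what Docker Desktop shipped). With USER_HZ=100 the nominal tick is 10000 microseconds and each +1 is roughly +100ppm, so +7 would slightly over-correct a ~666ppm slow drift and leave a small residual for the NTP client to manage:

#include <stdio.h>
#include <sys/timex.h>

int main(void)
{
    /* ADJ_TICK: microseconds added to the system time per kernel tick.
       10000 is nominal for USER_HZ=100; 10007 speeds the clock up by
       roughly 700ppm. Requires CAP_SYS_TIME. */
    struct timex tx = { .modes = ADJ_TICK, .tick = 10007 };
    if (adjtimex(&tx) == -1) {
        perror("adjtimex");
        return 1;
    }
    printf("kernel tick is now %ld\n", tx.tick);
    return 0;
}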

Option 2: regularly bump the clock forward with settimeofday (2)

If we assume that the clock in the VM is always running slower than the real physical clock (because it is virtualised, vCPUs are periodically descheduled etc) and if we don’t care about smoothness, we could use an NTP client which calls settimeofday (2) periodically to immediately resync the clock.

The choice

Although option 1 could potentially provide the best results, we decided to keep it simple and go with option 2: regularly bump the clock forward with settimeofday (2) rather than attempt to measure and adjust the kernel tick. We assume that the VM clock always runs slower than the host clock but we don’t have to measure exactly how much it runs slower, or assume that the slowness remains constant over time, or across different hardware setups. The solution is very simple and easy to understand. The VM clock should stay in close sync with the host and it should still be monotonic but it will not be very smooth.

We use an NTP client called SNTPC written by Natanael Copa, founder of Alpine Linux (quite a coincidence considering we use Alpine extensively in Docker Desktop). SNTPC can be configured to call settimeofday every n seconds with the following results:

[Figure: time drift 5 – VM clock drift with sntpc calling settimeofday every 30s]

As you can see in the graph, every 30s the VM clock has fallen behind by 20ms and is then bumped forward. Note that since the VM clock is always running slower than the host, the VM clock always jumps forwards but never backwards, maintaining monotonicity.

Just a sec, couldn’t we just run hwclock -s every 30s?

Rather than running a simple NTP client and server communicating over UDP every 30s we could instead run hwclock -s to synchronise with the hardware RTC every 30s. Reading the RTC is inefficient because the memory reads trap to the hypervisor and block the vCPU, unlike UDP traffic which is efficiently queued in shared memory; however the code would be simple and an expensive operation once every 30s isn’t too bad. How well would it actually keep the clock in sync?

[Figure: time drift 6 – clock drift when running hwclock -s every 30s]

Unfortunately running hwclock -s in a HyperKit Linux VM only manages to keep the clock in sync within about 1s, which would be quite noticeable when editing code and immediately recompiling it. So we’ll stick with NTP.
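
Part of the reason hwclock tops out at around a second here is resolution: the Linux RTC interface reports whole seconds only, so without carefully waiting for the edge of a second (exactly the sort of fine timing that is unreliable under virtualisation) second-level accuracy is the best one can hope for. A minimal sketch of the reading half of hwclock -s, assuming the standard /dev/rtc0 device:

#include <fcntl.h>
#include <linux/rtc.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    struct rtc_time rt;
    int fd = open("/dev/rtc0", O_RDONLY);
    if (fd < 0 || ioctl(fd, RTC_RD_TIME, &rt) == -1) {
        perror("rtc");
        return 1;
    }
    /* Whole seconds only; each read also traps to the hypervisor,
       which is why the text above calls reading the RTC slow. */
    printf("RTC time: %04d-%02d-%02d %02d:%02d:%02d\n",
           rt.tm_year + 1900, rt.tm_mon + 1, rt.tm_mday,
           rt.tm_hour, rt.tm_min, rt.tm_sec);
    close(fd);
    return 0;
}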

Final design

The final design looks like this:

 

[Figure: time drift 7 – final design: sntpc in the VM talking via vpnkit to an SNTP server on the host]

In the VM the sntpc process sends UDP on port 123 (the NTP port) to the virtual NTP server running on the gateway, managed by the vpnkit process on the host. The NTP traffic is forwarded to a custom SNTP server running on localhost which executes gettimeofday and replies. The sntpc process receives the reply, calculates the local time by subtracting an estimate of the round-trip-time and calls settimeofday to bump the clock to the correct value.
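
Here is a heavily simplified sketch of that client loop (the real sntpc is more careful; the gateway address shown is an assumption, and most error handling is omitted):

#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

#define NTP_UNIX_OFFSET 2208988800UL  /* seconds from 1900 to 1970 */

int main(void)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in srv;
    memset(&srv, 0, sizeof(srv));
    srv.sin_family = AF_INET;
    srv.sin_port = htons(123);                  /* the NTP port */
    inet_pton(AF_INET, "192.168.65.1", &srv.sin_addr);  /* assumed gateway */

    for (;;) {
        unsigned char pkt[48] = { 0x23 };       /* LI=0, VN=4, Mode=3 (client) */
        struct timeval t0, t1, server;
        uint32_t be;

        gettimeofday(&t0, NULL);
        sendto(s, pkt, sizeof(pkt), 0, (struct sockaddr *)&srv, sizeof(srv));
        if (recv(s, pkt, sizeof(pkt), 0) == 48) {
            gettimeofday(&t1, NULL);

            /* Server transmit timestamp: seconds + fraction at offset 40. */
            memcpy(&be, pkt + 40, 4);
            server.tv_sec = (time_t)(ntohl(be) - NTP_UNIX_OFFSET);
            memcpy(&be, pkt + 44, 4);
            server.tv_usec = (suseconds_t)((double)ntohl(be) * 1e6 / 4294967296.0);

            /* Compensate for network delay with half the measured round
               trip, then bump the local clock. */
            long rtt_us = (t1.tv_sec - t0.tv_sec) * 1000000L
                        + (t1.tv_usec - t0.tv_usec);
            server.tv_usec += rtt_us / 2;
            if (server.tv_usec >= 1000000) {
                server.tv_sec += 1;
                server.tv_usec -= 1000000;
            }
            settimeofday(&server, NULL);
        }
        sleep(30);                              /* the interval used above */
    }
}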

In Summary

  • Timekeeping in computers is hard (especially so in virtual computers)
  • There’s a serious source of systematic drift in the macOS Hypervisor.framework / HyperKit / Linux system.
  • Minimising time drift is important for developer use-cases where file timestamps are used in build tools like make
  • For developer use-cases we don’t mind if the clock moves abruptly forwards
  • We’ve settled on a simple design using a standard protocol (SNTP) and a pre-existing client (sntpc)
  • The new design is very simple and should be robust: even if the clock drift rate is faster in future (e.g. because of a bug in the Hypervisor.framework or HyperKit) the clock will still be kept in sync.
  • The new code has shipped on both the stable and the edge channels (as of 18.05)
So, when do you use a Container or VM?
https://www.docker.com/blog/vm-or-containers/ (Fri, 13 May 2016)

Recently I was giving a talk at a trade show on the basics of Docker, and how an application goes from an idea to being a production workload running on a Universal Control Plane managed Swarm cluster. As part of that talk, I spent a bit of time talking about how containers are not VMs. This was especially relevant as most of the attendees work as virtualization sysadmins.

During the QA portion of the session one of the attendees asked me “When does my application go in a VM and when does it go in a container?”

Of course I started my answer the way I start my answers to these sorts of questions (the sort that doesn’t have a pat answer) – questions like “How many containers can I run on my host?” or “Should I deploy to physical or virtual?”

I replied: “It depends” (insert audible groan from the audience, and from you too I imagine).

Just like I wrote about in my last blog post regarding running Docker containers on physical vs. virtual servers, there are just too many variables.

However, like I did in that earlier post, I’m going to give you three different scenarios that you might want to consider when looking at where to deploy your application.

Before we go much deeper, let me reiterate that Docker isn’t the same as traditional virtualization. While there are similarities, at its core Docker is about delivering applications quickly and with the highest level of flexibility. Furthermore, in an ideal deployment container-based applications are delivered as a series of stateless microservices vs. the more traditional monolithic model found with virtual machines.

With that context here are three scenarios to consider when deciding where to deploy your application.

1) If you’re starting from scratch on a new application (or rewriting an existing app from the ground up), and you’re committed to writing it around a microservices-based architecture, then containers are a no-brainer.

Companies will leave their existing monolithic apps in place, while they develop the next version using Docker containers and microservices.

By leveraging Docker, companies can accelerate application development and delivery efforts, while creating code that can be run across almost any infrastructure without modification.

2) You are committed to developing software based on microservices, but rather than wait until an app is completely rewritten, you want to begin gaining benefits of Docker immediately. In this scenario, customers will “lift and shift” an existing application from a VM into a Docker container.

With the monolithic application running in a container, the development teams can start breaking it down piece by piece. They can move some functions out of the monolith, and begin deploying them as loosely coupled services in Docker containers.

The new containers can interact with older, legacy applications as necessary, and over time the entire application is deconstructed, and deployed as a series of portable and scalable services inside Docker containers.

3) There are cases much like the second case, where companies want some benefits that Docker offers, and they move monolithic apps from VMs to containers with no intention of ever rewriting them.

Typically these customers are interested in the portability aspect that Docker containers offer out of the box. Imagine if your CIO came to you and said “Those 1,000 VMs we got running in the data center, I want those workloads up in the cloud by the end of next week.” That’s a daunting task even for the most hardcore VM ninja. There just isn’t good portability from the data center to the cloud, especially if you want to change vendors. Imagine you have vSphere in the datacenter and the cloud is Azure — VM converters be what they may.

However, with Docker containers, this becomes a pretty pedestrian effort. Docker containers are inherently portable and can run in a VM or in the cloud unmodified, and they can move from VM to VM to bare metal without a lot of heavy lifting to facilitate the transition.

If any of these scenarios resonate with you, then you’ve probably got a good case for trying Docker.




Read more in this series:
  • To Use Physical Or To Use Virtual: That is the container deployment question
  • There’s Application Virtualization and There’s Docker
  • Containers and VMs Together
  • Containers are not VMs
To Use Physical Or To Use Virtual: That is the container deployment question
https://www.docker.com/blog/physical-virtual-container-deployment/ (Fri, 29 Apr 2016)

I have had a version of the following conversation more than a few times with community members trying to sort out where to run their containerized apps in production:

User: So, where should I run my containers? Bare metal or VMs?

Me: It’s not a question of “either / or” – that’s the beauty of Docker. That choice is based solely on what’s right for your application and business goals – physical or virtual, cloud or on-premises. Mix and match as your application and business needs dictate (and change).

User: But, surely you have a recommendation.

Me: I’m going to give you the two word answer that nobody likes: It depends.

User: You’re right, I don’t like that answer.

Me: I kind of figured you wouldn’t, but it really is the right answer.

There are tough questions in the world of tech, and the answer “It depends” can often be a cop out. But in the case of where to run your containerized applications it really is the best answer because no two applications are exactly the same, and no two companies have exactly the same business needs.

Any IT decision is based on a myriad of variables: Performance, scalability, reliability, security, existing systems, current skillsets, and cost (to name just a few). When someone sets out to decide how to deploy a Docker-based application in production all of these things need to be considered.

Docker delivers on the promise of allowing you to deploy your applications seamlessly regardless of the underlying infrastructure. Bare metal or VM. Datacenter or public cloud. Heck, deploy your app on bare metal in your data center and on VMs across multiple cloud providers if that’s what is needed by your application or business.

The key here is that you’re not locked into any one option. You can easily move your app from one infrastructure to another. There is essentially zero friction.

But that freedom also makes the process of deciding where to run those apps seem more difficult than it really is. The answer is going to be influenced by what you’re doing today, and what you might need to do in the future.

So while I can’t answer “Where should I run my app” outright, I can provide a list of things to consider when it comes time to make that decision.

I’m sure this list is far from complete, but hopefully it’s enough to start a conversation and get the gears turning.

Latency: Applications with a low tolerance for latency are going to do better on physical. This is something we see quite a bit in financial services (trading applications are a prime example).

Capacity: VMs made their bones by optimizing system load. If your containerized app doesn’t consume all the capacity on a physical box, virtualization still offers a benefit here.

Mixed Workloads: Physical servers will run a single instance of an operating system. So, if you wish to mix Windows and Linux containers on the same host, you’ll need to use virtualization.

Disaster Recovery: Again, like capacity optimizations, one of the great benefits of VMs is the set of advanced capabilities around site recovery and high availability. While these capabilities may exist with physical hosts, there is a wider array of options with virtualization.

Existing Investments and Automation Frameworks: A lot of organizations have already built a comprehensive set of tools around things like infrastructure provisioning. Leveraging this existing investment and expertise makes a lot of sense when introducing new elements.

Multitenancy: Some customers have workloads that can’t share kernels. In this case VMs provide an extra layer of isolation compared to running containers on bare metal.

Resource Pools / Quotas: Many virtualization solutions have a broad feature set to control how virtual machines use resources. Docker provides the concept of resource constraints, but for bare metal you’re kind of on your own.

Automation/APIs: Very few people in an organization typically have the ability to provision bare metal from an API. If the goal is automation you’ll want an API, and that will likely rule out bare metal.

Licensing Costs: Running directly on bare metal can reduce costs as you won’t need to purchase hypervisor licenses. And, of course, you may not even need to pay anything for the OS that hosts your containers.

There is something really powerful about being able to make a decision on where to run your app solely based on the technical merits of the platform AND being able to easily adjust that decision if new information comes to light.

In the end the question shouldn’t be “bare metal OR virtual” – the question is which infrastructure makes the most sense for my application needs and business goals. So mix and match to create the right answer today, and know with Docker you can quickly and easily respond to any changes in the future.




There’s Application Virtualization and There’s Docker
https://www.docker.com/blog/app-virtualization-docker/ (Fri, 15 Apr 2016)

In what appears to be a recurring theme (which I promise I’ll move off of soon), I’m going to spend some time talking about what Docker isn’t – Docker is not application virtualization.

I spent a good amount of time at VMware where I worked on VMware View (which begat Horizon View, which begat Horizon 7), so I’m more than a bit familiar with desktop and application virtualization. And I understand why some people, when they first hear us talk about leveraging Docker for application portability, think along the lines of App-V, XenApp, or ThinApp.

    Yesterday we talked about how the modern software supply chain runs on Docker, and in that post we noted that 41% of Docker users are targeting application portability as a core use case.

[Figure: app virtualization docker 1]
Before joining Docker, if I heard “application portability” I would have immediately thought of ThinApp (based largely on my VMware heritage). That phrase, to many who work in traditional server-based and desktop computing, means the ability to deliver applications seamlessly without encountering common pitfalls such as “DLL hell”. It could mean using something like App-V or ThinApp to put the application inside of a sandbox that includes the app and all its necessary DLLs. Or, it could mean hosting the application on a server and serving it up remotely a la Citrix XenApp or Microsoft’s RemoteApp. Common examples of application virtualization targets include IE6 with custom extensions, Microsoft Word, Excel, etc.

    So there is some common conceptual ground here between these application virtualization solutions and Docker. But there are also some critical differences.

    As I wrote previously, Docker is not a virtualization technology in the historical sense of the word so much as it’s an application delivery platform. Docker enables traditional monolithic applications to be delivered as a set of reusable microservices.

All of the tools I mentioned in this post are really aimed at delivering legacy Windows desktop applications. These applications are monolithic in that they contain their own GUI (vs. a web app that is accessed via a browser). By contrast, the most widely used Docker workloads are multi-service/microservice web apps.

So, yes, Docker containers do encapsulate all the code and libraries necessary to run a service. But those services are fundamentally different from the applications that are delivered via traditional application virtualization technologies.

    In the end Docker is not a direct replacement for application virtualization. It’s a way to take many of the applications deployed using app virt technologies and recreate them in a manner that offers higher levels of agility, portability, and control.





Containers and VMs Together
https://www.docker.com/blog/containers-and-vms-together/ (Fri, 08 Apr 2016)

A couple weeks back I talked about how Docker containers were not virtual machines (VMs). I received a lot of positive feedback on the article (thanks!), but I also heard a common question: Can VMs and Docker containers coexist?

    The answer is a resounding “yes.”

At the most basic level, VMs are a great place for Docker hosts to run. And by VMs I mean VMs in all their forms. Whether it’s a vSphere VM or a Hyper-V VM or an AWS EC2 instance, all of them will serve equally well as a Docker host. Depending on what you need to do, a VM might be the best place to land those containers. But the great thing about Docker is that it doesn’t matter where you run containers – that’s totally up to you.

    Another question I hear relates to whether or not Docker container-based services can interact with VM-based services. Again, the answer is absolutely yes. Running your application in a set of Docker containers doesn’t preclude it from talking to the services running in a VM.

    For instance, your application may need to interact with a database that resides in a virtual machine. Provided that the right networking is in place, your app can interact with that database seamlessly.

Another area where there can be synergy between VMs and Docker containers is capacity optimization. VMs gained early popularity because they enabled higher levels of server utilization. That’s still true today. A vSphere host, for instance, can host VMs that house Docker hosts, but may also host any number of traditional monolithic VMs. By mixing and matching Docker hosts with “traditional” VMs, sysadmins can be assured they are getting the maximum utilization out of their physical hardware.

[Figure: containers and vms together 1]

     

    Docker embraces running Docker hosts on a wide variety of virtualization and cloud platforms. Docker Cloud and Docker Datacenter can easily manage Docker hosts regardless of where they run. And with Docker Machine you can provision new Docker hosts onto a wide variety of platforms including VMware vSphere, Microsoft Hyper-V, Azure, and AWS.

One of the most powerful things about Docker is the flexibility it affords IT organizations. The decision of where to run your applications can be based 100% on what’s right for your business. You’re not locked in to any single infrastructure; you can pick and choose and mix and match in whatever manner makes sense for your organization. Docker hosts on vSphere? Great. Azure? Sure. Physical servers? Absolutely. With Docker containers you get this great combination of agility, portability, and control.





     

Containers are not VMs
https://www.docker.com/blog/containers-are-not-vms/ (Thu, 24 Mar 2016)

I spend a good portion of my time at Docker talking to community members with varying degrees of familiarity with Docker and I sense a common theme: people’s natural response when first working with Docker is to try and frame it in terms of virtual machines. I can’t count the number of times I have heard Docker containers described as “lightweight VMs”.

    I get it because I did the exact same thing when I first started working with Docker. It’s easy to connect those dots as both technologies share some characteristics. Both are designed to provide an isolated environment in which to run an application. Additionally, in both cases that environment is represented as a binary artifact that can be moved between hosts. There may be other similarities, but to me these are the two biggies.

    The key is that the underlying architecture is fundamentally different between the two. The analogy I use (because if you know me, you know I love analogies) is comparing houses (VMs) to apartment buildings (containers).

    Houses (the VMs) are fully self-contained and offer protection from unwanted guests. They also each possess their own infrastructure – plumbing, heating, electrical, etc. Furthermore, in the vast majority of cases houses are all going to have at a minimum a bedroom, living area, bathroom, and kitchen. I’ve yet to ever find a “studio house” – even if I buy the smallest house I may end up buying more than I need because that’s just how houses are built.  (for the pedantic out there, yes I’m ignoring the new trend in micro houses because they break my analogy)

    Apartments (the containers) also offer protection from unwanted guests, but they are built around shared infrastructure. The apartment building (Docker Host) shares plumbing, heating, electrical, etc. Additionally apartments are offered in all kinds of different sizes – studio to multi-bedroom penthouse. You’re only renting exactly what you need. Finally, just like houses, apartments have front doors that lock to keep out unwanted guests.

With containers, you share the underlying resources of the Docker host and you build an image that is exactly what you need to run your application. You start with the basics and you add what you need. VMs are built in the opposite direction. You are going to start with a full operating system and, depending on your application, might strip out the things you don’t want.

    I’m sure many of you are saying “yeah, we get that. They’re different”. But even as we say this, we still try and adapt our current thoughts and processes around VMs and apply them to containers.

    • “How do I backup a container?”
    • “What’s my patch management strategy for my running containers?”
    • “Where does the application server run?”

To me the light bulb moment came when I realized that Docker is not a virtualization technology; it’s an application delivery technology. In a VM-centered world, the unit of abstraction is a monolithic VM that stores not only application code, but often its stateful data. A VM takes everything that used to sit on a physical server and just packs it into a single binary so it can be moved around. But it is still the same thing. With containers the abstraction is the application, or more accurately a service that helps to make up the application.

    With containers, typically many services (each represented as a single container) comprise an application. Applications are now able to be deconstructed into much smaller components which fundamentally changes the way they are managed in production.

So, how do you back up your container? You don’t. Your data doesn’t live in the container; it lives in a named volume that is shared between 1-N containers that you define. You back up the data volume and forget about the container. Optimally your containers are completely stateless and immutable.

Certainly patches will still be part of your world, but they aren’t applied to running containers. In reality, if you patched a running container and then spun up new ones based on an unpatched image, you’re gonna have a bad time. Ideally you would update your Docker image, stop your running containers, and fire up new ones. Because a container can be spun up in a fraction of a second, it’s just much cheaper to go this route.

    Your application server translates into a service run inside of a container. Certainly there may be cases where your microservices-based application will need to connect to a non-containerized service, but for the most part standalone servers where you execute your code give way to one or more containers that provide the same functionality with much less overhead (and offer up much better horizontal scaling).

    “But, VMs have traditionally been about lift and shift. What do I do with my existing apps?”

I often have people ask me how to run huge monolithic apps in a container. There are many valid strategies for migrating to a microservices architecture that start with moving an existing monolithic application from a VM into a container, but that should be thought of as the first step on a journey, not an end goal.

As you consider how your organization can leverage Docker, try and move away from a VM-focused mindset and realize that Docker is way more than just “a lightweight VM.” It’s an application-centric way to deliver high-performing, scalable applications on the infrastructure of your choosing.
