7 Things You Need to Know Before Diving into Docker


Are you a skeptic of new developer tools?

I know I am.

I roll my eyes when the next JavaScript framework streams across Hacker News.  Too many tools have failed to live up to the fan-boy hype.

They turn out to be marginally improved ways of doing the same old garbage.  But when you add a fresh coat of paint to garbage, the smell isn’t masked for long.

Slapping on a cool new name won’t change the underlying concept.

But, I’m here to show you that Docker is different.
It signals a genuine paradigm shift in virtualization, not a fresh coat of paint on an old idea.

It allows us to use virtual machines in ways we never have before.

It’s not just another fad.  In this post, I’ll walk you through the most relevant benefits and gotchas of Docker.  Along the way I’ll answer some important questions:

1.  How can Docker help me?

2.  What are the gotchas of Docker?

3.  Is Docker just another passing trend?

I realize web development has been crazy recently, and even Docker already has a clone.

But whether it lives or dies is irrelevant.

Containers are here to stay.

And if Docker doesn’t make it, some other platform will.

The concepts will be the same.  So learn them now.


Not another new technology.

I rolled my eyes when I first heard of Docker.  I thought, “Not another new technology.”

You would probably call me jaded.

But I adore technology.

I work with it every day. I build stuff.  Nothing excites me more than reading about the next artificial brain, graphene, or the latest advances in fusion.

Because a fusion reactor is Technology.

In contrast, new web tools typically fall somewhere between a 2 and a 3 on the 10-point innovation scale.

They want us to forget the tools we already know in exchange for something Better.

But do these tools actually make us Better?

Or did we waste time learning for no reason?

We accomplish the same task, in the same amount of time, in a minimally improved manner.  But we had to expend our time and energy.  And we have no idea if the tool will exist a year from now.

So how do we decide where to spend our valuable time?

The Rise and Fall of Technology in Web Development

Software development suffers from fads.  Web development is worse.  Let’s examine Node.js.

I’ve used Node a lot.  And I like it.

I love the package manager.  I love the testing framework.  But I don’t understand how Node experienced such a meteoric rise.

Ryan Dahl gave the original Node.js presentation back in 2009. At the end of the video, people applaud and cheer.  They Cheer!

Cheering at a football game would be louder, but they’re still cheering about a new web server.

The rest is history.  Node.js is one of the most popular tools today.  Most of the internet still does not use it, but the tech press covers Node disproportionately.

In Boulder, I see Node.js meetups constantly. And it feels like half the startup job openings are looking for Node.js developers.

But does Node deserve its fame? It’s a web server, for crying out loud!  We’ve been serving web pages since the ’90s!

Node might handle memory better in certain I/O heavy applications, but most developers will never capitalize on that advantage.

What’s more, developing for Node can be a pain in the ass.  If you’re coming from the blocking I/O world, the asynchronous callback model is difficult to use.

Ever heard of a Promise? Ever had a promise eat your exceptions so you have no idea what’s going wrong?

What about Node’s future? In early December 2014, some of the founding developers forked Node to start io.js. It doesn’t matter why they did it. But which do you choose now? Should you use Node or io.js?

I don’t have an answer for you.

Node isn’t alone. The same up-and-down cycle plays out in many popular web frameworks.  Angular faces a backlash, with many teams abandoning it.

Every week on Hacker News I see 2-3 new JavaScript frameworks. A sizable percentage want me to develop using their style, even though it’s incompatible with every other style.

It locks me into that framework when the future of the tool is uncertain. Angular was especially guilty of this.

And Google just announced that v2 is not backward compatible with v1.  Some say this is no big deal.

I’m not so sure.

Docker went through its own drama when CoreOS announced a competing container platform.

As a developer, what do you do?

You’re a smart developer. You take the time to learn new development frameworks, languages, and methodologies, even if it’s on your own time.

You care about mastering your craft more than anything.

But what the hell do you do in the current environment, with technologies rising and falling everywhere? You don’t want to invest your time in some tool that merely duplicates what you already do.

You can’t possibly learn everything, and you wouldn’t want to. So start with a simple test:

Does it make your life easier?  Does it allow you to do new things?

Angular, Dart, WebWorkers, IndexedDB, and many others failed this test for me.

But Docker passed.

Even if it gets replaced by some new product in the future, the idea of containers is here to stay. And for good reason.

So I’m writing this post to get you over the skepticism I originally had.


1) Docker containers are virtual machines, sort of.


Docker containers share the operating system kernel.  Virtual machines don’t.  That’s all you need to know.

When people say containers are “lightweight virtual machines,” this is what they mean.

Containers don’t include the entire operating system like virtual machines do.

What this looks like

You have two Docker containers, both built from the same Linux image.  Let’s use BusyBox as an example. (BusyBox is one of the smallest Linux distributions you can find. It’s used extensively in embedded systems.)

Let’s run both of them, one per terminal, with something like the following:
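    # terminal 1
    docker run -it busybox /bin/sh

    # terminal 2
    docker run -it busybox /bin/sh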

This starts two separate instances of the busybox container, spawning an interactive shell in each. Run docker ps in a third shell and you’ll see something like this (the IDs and auto-generated names will differ):
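    CONTAINER ID    IMAGE      COMMAND      CREATED           STATUS           PORTS    NAMES
    4fdba9d665d2    busybox    "/bin/sh"    3 seconds ago     Up 3 seconds              stoic_pike
    7c5d1a3e9f02    busybox    "/bin/sh"    15 seconds ago    Up 15 seconds             mad_curie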

If you run touch hello_world in the first terminal, you’ll see:
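    / # touch hello_world
    / # ls
    bin    dev    etc    hello_world    home    proc    root    sys    tmp    usr    var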

Then look in the second terminal:
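    / # ls
    bin    dev    etc    home    proc    root    sys    tmp    usr    var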

There is no hello_world.  Linux containers give you the same file isolation as a virtual machine. They don’t share files unless you tell them to.

Because containers share the host’s kernel, anything that runs on Linux can run in one.  But only Linux: no Windows and no Mac OS.

2) What’s the overhead of running a Docker container?

In the beginning I thought Docker was just another virtual machine.

That VM would go on an Amazon VM and create (virtual machines)².  Then I’d be one more layer removed from the hardware.  The horror!

While virtual machines carry significant overhead, Linux containers do not.  The performance hit is negligible.

Because all Docker containers share a kernel, it’s like you’re running multiple applications on the same computer.  CPU performance is typically as good as running natively.

Startup time is almost always dramatically faster too.  Containers start and stop in milliseconds.  Compare that to the typical VM which takes 20-30 seconds to boot.

The only downside is that I/O takes a small hit.  A container’s network throughput is lower, and the NAT redirection adds latency on the order of 10 µs.

Docker’s network implementation and the Linux container performance will improve over time.

3) Layers are here to help.

Docker uses layers.  Think about a stack.

First you have your operating system: Ubuntu, CentOS, BusyBox.  Then come the applications you depend on: Apache, Nginx, MySQL.  Then comes your application.

Each of these is a layer to Docker, and each layer contains only the changes from the layer below it. With the exception of the base, layers are typically small.

What this does for you

  • No need to clone an entire virtual machine to back up configuration changes.  They can be saved in tiny layers.
  • Swap your base layer. Want to examine CentOS?  Debian?  Arch Linux?  Just change one line in your Dockerfile (see the sketch after this list).  Use this to experiment with different Linux distributions on your Mac!
  • Multiple containers use minimal disk space.  Ten 1-gigabyte containers built from the same base image might use only slightly more than 1 gigabyte in total.
  • Run a CentOS container on an Ubuntu host.  Anything with the Linux kernel will work!
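To make the base swap concrete, here’s a minimal Dockerfile sketch (the image tags are just examples):

    # Swap distributions by changing only the FROM line
    FROM ubuntu:14.04
    # FROM centos:7
    # FROM debian:wheezy

    RUN cat /etc/os-release    # prints the distro inside the image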

Key Question: What happens if I run an Ubuntu based container on an Ubuntu host?

Docker containers run in relative isolation from the host.  All they share is the kernel.

You will duplicate a bunch of the Ubuntu system files within the Ubuntu container.

Last I checked, the Ubuntu base image was 255 MB, so the duplication is non-trivial.

Then again, disk space is cheap.  So who cares.

4) Docker helps you in Development.

Stop installing software on your dev machine!

  • Need MySQL?  Just download the mysql Docker image and run it.
  • Need Apache and WordPress?  Download the wordpress image containing everything, and you’re done.
  • Have a bunch of MySQL test data that your entire development team needs to use?  Create a Docker image with the data, and have everyone pull it.

The ability to quickly start containers is one of the coolest aspects of Docker.

Before, you had to download a full operating system image, create a new virtual machine, install the operating system, and then install your development stack.

Every time you switched virtual machines, it required a full boot.  Ouch!  Docker speeds this process up dramatically.

We can now perfectly mimic our production environment during development.

Take the example of nginx.

Normally we don’t run it during development.  Or I don’t.  I write my application code in Python or Ruby, and then I use nginx as a reverse proxy.

When it’s time to move to production, I log in to my prod machine.  I install nginx and manually edit the configuration files in /etc/nginx.

My prod machine is left with random configuration files sitting around.  And when I change something, I break production temporarily because I couldn’t fully test the new configuration.

Contrast that with Docker:

You build your nginx image on your dev machine.  Commit your configuration files to a new nginx image, link it with your application code, and you’ve fully mimicked your production environment locally.  Cool!
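The whole workflow can be as small as a two-line Dockerfile, something like this (nginx.conf being whatever configuration you tested locally):

    FROM nginx
    COPY nginx.conf /etc/nginx/nginx.conf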

Let’s move on to the uglier aspects of Docker.

5) Gotcha: Docker will eat your data

If you kill a running container without committing it, all your data will be gone!

Start a MySQL server (the container name and password here are just examples):
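    docker run --name some-mysql -e MYSQL_ROOT_PASSWORD=my-secret-pw -p 3306:3306 -d mysql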

Then connect to that server and create a database using the mysql client. (Note: the IP here is taken from boot2docker ip, and the database name is just an example. Yours will probably differ.)
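    $ mysql -h 192.168.59.103 -P 3306 -u root -p
    mysql> create database docker_is_awesome;
    mysql> show databases;
    +--------------------+
    | Database           |
    +--------------------+
    | docker_is_awesome  |
    | information_schema |
    | mysql              |
    | performance_schema |
    +--------------------+
    4 rows in set (0.00 sec)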

Now let’s kill the container:
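    docker kill some-mysql
    docker rm some-mysql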

Finally, let’s restart everything and reconnect to mysql:
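    docker run --name some-mysql -e MYSQL_ROOT_PASSWORD=my-secret-pw -p 3306:3306 -d mysql

    $ mysql -h 192.168.59.103 -P 3306 -u root -p
    mysql> show databases;
    +--------------------+
    | Database           |
    +--------------------+
    | information_schema |
    | mysql              |
    | performance_schema |
    +--------------------+
    3 rows in set (0.00 sec)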

WTF!?  3 rows in set?  But we just created a database!

So what happened?

Unless you commit your changes to a new image, they live only in that container.  Kill it, and the next run starts fresh from the image’s original state.

If you want to keep data around, you must use data volumes.  In the WordPress theme example below, I use them to persist my theme, the file uploads, and the MySQL data.

6) Gotcha: Docker can be painful in Development

It seems absurd to claim in a single post that Docker is both awesome and a pain in the ass for development.  But it’s true, unless you use the right tools.

When developing, your docker command lines turn into novels.  Here’s a made-up but representative example:
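    docker run -d --name wordpress \
      --link some-mysql:mysql \
      -p 8080:80 \
      -v $(pwd)/theme:/var/www/html/wp-content/themes/thinkfaster \
      --volumes-from wordpress-uploads \
      wordpress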

Add three or four containers plus the links between them, and you have a mess on your hands.  Then accidentally restart your computer.  If you’re lucky, you will have saved everything in your shell history.

If not, good luck.

Yes, you could write scripts, or bash aliases, or a Go program to deal with all of this for you.

If that’s you, then you should probably stop reading and never come back to this blog.

Remember we want to make our lives easier.

This is where some handy tools come in.  Fig is one of my favorites.

Below is my fig.yml file for a WordPress theme.  It looks something like this (the password and the thinkfaster theme path stand in for my real ones):
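    data:
      image: busybox
      command: "true"
      volumes:
        - /var/lib/mysql
        - /var/www/html/wp-content/uploads

    db:
      image: mysql:latest
      environment:
        MYSQL_ROOT_PASSWORD: example-password
        MYSQL_DATABASE: wordpress
      volumes_from:
        - data

    wordpresstheme:
      image: busybox
      command: "true"
      volumes:
        - ./theme:/var/www/html/wp-content/themes/thinkfaster

    wordpress:
      image: wordpress
      ports:
        - "8080:80"
      links:
        - db:mysql
      volumes_from:
        - data
        - wordpresstheme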

The fig file

  • Saves file uploads and MySQL data in volumes.
  • Live updates with changes to the theme.
  • Links everything together.

Let’s break it down

Our two data volumes! We use a small image as a base, and we export the volumes at /var/lib/mysql and /var/www/html/wp-content/uploads.

When another container uses these, it sees the files at those volume paths.  The data will persist across system restarts, container deaths, you name it.

db: is our database image.  We start with mysql:latest, set up a few environment variables, and tell it to use the volumes from the data container we created above.

wordpresstheme: is the theme we’re developing.  The key line is:
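    volumes:
      - ./theme:/var/www/html/wp-content/themes/thinkfaster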

We tell Docker to map the theme/ folder on the host into the thinkfaster/ directory in the container.

Why?

All of our theme files sit in theme/ on the host, and we need to be able to modify them. Sharing the theme/ folder allows us to live-update the files, reload the webpage, and see the results.

And finally

wordpress: is our main image.  Here we map container port 80 to port 8080 on the host.

We “link” our mysql image to our wordpress image.  And we tell it to use the volumes we already set up.

We can run the whole thing with:
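    fig up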

We have the logs from all the containers, and everything is started in the proper order.

Going to http://<boot2docker ip>:8080 will show us our WordPress site.  And theme updates will be propagated automatically without a server restart.

Pretty cool!

Honestly, if fig didn’t exist, I probably would not use Docker.  Starting and stopping the containers manually and remembering all the options is too much of a pain.

Scripting might be okay, but I hate bash.  And dealing with processes in Python nauseates me.

If I’m trying to throw something together quickly, fig is my most intuitive option.

7) Gotcha: Why I don’t run Docker in production.

Call me a hypocrite, but I’m not running this site using Docker.  It was dramatically easier to use the default WordPress image from DigitalOcean.

Even after you’ve linked everything correctly in development, you still need to write systemd, Supervisor, or whatever jobs to make sure your containers keep running and restart correctly.
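For example, a minimal systemd unit to babysit a container might look something like this (myapp is a hypothetical container name):

    # /etc/systemd/system/myapp.service (hypothetical container name)
    [Unit]
    Description=My application container
    Requires=docker.service
    After=docker.service

    [Service]
    Restart=always
    ExecStart=/usr/bin/docker start -a myapp
    ExecStop=/usr/bin/docker stop myapp

    [Install]
    WantedBy=multi-user.target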

CoreOS is one of the most popular host images for running Docker in production.  If you decide to explore it as well, your learning curve goes vertical.

If I were a member of a larger team, then yes, I would use it.

If I were running more applications on each machine, then yes, I would use it.

But I’m not.  So I didn’t bother.

In production, Docker containers take more time and effort to set up than other methods.

After you pay the startup cost, the benefits are obvious.

If you’re just doing something quick and dirty, then be cautious.


Thanks!  I hope you’ve found this impromptu guide useful.  Leave me your thoughts on Docker in the comments below.

Photo Credit: Jim Bahn