How do I even container?

2018 is widely tipped to be the year of the container, and if you haven’t heard of Docker and/or containerisation you’ve probably been living under a rock.  This post is aimed at readers who are deciding how and where to start evaluating a container adoption strategy.

What is a container?

Containers are a packaging format for wrapping up just an application and its runtime dependencies in a portable form suitable for deployment regardless of the underlying environment.  Compared with full fat virtual machines, containers are lightweight, portable and typically have much quicker warm up times (you don’t have to boot a whole OS - the container entrypoint is usually your app, so there is no booting a kernel, initialising hardware and starting other services before your code is finally exposed).

Where should we start?

Organisationally the first step is learning how to build containers to package your applications and tools.  You’ll need to build pipelines for this - automate the build, no hand cranking.

I’ve seen really good success here from operational teams building tools that are then used for CI/CD or their own automation, rather than for revenue generating/customer facing systems - containerise your toolchain and test frameworks along with your set of plugins at the appropriate version, and use them from containers during your CI/build phase.  You can probably just run a local container engine for this (Docker - why not!).  An example win here is speeding up your CI processes by scaling horizontally and running jobs in parallel.
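As a rough sketch of that parallelisation win, the snippet below splits a list of test files into shards and builds one `docker run` command per shard, so each shard can run in its own container.  The image name, the workspace path and the choice of pytest as the test runner are all assumptions made up for illustration, not anything prescribed here.

```python
import shlex


def shard(items, workers):
    """Split items round-robin into `workers` roughly equal groups."""
    return [items[i::workers] for i in range(workers)]


def docker_run_argv(image, test_files, workspace):
    """Build the argv for running one shard of tests in a throwaway container.

    The workspace is mounted at /src (an arbitrary, assumed mount point) and
    pytest is assumed to be installed in the image.
    """
    return [
        "docker", "run", "--rm",
        "-v", f"{workspace}:/src",  # mount the checked-out code
        "-w", "/src",               # run from the mounted directory
        image,
        "pytest", *test_files,
    ]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    image = "registry.example.com/ci/test-runner:1.4.2"
    tests = ["test_api.py", "test_db.py", "test_ui.py", "test_auth.py"]
    for files in shard(tests, workers=2):
        print(shlex.join(docker_run_argv(image, files, "/builds/app")))
```

Each printed command can be handed to a separate CI agent.  Pinning the image tag is what keeps the toolchain and plugin versions identical across every parallel job.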

Application development teams should in parallel ship their code package in a container format, and build testing in at the side.  I’d advise against a pattern of all pre-production activity being carried out in containers, only to deploy into a full fat VM at the end - that breaks the principle of testing on what you run.

Smaller companies and organisations within them will find it easier to have fewer competing solutions internally than larger ones.  Avoid reinventing the wheel and having different teams working out how to do the same stuff - implement some sort of self service catalog and make sure different potential silos in your organisation communicate and collaborate on solutions where possible.

Once you’ve got the hang of taking containers through a build pipeline, you ought to pretty easily be able to port your apps onto a scheduler (a higher level application and service controller to bring individual containers up and down, expose services to the network, orchestrate deployments and control resources).  Don’t try to run user facing/revenue generating services without one.  Inventing your own feels like an awful lot of wasted effort, so don’t do that either.  Having a scheduler will also open the door to having services that as far as possible self recover - it’ll do health checks for you, nuke broken containers and scale you up when you need extra capacity.
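To make the self recovery idea concrete, here is a toy sketch of the kind of reconcile loop a scheduler runs for you.  It is not how Kubernetes or any real scheduler is implemented, and the `Service` and `Container` types are invented for the example - it only shows the principle of comparing desired state with actual state and correcting the difference.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Container:
    name: str
    healthy: bool = True


@dataclass
class Service:
    image: str
    desired: int                      # how many containers we want running
    containers: list = field(default_factory=list)
    _ids: count = field(default_factory=count, repr=False)


def reconcile(service):
    """One pass of the control loop: remove broken containers, then match the desired count."""
    # Drop anything that failed its health check.
    service.containers = [c for c in service.containers if c.healthy]
    # Scale up: start replacements or extra capacity.
    while len(service.containers) < service.desired:
        service.containers.append(Container(f"{service.image}-{next(service._ids)}"))
    # Scale down: stop any surplus containers.
    del service.containers[service.desired:]
    return service
```

A real scheduler runs this sort of loop continuously against live health checks and resource metrics, which is a good part of why inventing your own is wasted effort.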

The main public cloud providers are starting to provide managed (PaaS) container platforms, and when you’re starting out these will have the lowest barriers and costs to entry.  Unless the turnkey solutions do not fit your purpose, building and running your own is extra overhead - you’ll have a load of VMs to manage, as well as a load of containers.

Microservice all the things?

It depends on whether what you have can actually be composed of microservices, or whether it is really just a monolith with a number of different components.  You probably can split up an application into discrete services and there may be some development gains to be had by doing so, but forcing calls from one part of your app to another to be routed over a network introduces complexity and inevitable latency - some stuff is simply better done in process from a security and performance point of view.  I’ve seen implementations of really simple application functionality fail because of bad architecture decisions at the outset, so choose your implementation wisely.

Which scheduler?

Kubernetes seems to be heading towards being the ubiquitous implementation - it came out of Google originally, most people seem to be able to get it to work effectively (though it’s not without bugs), and it seems to be the public cloud implementation of choice.  At the time of writing, Kubernetes has PaaS implementations on Azure and Google Cloud (GKE) - you still need to run your own on top of AWS pending GA of EKS.  There is reasonable tooling to spin one up for yourself.

What other considerations do I need to worry about?

All the usual platform worries apply.

  • Security
      • External registries and external dependencies should be avoided - import everything you can
      • 3rd party containers must be audited, or BYOC (Build your own container - you heard it here first)
      • Implement vulnerability scanning, and container integrity tools
      • Unprivileged by default
      • Filter network traffic (firewalling)
  • Cost
      • Depends on how you implement the service and on which cloud, or on-prem
      • It won’t be free, but neither should it dramatically increase your run costs
      • Migration will cost you in development, again it’s not free
  • Use cases you should consider
      • Stateless apps
      • Shipping tools for e.g. CI, standard tools, desktop apps
      • Single process apps
      • Low latency scaling
  • Use cases to avoid
      • Apps with persistent storage requirements
      • Databases - there is general consensus on this
      • Apps that do lots of UDP
  • Container format choice
      • For the sake of compatibility, choose Dockerfile
  • Monitoring and performance
      • Metrics
      • Logging
          • You’ll need to adopt a standard method to ship and aggregate logs
          • Implement application frameworks
          • Log to Logging volumes
          • Log to Logging container
      • Healthchecks
  • Get a scheduler
      • You will need a scheduler for running and orchestrating apps
      • Probably Kubernetes on current velocity
      • You could mesosphere, swarm, nomad etc but start with the most commonly adopted tooling and reevaluate if it doesn’t fit
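On the logging point in the list above, one way to get a standard format is to have every app emit one JSON object per line and let a logging volume or logging container pick the lines up.  The sketch below uses only the Python standard library.  The field names and the choice of JSON are assumptions for illustration - the important thing is that every team uses the same convention.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single line of JSON."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name, stream=sys.stdout):
    """Return a logger that writes JSON lines to `stream` (stdout by default)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]   # replace any handlers added earlier
    logger.setLevel(logging.INFO)
    logger.propagate = False      # avoid duplicate lines via the root logger
    return logger
```

Calling `build_logger("orders").info("order %s accepted", 42)` writes one machine-parseable line, which whatever aggregation tooling you settle on can index without per-app parsing rules.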

Can’t I just go serverless?

To a degree, possibly.  You should certainly consider the use cases.  For transactional processing, event or schedule driven tasks and high volume asynchronous side channel jobs supporting a main application, something like AWS Lambda or Azure Functions might well be for you.  Serverless does have limits though - there are language constraints and size constraints, and you lose some visibility into performance, so debugging might be a bit harder.
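For a feel of what an asynchronous side channel job looks like, here is a minimal sketch of a Python function in the shape AWS Lambda expects (a callable taking `event` and `context`).  It assumes the function is fed from an SQS queue, where each record carries its payload in a `body` field.  The job itself (`process`, with its `image` and `width` keys) is entirely hypothetical.

```python
import json


def process(job):
    """Hypothetical unit of work: pretend to resize an image."""
    if "image" not in job:
        raise ValueError("job is missing the 'image' key")
    return {"image": job["image"], "width": job.get("width", 128)}


def handler(event, context):
    """Entry point: handle a batch of queue messages and report how many were processed."""
    results = [
        process(json.loads(record["body"]))
        for record in event.get("Records", [])
    ]
    return {"processed": len(results), "results": results}
```

Because the handler is just a function, you can call it locally with a hand-built event dictionary to test it, which goes some way towards offsetting the harder debugging once it is deployed.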

Can I run a unikernel?

Honestly and seriously, someone actually asked this as I was writing, which was unexpected because I’d not even thought about unikernels for probably a couple of years.  A unikernel is a self contained compiled application with a high degree of specialisation but unlike a container, it runs directly on top of a hypervisor without an intervening operating system.  As with everything, there are advantages and disadvantages.  You probably get better security, but less operational tooling.  They might run faster, but are harder to develop for.

You could run a unikernel, but on what, where, who is going to build it for you and why would you do it to yourself?  There’s very little battle tested and hardened general purpose tooling to help you, and unless you’re right up to your shoulders in niche requirements, containers will do all you need to do.  In summary, I’d suggest not doing it to yourself.

Written by Chris Spence - Principal DevOps Consultant