Celebrating the wonderful 2015 conference...

October 05, 2015

October Events for CoreOS

Is it possible to have too much of CoreOS CTO Brandon Philips (@brandonphilips)? We don’t think so! Brandon will be journeying throughout the US and Europe in October. See if you can meet him at one of these talks.

Meet us at AWS re:Invent, October 6-9, 2015 - Las Vegas, Nevada

We’re sponsoring AWS re:Invent. Come by booth 1449 to get all your CoreOS questions answered, find out about career opportunities, or just to say hello!

If you missed us at Container Summit last month, watch this talk with Brandon (@brandonphilips) about container ecosystem standards.

More Events in October

Monday, October 5, 2015 at 10:30 a.m. IST – Dublin, Ireland

We’re kicking off our October events at LinuxCon EU! Matthew Garrett (@mjg59), principal security developer at CoreOS, will be discussing what container security actually looks like.

Monday, October 5, 2015 at 11:30 a.m. IST – Dublin, Ireland

Alban Crequy is at LinuxCon EU presenting Container Mechanics in rkt and Linux. He'll describe the Linux APIs that make containers possible (namespaces and cgroups), then explain how rkt containers use them.

Tuesday, October 6, 2015 at 9:20 a.m. IST – Dublin, Ireland

Brandon (@brandonphilips) will be speaking on the LinuxCon EU Container Panel alongside Tom Barlow from Docker, Joe “Zonker” Brockmeier from Red Hat and Sebastien Goasguen from Citrix.

Tuesday, October 6, 2015 at 6:00 p.m. IST – Dublin, Ireland

Head over to the Dublin DevOps meetup at Zendesk to hear Brandon (@brandonphilips) talk about Go and CoreOS, and about CoreOS and Kubernetes. Good times guaranteed!

Wednesday, October 7, 2015 at 2:00 p.m. IST – Dublin, Ireland

Our LinuxCon adventure comes to a close with Brandon (@brandonphilips) giving a talk on Modern Container Orchestration. Don’t miss it!

Wednesday, October 14, 2015 at 2:10 p.m. EDT – New York, New York

Brandon (@brandonphilips) takes on Velocity and he’ll teach you how to get Google-like infrastructure: from the OS to the scheduler.

Monday, October 19, 2015 at 9:15 a.m. EDT – Raleigh, North Carolina

Want more of Brandon (@brandonphilips)? You got it. He’ll be kicking things off with a keynote at All Things Open at 9:15 a.m. EDT. Later on, you can find him giving a deep dive on the application containers stack, rkt to Kubernetes at 2:30 p.m. EDT.

Wednesday, October 21, 2015 at 6:30 p.m. PDT – San Francisco, California

We end this Brandon (@brandonphilips) tour with a meetup at GoSF! Want to learn more about web-based auth in Go with OAUTH 2.0 and dex? You won’t want to miss this.

Tuesday, October 27, 2015 at 6:00 p.m. PDT – San Francisco, California

We’re planning a meetup with ClusterHQ! More details coming soon.

Interested in hosting your own meetup or want to learn more about getting involved with the CoreOS Community? Email us at

October 02, 2015

Official CloudFormation and kube-aws tool for installing Kubernetes on AWS

Official Installation of Kubernetes on CoreOS and AWS

As we head into AWS re:Invent next week, we are making it one step easier to use Kubernetes on AWS. We are releasing an official CloudFormation template for launching Kubernetes on AWS, as well as kube-aws, a tool that assists in automating your cluster deployment and makes it easy to configure end-user tools like kubectl. This cluster setup is subject to regular conformance testing and is officially supported via Tectonic.
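
For reference, a CloudFormation stack is normally launched with the standard AWS CLI along these lines; the stack name, template file and parameter shown here are illustrative placeholders (kube-aws automates this step and fills in the real values):

$ aws cloudformation create-stack \
    --stack-name kubernetes-cluster \
    --template-body file://kubernetes-cloudformation.json \
    --parameters ParameterKey=KeyName,ParameterValue=my-keypair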

Some of the AWS specific setup includes:

  • ELB integration for Kubernetes Services allows for traffic ingress to selected microservices
  • Worker machines are deployed in an Auto Scaling group for effortless scaling
  • Full TLS is set up between Kubernetes components and users interacting with kubectl

Coming Soon:

  • Utilize VPC advanced networking for a more performant pod network
  • Mount EBS volumes into a pod for persistent storage

See official CoreOS documentation for full details.

Join us at AWS re:Invent, booth #1449, to see Kubernetes on AWS first hand and talk to us more about Kubernetes, CoreOS, rkt, flannel, fleet and more.

If you are interested in using Tectonic, we are currently in Tectonic Preview, and today are releasing an official Tectonic AWS Installer. The Preview is available free of cost until general availability.

September 29, 2015

Container Security with SELinux and CoreOS

At CoreOS, running containers securely is a number one priority. We recently landed a number of features that are helping make CoreOS Linux a trusted and even more secure place to run containers. As of the 808.0.0 release, CoreOS Linux is tightly integrated with SELinux to enforce fine-grained permissions for applications. Building on top of these permissions, our container runtime, rkt, has gained support for SVirt in addition to a default SELinux policy. The rkt SVirt implementation is compatible with Docker’s SVirt support, keeping you secure no matter what container runtime you choose.

Before covering these new features in detail, it’s important to step back and review how container technology is already keeping infrastructure secure.

Containers = Increased Security Through Isolation

Containers ease the deployment and management of applications and their dependencies, but the isolation that containers provide also results in increased security by reducing the degree to which applications can interact.

Containers place applications in restricted environments, isolated from each other using Linux functionality known as "namespaces." Each container runs in an independent namespace and is granted its own view of various operating system resources. This isolation prevents code within a container from interacting with code in other containers, resulting in an increase in security compared to running multiple non-containerized applications on the same system.

Unfortunately, the namespace support code is complicated, and in the past various bugs have allowed applications to escape this isolation and interfere with other containers. The known bugs have been fixed, and technologies such as seccomp (a “secure computing” mechanism) reduce the number of system calls available to containerized applications, making such bugs harder to exploit. However, additional layers of security are recommended and can help mitigate the risk associated with future bugs.

SELinux Makes for Fine-Tuned Permissions

One important way to add a layer of security is with Security Enhanced Linux (SELinux). SELinux is a Linux kernel feature that allows for fine-grained restrictions to be applied to application permissions. Each process has an associated context, and a set of rules defines the interactions permitted between contexts. This allows strict limits to be placed on what processes can do to each other and which resources they can access. A technology called SVirt, introduced by Red Hat, runs each container in a unique SELinux context. This context is permitted to access only the files and mount points required for that specific container – even if a process should manage to escape from the namespaces used to constrain it, the SELinux policy will still forbid it from accessing any other system resources or interacting with the contents of other containers.

CoreOS has introduced SVirt into the rkt container runtime and incorporated appropriate SELinux policy into the CoreOS Linux operating system. Out of the box, individual containers will run in independent SELinux contexts without the administrator having to take any further action. This implementation has also been designed to be compatible with the SVirt implementation in the Docker container runtime, and as such, running Docker under the CoreOS environment will benefit from additional isolation in the same way.
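
As a rough way to see this in practice (not a step from the CoreOS documentation), standard tooling shows the SELinux context attached to each process; with SVirt, processes belonging to different containers carry different labels:

$ ps -eZ | less    # lists every process together with its SELinux context
$ id -Z            # shows the context of the current shell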

Let’s explore a quick example scenario. A bug in a service running inside a container allows an attacker to gain shell access to that container. The kernel namespaces that underlie containers will restrict that attacker from observing or interacting with any other containers running on the same system. However, if the attacker is able to take advantage of a kernel bug, they may be able to escape from the container environment. As multiple containers will typically be running as the same user, the attacker would then be able to attack any other containers hosted on the same system. In the SELinux scenario, the user will be restricted from doing this – every container runs in a different context and SELinux will forbid cross-context access, preventing any further damage from occurring. In fact, the SELinux policy is sufficiently strict that the attacker will be unable to launch any additional applications from the host environment.

Notes for Deploying

In order to avoid unexpected incompatibilities as this new feature is deployed, the default behaviour of the SELinux policy is to log policy violations but not to enforce them. See instructions on how to view these logs and enable policy enforcement.
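
As a minimal sketch (the instructions referenced above are the authoritative reference), denials logged in permissive mode show up as AVC messages in the journal, and the usual SELinux utilities report and switch the enforcement mode:

$ journalctl -k | grep -i avc    # kernel AVC messages record would-be denials
$ getenforce                     # prints Permissive or Enforcing
$ sudo setenforce 1              # switch to enforcing mode until reboot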

This implementation is in its early days, and there are a couple of important limitations:

  • SELinux is currently unsupported when using Btrfs due to technical limitations in Btrfs. Upstream kernel work is continuing.
  • Support for shared volumes between containers when enforcing SELinux policy under rkt is currently incomplete but is being actively developed.

Next Steps for Security

CoreOS was founded on the principle of improving the security of the backend of the Internet. SELinux will continue to be an important security feature for CoreOS Linux projects. We are also working to add security through virtualization, for example using virtual machines to improve container security with the release of rkt v0.8.0. And we are looking toward using TPM-based technologies to provide a fully attestable boot chain and cryptographically verifiable audit trails.

We welcome your participation in our projects and feedback in areas you find most important for your security needs. Join the discussion on the mailing lists for CoreOS Linux, rkt and more.

September 27, 2015

CAP Implementation Workshop 2015: Rendering Agents Setting the Course

This year, many Common Alerting Protocol (CAP) message aggregation organizations presented their stories on complementing the delivery of early warnings. Most are primarily interested in public warning. Sahana Alerting and Messaging Broker (SAMBRO) was the only solution that [Read the Rest...]

Training of Trainer Workshop Final Day with ESCAP

The final day of the Training of Master Trainers Workshop consisted primarily of interactions with the CAP on a Map project's UNESCAP Program Officer, Mr. Alf Blikberg. The participants had the opportunity to present their cases to him. The outline of [Read the Rest...]

September 23, 2015

Homebrew Tap for Mutt 1.5.24 with trash_folder patch

At work I'm quite an avid user of Mutt. Unfortunately the upgrade to the recently released version 1.5.24 did not go as smoothly as expected.

I'm using Homebrew to install Mutt on Mac OS X, and even though there is an updated version in the official Homebrew repository, it no longer comes with the trash_folder patch (it fails to apply against the 1.5.24 source tree and was thus removed).

In order to build the new Mutt version with the trash_folder support, I updated the patch for version 1.5.24: mutt-1.5.24-trash_folder.diff.

The official Homebrew repository prefers unpatched packages and encourages the creation of independent "Taps" (package repositories) for patched packages. Thus I also created my own Homebrew Tap which contains the 1.5.24 version of Mutt with the updated trash_folder patch: x-way/homebrew-mutt.

To use this Tap just type brew tap x-way/mutt followed by brew install x-way/mutt/mutt --with-trash-patch to install Mutt 1.5.24 with trash_folder support. Cheers!
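
Spelled out, the two commands from the paragraph above are:

$ brew tap x-way/mutt
$ brew install x-way/mutt/mutt --with-trash-patch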

Government as an API: how to change the system

A couple of months ago I gave a short speech about Gov as an API at an AIIA event. Basically I believe that unless we make government data, content and transaction services API enabled and mashable, then we are simply improving upon the status quo. 1000 services designed to be much better are still 1000 services that could be integrated for users, automated at the backend, or otherwise transformed into part of a system rather than the unique siloed systems that we have today. I think the future is mashable government, and the private sector has already gone down this path so governments need to catch up!

When I rewatched it I felt it captured my thoughts around this topic really well, so below is the video and the transcript. Enjoy! Comments welcome.

The first thing is I want to talk about gov as an API. This is kind of like on steroids, but this goes way above and beyond data and gets into something far more profound. But just to step back to the concept of Government as a platform. Around the world a lot of Governments have adopted the idea of Government as a platform: let’s use common platforms, let’s use common standards, let’s try and be more efficient and effective. It’s generally been interpreted as creating platforms within Government that are common. But I think that we can do a lot better.

So Government as an API is about making Government one big conceptual API. Making the stuff that Government does discoverable programmatically, making the stuff that it does consumable programmatically, making Government the platform or a platform on which industry and citizens and indeed other Governments can actually innovate and value add. So there are many examples of this which I’ll get to but the concept here is getting towards the idea of mashable Government. Now I’m not here representing my employers or my current job or any of that kind of stuff. I’m just here speaking as a geek in Government doing some cool stuff. And obviously you’ve had the Digital Transformation Office mentioned today. There’s stuff coming about that but I’m working in there at the moment doing some cool stuff that I’m looking forward to telling you all about. So keep an eye out.

But I want you to consider the concept of mashable Government. So Australia is a country where we have a fairly egalitarian democratic view of the world. So in our minds, and this is important to note, in our minds there is a role for Government. Now there’s obviously some differences around the edges about how big or small it should be or how much it should do or shouldn’t do or whatever, but the concept is that we’re not going to have Government going anywhere. Government will continue to deliver things, Government has a role of delivering things. The idea of mashable Government is making what the Government does more accessible, more mashable. As a citizen when you want to find something out you don’t care which jurisdiction it is, you don’t care which agency it is, in some cases you don’t care who you’re talking to, you don’t care what number you have to call, you just want to get what you need. Part of the problem of course is: what are all the services of Government? There is no single place right now. What’s all the content? With over a thousand websites just in the Federal Government and thousands more across the states and territories, where’s the right place to go? And you know sometimes people talk about, what if we had improved SEO? Or what if we had improved themes or templates and such. If everyone has improved SEO you still have the exact same problem today, don’t you? You do a google search and then you still have lots of things to choose from, and which one’s authoritative? Which one’s the most useful? Which one’s the most available?

The concept of Government as an API is making content, services, APIs, data, you know, the stuff that Government produces either directly or indirectly, more available to collate in a way that is user centric. That actually puts the user at the centre of the design, but then also builds in the understanding that other people, businesses or Governments will be able to provide value on top of what we do. So I want you to imagine that all of that is available and that everything was API enabled. I want you to imagine third party re-use, new applications; I mean we see small examples of that today. So to give you a couple of examples of where Governments are already experimenting with this idea: obviously my little baby is one little example of this, it’s a microcosm. But whilst ever open data was just a list of things, a catalogue of stuff, it was never going to be that high value.

So what we did when we re-launched a couple of years ago was we said: what makes data valuable to people? Well, programmatic access. Discovery is useful, but if you can’t get access to it, it’s almost just annoying to be able to find it but not be able to access it. So how do we make it most useful? How do we make it most reusable, most high value in capacity shall we say? In potentia? So it was about programmatic access. It was about good metadata, it was about making it of value to citizens and industry but also to Government itself. If a Government agency needs to build a service, a citizen service to do something, rather than building an API to an internal system that’s privately available only to their application, which would cost them money, they could put the data in. Whether it’s spatial or tabular and soon to be relational, different data types have different data provision needs, so being able to centralise that function reduces the cost of providing it, making it easy for agencies to get the most out of their data, reducing the cost of delivering what they need to deliver on top of the data, and also creating an opportunity for external innovation. And I know that there’s already been loads of applications and analysis and uses of the data that’s on there, and it’s only increasing every day. Because we took open data from being a retrospective, freedom of information, compliance issue, which was never going to be sexy, right? We moved it towards how you can do things better. This is how we can enable innovation. This is how agencies can find each other’s data better and re-use it and not have to continually reinvent the wheel. So we built a business proposition for it, and that started to make it successful. So that’s been cool.

There’s been experimentation with gov as an API in the ATO, with the SBR API, with the ABN lookup API. There are so many businesses out there – I’m sure there’s a bunch in the room. When you build an application where someone puts a business name into an app or into an application or a transaction or whatever, you can use the ABN lookup API to validate the business name. So it’s a really simple validation service, and it means that you don’t have, as unfortunately we have right now in the whole of Government contracts data set, 279 different spellings for the Department of Defence. You can start to actually use what Government already has as validation services, as something to build upon. You know, I really look forward to having whole of Government up to date spatial data that’s really available so people can build value on top of it. That’ll be very exciting. At some point I hope that happens. Industry experimented with this with the energy ratings data set. It’s a very quick example: they had to build an app, as you know Ministers love to see. But they built a very, very useful app to actually compare, when you’re in the store, your fridges and all the rest of it, to see what’s best for you. But what they found, by putting the data up, is that they saved money immediately, and there’s a brilliant video if you go looking for this that the Department of Industry put together with Martin Hoffman that you should have a look at, which is very good. What they found is that all the retail companies that have to by law put the energy rating of every electrical device they sell on their brochures traditionally did it by googling, right? What’s the energy rating of this? Whatever the other retail companies are using, we’ll use that.

Completely out of date and unauthorised and not true, inaccurate. So by having the data set publicly available, kept up to date on a daily basis, suddenly they were able to massively reduce the cost of compliance with a piece of regulation, you know, so it actually reduced red tape. And then other applications started being developed that were very useful, and you know, Government doesn’t have all the answers and no one pretends that. People also love to pretend that Government has no answers. I think there’s a healthy balance in between. We’ve got a whole bunch of cool innovators in Government doing cool stuff but we have to work in partnership, and part of that includes using our stuff to enable cool innovation out there.

ABS obviously does a lot of work with APIs and that’s been wonderful to see. But also the National Health Services Directory. I don’t know how many people here know that one, but it’s a directory of thousands, tens of thousands, of health services across Australia. All API enabled. Brilliant sort of work. So API enabled computing and systems and modular program design, agile program design, is pretty typical for all of you, because you’re in industry and you’re kind of used to that and you’re used to getting up to date with the latest thing that’ll make you competitive.

Moving Government towards that kind of approach will take a little longer but, you know, it has started. And if you take an API enabled approach to your systems design it is relatively easy to progress to taking an API approach to exposing that publicly.

So, I think I only had ten minutes, so imagine if all the public Government information and services were carefully, usefully discoverable. Not just through a google search, but with appropriate metadata, and even consumable in some cases – you know, what if you could actually consume some of those transaction systems or information or services and be able to then re-use them somewhere else? Because when someone is, you know, about to, I don’t know, have a baby, they google for it first, right, and then they probably go to a baby site; they don’t think to come to government in the first instance. So we need to make it easier for Government to go to them. When they go to that site, why wouldn’t we be able to present to them the information that they need from Government as well? This is where we’re starting to sort of think when we start following the rabbit warren of gov as an API.

So, start thinking about what you would use. If all of these things were discoverable or if even some of them were discoverable and consumable, how would you use it? How would you innovate? How would you better serve your customers by leveraging Government as an API? So Government has and always will play a part. This is about making Government just another platform to help enable our wonderful egalitarian and democratic society. Thank you very much.

Postnote: adopting APIs as a strategy, not just a technical side effect, is key here. Adopting modular architecture means agencies can adopt best of breed components for a system today, tomorrow and into the future, without lock in. I think just cobbling APIs on top of existing systems would miss the greater opportunity of taking a modular architecture design approach, which creates more flexible, adaptable, affordable and resilient systems than the traditional single stack solution.

September 21, 2015

Cross-host Container Communication with rkt and flannel

The latest release of rkt, a container runtime, introduced many valuable features in v0.8. One notable feature is the ability to effortlessly run rkt with flannel, a software-defined network for containers. This makes it easy for all the containers in your cluster to have a unique IP over which they can converse with each other.

Setting up rkt with flannel

Let's walk through setting up rkt with flannel on CoreOS. We start with the CoreOS image 808.0 or later and bring up 3 instances clustered together using the following cloud-config:


#cloud-config

coreos:
  units:
    - name: etcd2.service
      command: start

    - name: flanneld.service
      drop-ins:
        - name: 50-network-config.conf
          content: |
            [Service]
            # Substitute your flannel network CIDR (e.g. "10.1.0.0/16") for the empty value below.
            ExecStartPre=/usr/bin/etcdctl set /coreos.com/network/config '{ "network": "" }'
      command: start

  etcd2:
    discovery: $YOUR_DISCOVERY_TOKEN
    advertise-client-urls: http://$public_ipv4:2379
    initial-advertise-peer-urls: http://$private_ipv4:2380
    listen-peer-urls: http://$private_ipv4:2380

write_files:
  - path: "/etc/rkt/net.d/10-containernet.conf"
    permissions: "0644"
    owner: "root"
    content: |
      {
        "name": "containernet",
        "type": "flannel"
      }

Once the instances have booted, we can confirm that flannel is up and running by checking for the flannel0 interface.
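
For example, a quick check on any of the hosts (output omitted; using the standard iproute2 tooling available on CoreOS):

$ ip addr show flannel0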

If you look back at the cloud-config, you will notice that the write_files section has written out a 10-containernet.conf file. This describes a network that rkt containers will join. In our case the configuration is really simple — it gives the network a name and specifies that it will work with flannel. We will look into the specifics of the "type" field shortly.

We are now ready to launch a container with rkt to test out the setup. We will be using an Alpine Linux Docker container with an entrypoint set to /bin/sh. Start the rkt container as follows:

$ sudo rkt run --private-net --interactive --insecure-skip-verify docker://
Downloading f4fddc471ec2: [====================================] 2.49 MB/2.49 MB
Downloading 577f81886e20: [====================================] 32 B/32 B
2015/09/16 19:17:06 Preparing stage1
2015/09/16 19:17:07 Loading image sha512-14f9c6504e687e4b902461437ddb3d4c3d84c039bf9111d5d165a52e380942b7
2015/09/16 19:17:07 Writing pod manifest
2015/09/16 19:17:07 Setting up stage1
2015/09/16 19:17:07 Writing image manifest
2015/09/16 19:17:07 Wrote filesystem to /var/lib/rkt/pods/run/6a35d365-565b-4c61-898e-2e2929c2ff38
2015/09/16 19:17:07 Writing image manifest
2015/09/16 19:17:07 Pivoting to filesystem /var/lib/rkt/pods/run/6a35d365-565b-4c61-898e-2e2929c2ff38
2015/09/16 19:17:07 Execing /init
/ #

The --private-net option instructs rkt to allocate a separate networking stack for the container and have it join the networks configured in /etc/rkt/net.d. To confirm the container has joined the "containernet", look at its interfaces:

/ # ifconfig
eth0      Link encap:Ethernet  HWaddr 96:97:A2:15:4F:A7  
          inet addr:  Bcast:  Mask:
          inet6 addr: fe80::9497:a2ff:fe15:4fa7/64 Scope:Link
          RX packets:18 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5 errors:0 dropped:1 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:2184 (2.1 KiB)  TX bytes:418 (418.0 B)

eth1      Link encap:Ethernet  HWaddr 3A:87:1C:29:9A:57  
          inet addr:  Bcast:  Mask:
          inet6 addr: fe80::3887:1cff:fe29:9a57/64 Scope:Link
          RX packets:6 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5 errors:0 dropped:1 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:508 (508.0 B)  TX bytes:418 (418.0 B)

lo        Link encap:Local Loopback  
          inet addr:  Mask:
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

The IP of eth0 is within the flannel network that we defined. Note that eth1 is the so-called "default" network that is automatically added by rkt. It is there to allow the container to communicate with the host and the Internet.

Bring up the same Alpine container on the other two instances and note their eth0 IPs. The containers should now be able to ping each other by their flannel (eth0) IPs.

Warning: On CoreOS, the above setup cannot be mixed with Docker because only a single flannel subnet is allocated to the host. Starting the docker.service unit will cause the docker0 bridge to be assigned the same flannel subnet, which will lead to conflicts. Support for running flannel with Docker and rkt side by side will be added in the future.

Looking behind the curtain

rkt uses CNI (the Container Network Interface) to power its networking plugins. The "type" field in the network conf file refers to the CNI plugin to use. With that in mind, let's look at what CNI's flannel plugin goes through to attach the container to the "containernet" network.

A CNI plugin is a simple executable file that runs when the container comes up and runs again when it is torn down during the garbage collection cycle. As we'll see, the flannel plugin itself does surprisingly little — it is actually a wrapper around two lower-level plugins. When executed to add a container to the network, it combines the information from /etc/rkt/net.d/10-containernet.conf and /run/flannel/subnet.env to generate a configuration for the plugins to which it will delegate the work.

/run/flannel/subnet.env is written out by flannel on start up and contains information such as the subnet that it was assigned:
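
The file contents are not reproduced here, but a typical subnet.env written by flannel holds a handful of variables along these lines (the values are illustrative; the MTU matches the 8973 used later in this walkthrough):

FLANNEL_NETWORK=10.1.0.0/16
FLANNEL_SUBNET=10.1.5.1/24
FLANNEL_MTU=8973
FLANNEL_IPMASQ=false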


CNI's flannel plugin uses this data to synthesize the following configuration for the "bridge" and "host-local" plugins:

   "name" : "containernet",
   "type" : "bridge",
   "mtu" : 8973,
   "ipMasq" : false,
   "isGateway" : true,
   "ipam" : {
      "type" : "host-local",
      "subnet" : "",
      "routes" : [ { "dst" : "" } ]

It then executes the "bridge" plugin, which does the following:

  • creates a Linux bridge on the host
  • executes the "host-local" plugin to get an IP for both the container and the bridge (gateway) within the allocated subnet
  • assigns an IP to the bridge
  • creates a veth pair
  • plugs one end of the veth pair into the bridge
  • moves the other end of the veth into the container and assigns it an IP
  • ensures that MTU on both the bridge and the veths is 8973 (to match flannel)

The above flow illustrates a key design decision of CNI: a plugin gets full control over both the host and container networking namespaces and is expected to do everything to connect the container to the network. It is, however, encouraged to delegate some of the work to other plugins. Giving the plugins complete control over the namespaces provides for the most flexibility. It allows plugin writers to better integrate their networking solutions with rkt and other CNI compatible container runtimes.
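
To make that contract concrete, here is a rough sketch of how a runtime invokes a CNI plugin: the operation and container details are passed as environment variables, and the network configuration is fed in on stdin (the container ID, network namespace path and plugin location below are illustrative, not rkt's actual layout):

$ CNI_COMMAND=ADD \
  CNI_CONTAINERID=example-container \
  CNI_NETNS=/var/run/netns/example \
  CNI_IFNAME=eth0 \
  CNI_PATH=/opt/cni/plugins \
  /opt/cni/plugins/flannel < /etc/rkt/net.d/10-containernet.conf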

If you'd like to learn more about CNI plugins and rkt, please join us at our next CoreOS Meetup in San Francisco on Monday, September 21.

September 18, 2015

Official Kubernetes on CoreOS Guides and Tools

Today we are releasing the first set of official Kubernetes on CoreOS guides and installation tools. This is an effort to make it even easier to get up and running with Kubernetes while experiencing the benefits of running on CoreOS Linux. The guides are actively maintained by the CoreOS team and are subject to regular Kubernetes conformance testing.

Deployment Guide

You can find the full deployment guide within the CoreOS documentation. We have also included basic Kubernetes usage instructions, including guides on pods, replication controllers, and services.
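
As a small taste of those usage guides, a minimal pod can be created and inspected with kubectl roughly like this (the manifest and names here are illustrative, not taken from the guides themselves):

$ cat > nginx-pod.yaml <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: nginx-example
spec:
  containers:
    - name: nginx
      image: nginx:1.9
      ports:
        - containerPort: 80
EOF
$ kubectl create -f nginx-pod.yaml
$ kubectl get pods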

Vagrant Installer

In addition to guides, we are releasing a single-node and a multi-node Vagrant installation guide. The Vagrant installer is intended to be used as an SDK for your development environment. Better still, when used with the deployment guide, it will be consistent with your production setup.

Officially Supported via Tectonic

Our commercial product based on Kubernetes, Tectonic, officially supports these guides. If you’re interested in Tectonic by CoreOS please sign up for the Tectonic Preview.

Please Contribute!

All guides and tools are available on GitHub. Please contribute or file issues with the guides and tools that you would like to see next.

September 16, 2015

Where systemd and Containers Meet: Q&A with Lennart Poettering

We talked with Lennart Poettering, creator of the systemd project, to hear the story of the origins of systemd, how it works in the world of containers and what can be expected in the future.

CoreOS will be at the systemd.conf in Germany this November. CTO Brandon Philips and software developers Alex Crawford and Jonathan Boulle will all speak on various aspects of systemd and containers. Meet us and Lennart there!

Q1: Tell us about your background and inspiration that led to the creation of the systemd project.

A: That's a long story!

I am a software engineer at Red Hat. Before working on systemd my focus was Linux audio, where I created the PulseAudio project. I also created the Avahi service discovery framework, even before that.

Five years ago, up to the point when Kay Sievers and I started working on the systemd project, we were actually big believers in the Upstart project. Upstart was a system manager ("init") written by an engineer at Canonical that modernized how Linux systems booted up. It was strictly event-based and had a very clean codebase. It was adopted by a number of distributions including Ubuntu and Fedora, back then.

When we looked closer at Upstart we eventually realized that its fundamental design was backwards – at least in our opinion. We thought a system manager should calculate the minimal amount of work to do during boot-up, while Upstart was actually (in a way) designed to do the maximum amount of work, and left developers and administrators in charge of calculating precisely what should be done when. We understood that fixing this was not really an option within the Upstart project, since it would mean turning Upstart completely upside down, throwing away its overall design, and replacing it with a completely new design. Hence we figured: this is one of the occasions where one needs to start from scratch – and so we did: with the systemd project, implementing our ideas of how Linux system boot-up should work.

Since then, we have been developing systemd steadily, taking a lot of inspiration from the system managers of other operating systems, in particular Solaris' SMF framework, as well as MacOS X's launchd system. Originally systemd was supposed to be just an "init" system, i.e. just one process that is responsible for the most fundamental logic of starting up your Linux userspace. However, we eventually realized that we actually wanted to solve more than just this one problem, and began to rework and unify a larger part of how Linux userspace is brought up and maintained during runtime.

Today, we consider systemd a set of basic building blocks to build an OS from, that contains far more than just an init system, but also covers device management, login management, network management, logging services and a lot more.

Q2: You spoke about systemd at the Core of the OS at CoreOS Fest this year. What does systemd have to offer to containers?

A: We believe a modern system and service manager should natively know the concept of a container. Containers should be a central facet of server management, and the concept should transcend the layers of the OS stack, all the way from the application layer down to the kernel. As the glue between all of that, it's systemd's responsibility to integrate containers into the OS.

Specifically, in systemd many commands are directly aware of the container concept. They can not only show you information about what's running on the host and change its state, but can do the same for all local containers. For example, the logging tool journalctl may be used to show you the logs of the host, or of any local container, and can even interleave the logs of all containers and the host into one stream. Other commands that have direct support for containers are systemctl, loginctl, systemd-run, and many more.
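
For example, a few of those container-aware invocations look roughly like this (the machine name is an illustrative placeholder):

$ machinectl list                    # containers and VMs registered with systemd
$ journalctl -M mycontainer          # journal of a single local container
$ journalctl -m                      # interleave all available journals, including linked container journals
$ systemctl -M mycontainer status    # query unit state inside a container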

For us, systemd should not only be useful to run containers on, but also work fine when run inside containers. In fact, we regularly test systemd in containers, even more often than on bare metal, simply because it is so much easier and quicker to work with containers than physical machines. Containers matter to us upstream, they are our primary testing platform.

systemd also contains the systemd-nspawn container manager. It's a relatively minimal, yet powerful implementation of a container manager. Initially we wrote it for testing purposes, but nowadays we consider it ready for many production uses. In fact CoreOS' rkt container tool makes use of it as the lower level container backend.
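
For reference, a minimal systemd-nspawn invocation looks like this (the directory is an illustrative placeholder containing an installed OS tree):

$ sudo systemd-nspawn -D /var/lib/machines/test       # spawn a shell inside the container
$ sudo systemd-nspawn -D /var/lib/machines/test -b    # boot the container's own init (systemd)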

Watch Lennart’s talk from CoreOS Fest

For a deeper dive, watch the full recording of Lennart's talk:

Q3: What are the most important things emerging in the Linux ecosystem?

A: The major trend I am seeing is that the OS becomes commoditized, and is turned into something that is managed increasingly automatically, often as a commercial service, the way CoreOS is doing it. The requirement for automating OS management starts to be reflected in many layers of the stack: OS updates become automatic, the system thus needs to handle failure more gracefully, in order to support automatic rollback. The OS is split into containers, where each container becomes very similar in behaviour and update cycle to the OS itself. Traditional software packages move out of the deployment focus, and become strictly a development tool: containers and OSs are now the smallest unit of deployment.

If you put all this together it is becoming easier to run, deploy and update larger setups, as much of the necessary work is now done automatically, more robustly, and without requiring direct interference of the administrator.

And yes, with Linux we are at the forefront of this development. And with systemd we hope to get some basic building blocks for all this into place.

Q4: How do you see systemd integrating with container tools like Docker and rkt?

A: Integration with rkt is already pretty close, as rkt uses systemd-nspawn as its container backend. And we love it that way. We want to deliver the basic building blocks for container management, with a strict focus on the individual machine. Tools like rkt then build on this, and build a more distributed, network-aware, user-friendly "house" from our "building blocks."

Q5: What should attendees expect at systemd.conf coming up this November in Berlin?

A: As systemd is now at the core of so many Linux operating systems, systemd.conf will of course be one of the major forums where we will discuss the goals, progress and future of the basic Linux userspace.

The conference is intended for system developers as well as professional devops folks. We'll have talks on a lot of different topics, such as containers, networking, logging, IPC, systemd in distributions, use of systemd in the cloud, on embedded and on servers, and a lot more.

systemd.conf 2015 will consist of at least 1.5 days of presentations followed by an extended hackfest. And of course, there will be parties in one of today's most exciting cities, where you can meet many other people involved with and interested in the systemd project!

I hope to see you in Berlin!

Thanks to Lennart for chatting with us!

September 15, 2015

Returning to data and Gov 2.0 from the DTO

I have been working at the newly created Digital Transformation Office in the Federal Government since January this year helping to set it up, create a vision, get some good people in and build some stuff. I was working in and then running a small, highly skilled and awesome team focused on how to dramatically improve information (websites) and transaction services across government. This included a bunch of cool ideas around whole of government service analytics, building a discovery layer (read APIs) for all government data, content and services, working with agencies to improve content and SEO, working on reporting mechanisms for the DTO, and looking at ways to usefully reduce the huge number of websites currently run by the Federal public service amongst other things. You can see some of our team blog posts about this work.

It has been an awesome trip and we built some great stuff, but now I need to return to my work on data, gov 2.0 and supporting the Australian Government CTO John Sheridan in looking at whole of government technology, procurement and common platforms. I can also work more closely with Sharyn Clarkson and the Online Services Branch on the range of whole of government platforms and solutions they run today, particularly the highly popular GovCMS. It has been a difficult choice but basically it came down to where my skills and efforts are best placed at this point in time. Plus I miss working on open data!

I wanted to say a final public thank you to everyone I worked with at the DTO, past and present. It has been a genuine privilege to work with the diverse teams and leadership from across over 20 agencies in the one team! It gave me a lot of insight into the different cultures, capabilities and assumptions in different departments, and I think we all challenged each other and created a bigger and better vision for the effort. I have learned much and enjoyed the collaborative nature of the broader DTO team.

I believe the DTO has two major opportunities ahead: as a force of awesome and a catalyst for change. As a force of awesome, the DTO can show how delivery and service design can be done with modern tools and methods, can provide a safe sandpit for experimentation, can set the baseline for the whole APS through the digital service standard, and can support genuine culture change across the APS through training, guidance and provision of expertise/advisers in agencies. As a catalyst for change, the DTO can support the many, many people across the APS who want transformation, who want to do things better, and who can be further empowered, armed and supported to do just that through the work of the DTO. Building stronger relationships across the public services of Australia will be critical to this broader cultural change and evolution to modern technologies and methodologies.

I continue to support the efforts of the DTO and the broader digital transformation agenda and I wish Paul Shetler and the whole team good luck with an ambitious and inspiring vision for the future. If we could all make an approach that was data/evidence driven, user centric, mashable/modular, collaborative and cross government(s) the norm, we would overcome the natural silos of government, we would establish the truly collaborative public service we all crave and we would be better able to support the community. I have long believed that the path of technical integrity is the most important guiding principle of everything I do, and I will continue to contribute to the broader discussions about “digital transformation” in government.

Stay tuned for updates on the blog, and I look forward to spending the next 4 months kicking a few goals before I go on maternity leave :)

September 11, 2015

Running a Shell in a Daemon Domain

allow unconfined_t logrotate_t:process transition;

allow logrotate_t { shell_exec_t bin_t }:file entrypoint;

allow logrotate_t unconfined_t:fd use;

allow logrotate_t unconfined_t:process sigchld;

I recently had a problem with SE Linux policy related to logrotate. To test it out I decided to run a shell in the domain logrotate_t to interactively perform some of the operations that logrotate performs when run from cron. I used the above policy to allow unconfined_t (the default domain for a sysadmin shell) to enter the daemon domain.
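
For reference, one way to compile and load rules like the ones above is as a local policy module using the standard policy tools; the module name is arbitrary, and the require block is inferred from the rules themselves. Save the following as local_logrotate_shell.te:

module local_logrotate_shell 1.0;

require {
        type unconfined_t;
        type logrotate_t;
        type shell_exec_t;
        type bin_t;
        class process { transition sigchld };
        class file entrypoint;
        class fd use;
}

allow unconfined_t logrotate_t:process transition;
allow logrotate_t { shell_exec_t bin_t }:file entrypoint;
allow logrotate_t unconfined_t:fd use;
allow logrotate_t unconfined_t:process sigchld;

Then build and install it with:

checkmodule -M -m -o local_logrotate_shell.mod local_logrotate_shell.te
semodule_package -o local_logrotate_shell.pp -m local_logrotate_shell.mod
semodule -i local_logrotate_shell.pp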

Then I used the command “runcon -r system_r -t logrotate_t bash” to run a shell in the domain logrotate_t. The utility runcon will attempt to run a program in any SE Linux context you specify, but to succeed the system has to be in permissive mode or you need policy to permit it. I could have written policy to allow the logrotate_t domain to be in the role unconfined_r but it was easier to just use runcon to change roles.

Then I had a shell in the logrotate_t domain to test out the post-rotate scripts. It turned out that I didn’t really need to do this (I had misread the output of an earlier sesearch command). But this technique can be used for debugging other SE Linux related problems so it seemed worth blogging about.

September 10, 2015

etcd 2.2 – Improving the Developer Experience and Setting the Path for the v3 API

Today we are releasing etcd 2.2. This new release focuses on improving the tooling and developer experience. This release introduces an experimental demo of the next-generation v3 API, a new Go etcd client, and active cluster connectivity checking.

etcd is an open source, distributed, consistent key-value store for shared configuration, service discovery and scheduler coordination. etcd is a core component of the CoreOS software stack and is used to facilitate safe automatic updates in CoreOS Linux. It is the primary scheduling and service discovery data store in Kubernetes and is leveraged by hundreds of other tools and projects.

If you want to skip the talk and get right to the code, you can find new binaries on GitHub.

Zero-downtime rolling upgrade from 2.1

Upgrading from etcd 2.1 to etcd 2.2 is a zero-downtime rolling upgrade. You can update a cluster's nodes, one-by-one, from etcd 2.1 to etcd 2.2. For more details, please read the upgrade documentation. If you are running your cluster under etcd 0.4.x, please follow the snapshot migration documentation to upgrade your cluster.

Also, with this release, etcd 2.2 is now the current stable etcd release. As such, all bug fixes will go into new etcd 2.2.x releases and won't be backported to etcd 2.1.x.

New etcd client

Today we added a new Go etcd client binding to replace the old go-etcd client. The new client provides a clean set of APIs and improves the functionality around request cancellation and error reporting. We encourage you to try it out for new projects and provide feedback on the API via the etcd-dev mailing list. In this release, etcdctl also uses this new client, which will improve its reliability.
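
The new client lives inside the main etcd repository; with a standard Go setup it can be fetched with (import path as of this release):

$ go get github.com/coreos/etcd/client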

Experimental v3 API and new storage demo

We recently proposed a new version of the etcd API that is focused on providing improved features to the key-value store including: range reads, multi-key transactions, binary keys/values and a longer, more reliable key change history.

This new API also provides an efficient, reliable and scalable way to access the new disk storage backend. The new disk storage backend allows etcd to do incremental snapshots and decreases the memory pressure of large datasets while retaining hot data in-memory for fast access. The new storage backend, combined with the v3 API, will provide even better performance and stability than today’s in-memory store, which supports the v2 API.

etcd 2.2 supports a subset of the v3 API for testing and demo purposes. Note that this early preview is a non-clustered version and should not be used in production. You can enable it by setting the --experimental-v3demo flag, then build and use the etcdctlv3 tool in the etcd repo to interact with it.
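
Enabling the demo is just a matter of passing the flag mentioned above when starting a single member (the data directory here is an illustrative placeholder):

$ etcd --experimental-v3demo --data-dir=/tmp/etcd-v3demo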

We plan to support an experimental clustered version of the v3 API in the next etcd minor release, etcd v2.3.0, which is currently targeted for release in the next couple of months.

Active cluster health checking

In previous releases of etcd, the leader of the cluster was solely responsible for monitoring the health of other members. With this new release, all members of an etcd cluster now regularly check for connectivity and timing differences to other members to ensure cluster-wide health. To ease debugging potential issues, etcd now reports members that are observed as unhealthy every 30 seconds in the logs. This new feature also helps users explore and understand the stability of their cluster. For instance, it has the ability to expose clock sync issues and potential issues that might be caused by partial network partitions between members.

Improved documentation

Based on feedback from the community, we've made a number of improvements to the readability and understandability of our documentation. Moreover, we added important new documents:

Get involved with etcd

We thank the community for helping to make etcd a strong and fundamental building block for running Google-like infrastructure. As we continue to invest in etcd and build requested features, we also welcome your contributions!

One week after, SAMBRO Users asking for more of Sahana

“You guys have put a lot of thought into the SAMBRO design,” a participant said. The first week of the Training of Trainer program had the participants learn about GIS and Sahana. They went further into learning about [Read the Rest...]

September 08, 2015

September Events for CoreOS: Conferences, Trainings and More

September is here and we are everywhere – from the West Coast to the Midwest in the US, all the way to Amsterdam and London. We started out this month at VMworld.

If you missed us at VMworld last week, watch this panel hosted by Brian Gracely, analyst with Wikibon, on theCUBE with Brandon Philips (@brandonphilips) from CoreOS, Sheng Liang (@shengliang) from Rancher Labs and Nick Weaver (@lynxbat) from Intel about containers in the enterprise.

Check out what the rest of this month has in store for us!

Wednesday, September 9, 2015 at 10:00 a.m. CEST – Amsterdam, Netherlands

Kelsey Hightower (@kelseyhightower), product manager, developer and chief advocate at CoreOS, will be leading a Kubernetes workshop at the Impact Hub in Amsterdam!

Friday, September 11, 2015 at 2:30 p.m. CEST – Amsterdam, Netherlands

Don’t miss Kelsey Hightower’s (@kelseyhightower) keynote at Software Circus, where he’ll be discussing managing applications at scale.

Tuesday, September 15, 2015 at 6:30 p.m. BST – London, United Kingdom

You’ll find Barak Michener (@barakmich) at the [ Contain ] Meetup in London alongside Kai Davenport (@kai_davenport) from ClusterHQ and Andrew Kennedy (@grkvlt), founder of the Clocker project.

Friday, September 18, 2015 at 1:00 p.m. BST – London, United Kingdom

Barak Michener (@barakmich) will be at GOTO London, speaking on the rugged track and discussing scaling open source projects from 0-1,000 commits.

Friday, September 18, 2015 at 11:30 a.m. CDT – Chicago, IL

If you are planning on attending WindyCityRails, join Kelsey Hightower (@kelseyhightower) for his talk on Ruby on Kubernetes. In this session, attendees will learn how to package Ruby applications in Docker containers to streamline the packaging and distribution problem inherent to all web applications.

Tuesday, September 22, 2015 at 10 a.m. PDT - San Francisco, CA

Container Summit San Francisco is not to be missed! Brandon Philips (@brandonphilips), CTO of CoreOS, will be speaking on the theme of why containers are ready for the enterprise and what technical leadership can expect in the future to achieve organizational agility. Get your tickets.

Wednesday, September 23, 2015 – Burlingame, CA

Join Brandon Philips (@brandonphilips), CTO of CoreOS, for a talk at Linaro Connect. More details will be posted soon.

Wednesday, September 23, 2015 at 1:30 p.m. PDT – San Francisco, CA

Kelsey Hightower (@kelseyhightower) will be giving a talk on bringing Kubernetes to the edge with NGINX Plus at the NGINX Summit. In this session you will learn how NGINX Plus can be used to provide robust load balancing across a Kubernetes cluster while leveraging deep integration with the Kubernetes API and built-in service discovery mechanisms. If you can't make it in person, stay tuned for the recording of a webinar on this topic that will post at a later date.

Friday, September 25, 2015 at 10:10 a.m. CDT – St. Louis, MO

Learn how to manage applications at scale with Kelsey Hightower at Strange Loop.

Monday, September 28, 2015 at 10:00 a.m. PDT – Portland, OR

Want more of Kelsey Hightower? HashiConf has got you covered. Learn how to manage applications at scale, from theory to production.

Wednesday, September 30, 2015 at 10:00 a.m. PDT – Portland, OR

Last but not least, join us for a Kubernetes workshop led by Kelsey Hightower. Space is limited. Reserve your seat here!

Interested in hosting your own meetup or want to learn more about getting involved with the CoreOS Community? Email us at

September 05, 2015

A Long Term Review of Android Devices

Xperia X10

My first Android device was the Sony Ericsson Xperia X10i [1]. One of the reasons I chose it was for the large 4″ screen; nowadays the desirable phones (the ones that are marketed as premium products) are all bigger than that (the Galaxy S6 is 5.1″) and even the slightly less expensive phones are bigger. At the moment Aldi is advertising an Android phone with a 4.5″ screen for $129. But at the time there was nothing better in the price range that I was willing to pay.

I devoted a lot of my first review to the default apps for SMS and Email. Shortly after that I realised that the default email app is never going to be adequate (I now use K9 mail) and the SMS app is barely adequate (but I mostly use instant messaging). I’ve got used to the fact that most apps that ship with an Android device are worthless, the camera app and the app to make calls are the only built in apps I regularly use nowadays.

In the bug list from my first review the major issue was lack of Wifi tethering which was fixed by an update to Android 2.3. Unfortunately Android 2.3 ran significantly more slowly which decreased the utility of the phone.

The construction of the phone is very good. Over the last 2 years the 2 Xperia X10 phones I own have been on loan to various relatives, many of whom aren’t really into technology and can’t be expected to take good care of things. But they have not failed in any way. Apart from buying new batteries there has been no hardware failure in either phone. While 2 is a small sample size I haven’t seen any other Android device last nearly as long without problems. Unfortunately I have no reason to believe that Sony has continued to design devices as well.

The Xperia X10 phones crash more often than most Android phones with spontaneous reboots being a daily occurrence. While that is worse than any other Android device I’ve used it’s not much worse.

My second review of the Xperia X10 had a section about ways of reducing battery use [2]. Wow, I’d forgotten how much that sucked! When I was last using the Xperia X10 the Life360 app that my wife and I use to track each other was taking 15% of the battery, on more recent phones the same app takes about 2%. The design of modern phones seems to be significantly more energy efficient for background tasks and the larger brighter displays use more energy instead.

My father is using one of the Xperia phones now, when I give him a better phone to replace it I will have both as emergency Wifi access points. They aren’t useful for much else nowadays.

Samsung Galaxy S

In my first review of the Galaxy S I criticised it for being thin, oddly shaped, and slippery [3]. After using it for a while I found the shape convenient as I could easily determine the bottom of the phone in my pocket and hold it the right way up before looking at it. This is a good feature for a phone that’s small enough to rotate in my pocket – the Samsung Galaxy Note series of phones is large enough to not rotate in a pocket. In retrospect I think that being slippery isn’t a big deal as almost everyone buys a phone case anyway. But it would still be better for use on a desk if the bulge was at the top.

I wrote about my Galaxy S failing [4]. Two of my relatives had problems with those phones too. Including a warranty replacement I’ve seen 4 of those phones in use and only one worked reliably. The one that worked reliably is now being used by my mother, it’s considerably faster than the Xperia X10 because it has more RAM and will probably remain in regular use until it breaks.


I tried using CyanogenMod [5]. The phone became defective 9 months later so even though CyanogenMod is great I don’t think I got good value for the amount of time spent installing it. I haven’t tried replacing the OS of an Android phone since then.

I really wish that they would start manufacturing phones that can have the OS replaced as easily as a PC.

Samsung Galaxy S3 and Wireless Charging

The Galaxy S3 was the first phone I owned which competes with phones that are currently on sale [6]. A relative bought one at the same time as me and her phone is running well with no problems. But my S3 had some damage to its USB port which means that the vast majority of USB cables don’t charge it (only Samsung cables can be expected to work).

After I bought the S3 I bought a Qi wireless phone charging device [7]. One of the reasons for buying it was that if a phone gets a broken USB port I can still charge it. It’s ironic that the one phone that had a damaged USB port also failed to work correctly with the Qi card installed.

The Qi charger is gathering dust.

One significant benefit of the S3 (and most Samsung phones) is that it has an SD socket. I installed a 32G SD card in the S3 and now one of my relatives is happily using it as a media player.

Nexus 4

I bought a Nexus 4 [8] for my wife as she needed a better phone but didn’t feel like paying for a Galaxy S3. The Nexus 4 is a nice phone in many ways but the lack of storage is a serious problem. At the moment I’m only keeping it to use with Google Cardboard, I will lend it to my parents soon.

In retrospect I made a mistake buying the Nexus 4. If I had spent a little more money on another Galaxy S3 then I would have had a phone with a longer usage life as well as being able to swap accessories with my wife.

The Nexus 4 seems reasonably solid, the back of the case (which is glass) broke on mine after a significant impact but the phone continues to work well. That’s a tribute to the construction of the phone and also the Ringke Fusion case [9].

Generally the Nexus 4 is a good phone so I don’t regret buying it. I just think that the Galaxy S3 was a better choice.

Galaxy Note 2

I got a Samsung Galaxy Note 2 in mid 2013 [10]. In retrospect it was a mistake to buy the Galaxy S3; the Note series is better suited to my use. If I had known how good it is to have a larger phone I’d have bought the original Galaxy Note when it was first released.

Generally everything is good about the Note 2. While it only has 16G of storage (which isn’t much by today’s standards) it has an SD socket to allow expansion. It’s currently being used by a relative as a small tablet. With a 32G SD card it can fit a lot of movies.

Bluetooth Speakers

I received Bluetooth speakers in late 2013 [11]. I was very impressed by them but ended up not using them for a while. After they gathered dust for about a year I started using them again recently. While nothing has changed regarding my review of the Hive speakers (which I still like a lot) it seems that my need for such things isn’t as great as I thought. One thing that made me start using the Bluetooth speakers again is that my phone case blocks the sound from my latest phone and makes it worse than phone sound usually is.

I bought Bluetooth speakers for some relatives as presents; the relatives seemed to appreciate them but I wonder how much they actually use them.

Nexus 5

The Nexus 5 [12] is a nice phone. When I first reviewed it there were serious problems with overheating when playing Ingress. I haven’t noticed such problems recently so I think that an update to Android might have made it more energy efficient. In that review I was very impressed by the FullHD screen and it made me want a Note 3, at the time I planned to get a Note 3 in the second half of 2014 (which I did).

Galaxy Note 3

Almost a year ago I bought the Samsung Galaxy Note 3 [13]. I’m quite happy with it at the moment but I don’t have enough data for a long term review of it. The only thing to note so far is that in my first review I was unhappy with the USB 3 socket as that made it more difficult to connect a USB cable in the dark. I’ve got used to the socket and I can now reliably plug it in at night with ease.

I wrote about Rivers jeans being the only brand that can fit a Samsung Galaxy Note series phone in the pocket [14]. The pockets of my jeans have just started wearing out and I think that it’s partly due to the fact that I bought an Armourdillo Hybrid case [15] for my Note 3. I’ve had the jeans for over 3 years with no noticeable wear apart from the pockets starting to wear out after 10 months of using the Armourdillo case.

I don’t think that the Armourdillo case is bad, but the fact that it has deep grooves and hard plastic causes it to rub more on material when I take the phone out of my pocket. As I check my phone very frequently this causes some serious wear. This isn’t necessarily a problem given that a phone costs 20 times more than a pair of jeans; if the case was actually needed to save the phone then it would be worth having some jeans wear out. But I don’t think I need more protection than a gel case offers.

Another problem is that the Armourdillo case is very difficult to remove. This isn’t a problem if you don’t need access to your phone, IE if you use a phone like the Nexus 5 that doesn’t permit changing batteries or SD cards. But if you need to change batteries, SD cards, etc then it’s really annoying. My wife seems quite happy with her Armourdillo case but I don’t think it was a good choice for me. I’m considering abandoning it and getting one of the cheap gel cases.

The sound on the Note 3 is awful. I don’t know how much of that is due to a limitation in the speaker and how much is due to the case. It’s quite OK for phone calls but not much good for music.


I’m currently on my third tablet. One was too cheap and nasty so I returned it. Another was still cheap and I hardly ever used it. The third is a Galaxy Note 10 which works really well. I guess the lesson is to buy something worthwhile so you can use it. A tablet that’s slower and has less storage than a phone probably isn’t going to get used much.

Phone Longevity

I owned the Xperia X10 for 22 months before getting the Galaxy S3. As that included 9 months of using a Galaxy S I only had 13 months of use out of that phone before lending it to other people.

The Galaxy S3 turned out to be a mistake as I replaced it in only 7 months.

I had the Note 2 for 15 months before getting the Note 3.

I have now had the Note 3 for 11 months and have no plans for a replacement any time soon – this is the longest I’ve owned an Android phone and been totally satisfied with it. Also I only need to use it for another 4 months to set a record for using an Android phone.

The Xperia was “free” as part of a telco contract. The other phones were somewhere between $500 and $600 each when counting the accessories (case, battery, etc) that I bought with them. So in 4 years and 7 months I’ve spent somewhere between $1500 and $1800 on phones plus the cost of the Xperia that was built in to the contract. The Xperia probably cost about the same so I’ll assume that I spent $2000 on phones and accessories. This seems like a lot. However that averages out to about $1.20 per day (and hopefully a lot less if my Note 3 lasts another couple of years). I could justify $1.20 per day for either the amount of paid work I do on Android phones or the amount of recreational activities that I perform (the Galaxy S3 was largely purchased for Ingress).


I think that phone companies will be struggling to maintain sales of high end phones in the future. When I chose the Xperia X10 I knew I was making a compromise; the screen resolution was an obvious limitation on the use of the device (even though it was one of the best devices available). The storage in the Xperia was also a limitation. Now FullHD is the minimum resolution for any sort of high-end device and 32G of storage is small. I think that most people would struggle to observe any improvement over a Nexus 5 or Note 3 at this time. I think that this explains the massive advertising campaign for the Galaxy S6 that is going on at the moment. Samsung can’t sell the S6 based on it being better than previous phones because there’s not much that they can do to make it obviously better. So they try to sell it on image.

September 04, 2015

Few weeks after the workshop Myanmar faces massive floods

Two weeks after the ‘CAP on a Map‘ project kick-off workshop, the Department of Meteorology and Hydrology got busy responding to the massive floods - “Heavy seasonal rains caused flooding in Rakhine State and other parts of the country at [Read the Rest...]

September 03, 2015

Announcing dex, an Open Source OpenID Connect Identity Provider from CoreOS

Today we are pleased to announce a new CoreOS open source project called dex: a standards-based identity provider and authentication solution.

Just about every project requires some sort of authentication and user management. Applications need a way for users to log in securely from a variety of platforms such as web, mobile, CLI tools and automated systems. Developers typically use a platform-dependent solution or, just as often, find existing solutions don't quite address their needs and so resort to writing their own from scratch.

Most developers are not in the security business, however. Having to write their own authentication software is not only an annoying distraction from their core product, but it can be downright dangerous as well. Doing security correctly is tricky, as we’ve seen with the many recent high-profile breaches, and doing it in a vacuum without proper auditing by other engineers and security experts is even more risky.

For these reasons, we have decided to open source dex so that others may benefit from the work we’ve done to make dex a secure and robust platform. Now available to the community, dex in turn will benefit from having more stakeholders. No one will ever have to write their own "Forgot your password?" flow, or “Login with X, Y or Z” feature again.

The project is named 'dex' because it is a central index of users that other pieces of software can authenticate against.

Key Design Elements

What makes dex unique is the combination of the following elements, which has driven the design and implementation from the beginning.


Security

First and foremost is security: dex is designed using security and encryption best practices that minimize the risk of an attacker gaining access to the system. Furthermore, the dex architecture is compartmentalized to mitigate the damage that any single attack could incur. For example, dex defaults to short token lifetimes and rotates its signing keys automatically. Since the keys themselves are encrypted at rest, an attacker would need to compromise both the database and a dex worker within a short time in order to forge a token.


Standards-Based: OpenID Connect

dex is an implementation of the OpenID Connect (OIDC) Core spec. OIDC (not to be confused with OpenID) was created in partnership with a wide variety of industry leaders and security experts, building on years of experience in web security. It is a layer on top of OAuth2, and as such provides a secure, easy-to-implement protocol for authentication. Today OIDC is used as the single sign-on approach for internet giants like Google, Facebook or Amazon.

Language/Platform Agnostic

Because dex implements the OpenID Connect (OIDC) Core spec, it is easy to integrate dex into your application. The only step is to add an OIDC client library in your language of choice. We’ve written one in Go called go-oidc; others exist in almost every language (be sure to vet any client libraries to ensure proper signature validation and spec compliance).
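As a rough sketch of what that integration looks like from the relying party’s side, here is a minimal Go example. Note that it uses the package path and API of the current go-oidc release (v3), which postdates this announcement, and that the issuer URL, client ID and raw token are placeholders:

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/coreos/go-oidc/v3/oidc"
)

func main() {
	ctx := context.Background()

	// Discover endpoints and signing keys from the issuer's
	// /.well-known/openid-configuration document.
	provider, err := oidc.NewProvider(ctx, "https://dex.example.com")
	if err != nil {
		log.Fatal(err)
	}

	// The verifier checks the ID token's signature, issuer, audience and expiry.
	verifier := provider.Verifier(&oidc.Config{ClientID: "example-app"})

	rawIDToken := "..." // obtained from the OAuth2 authorization code exchange
	idToken, err := verifier.Verify(ctx, rawIDToken)
	if err != nil {
		log.Fatal(err)
	}

	var claims struct {
		Email string `json:"email"`
	}
	if err := idToken.Claims(&claims); err != nil {
		log.Fatal(err)
	}
	fmt.Println("authenticated:", claims.Email)
}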

Identity Federation

dex has its own concept of users, but it allows them to authenticate in different ways, called connectors. Right now, dex ships with two types of connectors: the local connector and the OIDC connector. When authenticating with the local connector, users log in with an email and password via a customizable UI provided by dex itself. With the OIDC connector, users authenticate by logging into another OIDC Identity Provider, like Google or Salesforce.

Since dex itself is an OIDC Identity Provider it is even possible to chain multiple dex instances together, each delegating authentication to the next in line.

Currently users must choose between connectors, but in the future we plan to allow for the linking of identities, so any individual user can log in in a variety of ways. The extensible connector architecture will allow for integrations with providers like GitHub, LDAP and SAML systems.

Case Study:

One way we are using dex at CoreOS is to register and authenticate customers of Tectonic. When a user first decides to become a Tectonic customer and clicks the “Join” button, they are taken to, which is the “issuer URL” in OpenID Connect parlance. They are asked to register either by using their Google identity or entering in a name and password. After this they are redirected back to the main site where they can complete their signup.

Below is a diagram outlining the deployment:

dex Infrastructure Diagram

Behind our firewall, we have several components:

  • a postgres database serving as dex’s backing store
  • a single dex-overlord, responsible for rotating keys and other administrative tasks
  • several dex-workers, which provide the front-end for end-user authentication
  • our product site,

In OIDC the Relying Party (RP) – in this case, our product site – exchanges an authorization code (obtained from the end-user, the Tectonic customer) for an ID token with the Identity Provider (IdP), which is dex. Note that although we have our application and dex co-located behind the same firewall, this is not necessary. They communicate with each other over the public Internet via a TLS connection; this can be useful if you have a variety of applications in different hosting environments all needing authentication.
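For reference, the code-for-token exchange in that flow is the standard OAuth2/OIDC one. A rough sketch with curl, where the endpoint path, client credentials and redirect URI are placeholders (the real endpoints are discovered from the issuer’s /.well-known/openid-configuration document):

curl -X POST https://dex.example.com/token \
  -u example-app:example-secret \
  -d grant_type=authorization_code \
  -d code=AUTH_CODE_FROM_REDIRECT \
  -d redirect_uri=https://app.example.com/callback

The JSON response contains the signed ID token (and usually an access token), which the relying party then verifies against the provider’s published signing keys.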

When a user chooses to authenticate using their Google account, dex temporarily becomes the RP and Google becomes the IdP for the purposes of authenticating and identifying the user. Once dex has done this (via the same token exchange protocol mentioned above), dex goes back to being the IdP and completes the token exchange with

Throughout the process tokens are cryptographically signed, and signatures verified by the clients. Signing keys are constantly rotated by the IdPs and synced by the RPs.

Future Plans with dex

dex is usable right now, but there’s still a lot of work to do. Aside from the open issues on GitHub, things that are on the roadmap for dex include:

  • Authorization – In addition to dex handling authentication, we’d like it to be a general purpose authorization server as well.
  • User Management – We are in the beginning stages of developing an API for admins to manage users, but soon it will be more complete and come with UI as well.
  • Multiple Remote Identities – As mentioned above, users will be able to authenticate using more than one authentication method.
  • Additional Connector types – e.g., LDAP and GitHub

dex is still quite young, and there’s a lot of work to do going forward, so if you’re interested, we’d love to have your help!

Training of SAMBRO Trainers

Sahana Alerting and Messaging Broker (SAMBRO) continues to mature; especially with the Maldives, Myanmar, and Philippine implementations. Trainees from the three countries belonging to their Meteorological and Disaster Management Agencies are receiving training. They will receive training on GIS concepts, [Read the Rest...]

Stupid RCU Tricks: Hand-over-hand traversal of linked list using SRCU

Suppose that a very long linked list was to be protected with SRCU. Let's also make the presumably unreasonable assumption that this list is so long that we don't want to stay in a single SRCU read-side critical section for the whole traversal.

So why not try hand-over-hand SRCU protection, as shown in the following code fragment?

  struct foo {
    struct list_head list;
    ...
  };

  LIST_HEAD(mylist);

  struct srcu_struct mysrcu;

  void process(void)
  {
    int i1, i2;
    struct foo *p;

    i1 = srcu_read_lock(&mysrcu);
    list_for_each_entry_rcu(p, &mylist, list) {
      do_something_with(p);
      i2 = srcu_read_lock(&mysrcu);
      srcu_read_unlock(&mysrcu, i1);
      i1 = i2;
    }
    srcu_read_unlock(&mysrcu, i1);
  }

The trick is that on each pass through the loop, we enter a new SRCU read-side critical section, then exit the old one. That way the entire traversal is protected by SRCU, but each SRCU read-side critical section is quite short, covering traversal of but a single element of the list.

As is customary with SRCU, the list is manipulated using list_add_rcu(), list_del_rcu(), and friends.

What are the advantages and disadvantages of this hand-over-hand SRCU list traversal?

September 01, 2015

Flocker on CoreOS Linux

You are now able to use Flocker, an open-source container data volume manager for containerized applications, on CoreOS Linux. This brings the benefits of data management and portability from Flocker together with the lightweight, painless, automatic security updates from running on CoreOS Linux.

Flocker works with Amazon Elastic Block Store (EBS) on CoreOS Linux via the Flocker Docker plugin. Note that the plugin is experimental, released through ClusterHQ Labs, and the team will work towards official support in the future.
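As a rough illustration of what the plugin enables (the volume name, image and mount path below are placeholders, and the driver name "flocker" is an assumption based on ClusterHQ’s plugin naming rather than something documented in this post), creating a Flocker-backed volume for a container looks something like:

docker run -v flocker-volume:/data --volume-driver=flocker busybox sh -c "echo hello > /data/hello"

Because the volume is managed by Flocker rather than the local host, its data can move with the container when it is rescheduled to another machine.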

Read more about how to get started with Flocker + CoreOS on ClusterHQ’s blog.

If you are at VMworld in San Francisco this week, stop by at 12:20 p.m. PT today to see a lunch panel in which we'll participate with ClusterHQ about making containers work in the enterprise.

August 25, 2015

Containers on the Autobahn: Q&A with Giant Swarm

Timo Derstappen (@teemow), co-founder of Giant Swarm, has joined us for various events in the past, but you may recall seeing his talk at CoreOS Fest this year (embedded below). We sat down with him to see what Giant Swarm is up to and how Giant Swarm uses CoreOS for their microservice infrastructure.

Q1. Explain what Giant Swarm delivers and what inspired you to co-found the company.

A: Giant Swarm is a Microservice Infrastructure. Given the fact that the term Microservices is used by a lot of people lately, I’m going to explain this a little bit more.

At my last company we grew pretty quickly and after a rush of feature implementations we stood there with a monolithic app that now had to be scaled. We looked closely at the different requirements within the stack and decided that we would prefer to choose the right tool for each job. This was the complete opposite to what we did before, where we had one techstack and tried to solve everything with it. By isolating problems in small services we were able to scale in many dimensions. Teams weren’t blocking each other, services could be scaled independently, and we iterated faster. It was also very expensive in terms of automating the infrastructure to run the zoo of technologies we were suddenly using. 20-30% of the developers were always blocked by automating the infrastructure. After leaving the company we took some time off and I worked on a next generation platform I wanted to use for our next idea. Wherever I demoed that, people either wanted to have that too or work at our company.

So the infrastructure itself became the next idea. We now run that infrastructure for many developers; the first customers are going into production with their own dedicated clusters, and we now also offer on-prem installations.

Q2. How does Giant Swarm fit into the world of distributed systems and containers?

A: Giant Swarm builds a layer on top of containers and enables developers to declare and manage their microservices without thinking about servers. We map their software architecture onto a container infrastructure distributed across many servers. Our product clearly addresses developers. We enable them to actually live up to “You build it, you run it” without the hassle of learning how to set up a production-ready container infrastructure with networking and storage solutions that fit such a highly dynamic environment.

Q3. Your talk at CoreOS Fest on Containers on the Autobahn discussed what fast means in the world of containers and distributed systems. Explain what is most important to do when looking to develop and deploy application containers in an efficient and fast way.

A: In my talk I actually not only showed how our users can run their services on our platform, but I also showed how we ourselves are dogfooding by running our own services in containers with the same building blocks we provide our users with. There is a saying that a good manager is a barrier removal professional, and we think the same way about infrastructure. Good infrastructure should allow developers to run at full speed, without being encumbered by roadblocks. For instance, you want to create a new test environment for your service landscape in a couple of seconds instead of waiting for the other team that is currently blocking the test environment.

In general there are many facets of fast which we are addressing at Giant Swarm: low latency, short MTTR, high throughput. That leads to another part of the talk. Although Giant Swarm appears to be a PaaS-like solution to start your application in containers with zero configuration, we provide you with a container infrastructure that is unopinionated. On your own dedicated cluster you can choose if you’d like to run on AWS or bare metal and which networking/storage fits you best. You can run Kubernetes on it, run your own service discovery, continuous integration, monitoring, etc. There is also a benefit that users can share their infrastructure stacks and try out new ones really quickly.

Q4. How does Giant Swarm make use of CoreOS projects, such as CoreOS Linux and rkt, on your microservice infrastructure? Any tips and tricks you’d recommend for others, or areas where readers can get more information?

A: The whole Giant Swarm infrastructure is based on CoreOS. A small stripped down modern Linux with atomic updates was exactly what I was looking for in a production environment. Even better is that CoreOS follows the Unix philosophy and builds small but capable tools. This enables platform builders like us to provide customers with a flexible solution by combining these tools with other building blocks that cater to the customers’ needs. We also extensively use systemd for our container scheduling and management. Something that might be a bit unique to what we do is that we build container “chains” around each application container to keep the configuration out of the actual application container. The concept is similar to pods, but our chains start up in order and you can create blocking dependencies. This uses a distributed lock to wait for a dependency on startup, which allows us to start and stop complex architectures very gracefully.

Currently, we are using Docker containers to not break dev/prod parity for our users, but we very much favor the concept of a container runtime like rkt, based on the same reasons we like the Unix-like approach of CoreOS.

Q5. Designing an application as a series of micro-services has moved from being an emerging technology to an accepted design pattern. Do you have any suggestions on what enterprise organizations can do to speed the adoption of this new pattern?

A: Moving to microservices is not easy and is even harder for a big enterprise with lots of legacy software and more traditional, late-adopting technical staff. There are two categories of hurdles to overcome with microservice architectures.

First are the hurdles that come with any new architectural or software engineering pattern. That includes questions around what microservices actually are, how they should be cut, whether they should contain data or not, and all kinds of new ways of thinking that developers might not be used to yet. However, there are more and more articles and even books, as well as good consultants, that help companies understand and move to the microservices way. In the end every enterprise has to design its migration according to its individual needs – for some, starting with a monolith and breaking away smaller services might be a good choice; for others, a complete rebuild of (parts of) their systems in microservices style. We have seen both with customers going into production.

The second category of hurdles revolves around the (operations) overhead that comes with deploying and managing microservices. Here’s where container technologies, CoreOS, and Giant Swarm come into play, as we are all actively working on solutions to make the development and operations side of microservices a simple and hassle-free experience. Using tools that make the first steps towards microservices easier for developers as well as operations teams makes it easier to bring the enterprise to this new pattern. These tools should get out of the way of the users and enable them to focus on the actual implementation details of their microservices instead of having to worry about how to run them in different environments.

Thanks to Timo for chatting with us!

Watch his talk that was given at CoreOS Fest this year.

August 24, 2015

Docker on Windows Server Preview TP3 with wifi

Doesn’t work. Especially if, like me, you have a docking station USB 3 ethernet adapter, an on-board ethernet port, use wifi on many different access points, and use your mobile phone for network connectivity.

The Docker daemon is started by running

net start docker

which runs the runDockerDaemon.cmd script (edited again further down). In that script, you’ll see the “virtual switch” (

docker daemon -D -b "Virtual Switch"

) is used for networking – and that (at least in my case) appears to be bound to the ethernet I had when I installed.

Same pain point as trying to use Hyper-V VMs for roaming development.

Uninstalling Hyper-V leaves us in an interesting place:

Sending build context to Docker daemon 2.048 kB
Step 0 : FROM windowsservercore
 ---> 0d53944cb84d
Step 1 : RUN @powershell -NoProfile -ExecutionPolicy Bypass -Command "iex ((new-object net.webclient).DownloadString(''))"
 ---> Running in ad8fb58ba732
HCSShim::CreateComputeSystem - Win32 API call returned error r1=3224830464 err=A virtual switch with the given name was not found. id=ad8fb58ba732880aaace7b4e3288212aa9493083848cf0324de310520b523d21 configuration={"SystemType":"Container","Name":"ad8fb58ba732880aaace7b4e3288212aa9493083848cf0324de310520b523d21","Owner":"docker","IsDummy":false,"VolumePath":"\\\\?\\Volume{63828c05-49f4-11e5-89c2-005056c00008}","Devices":[{"DeviceType":"Network","Connection":{"NetworkName":"Virtual Switch","EnableNat":false,"Nat":{"Name":"ContainerNAT","PortBindings":null}},"Settings":null}],"IgnoreFlushesDuringBoot":true,"LayerFolderPath":"C:\\ProgramData\\docker\\windowsfilter\\ad8fb58ba732880aaace7b4e3288212aa9493083848cf0324de310520b523d21","Layers":[{"ID":"f0d4aaa3-c43d-59c1-8ad0-44e6b3381efc","Path":"C:\\ProgramData\\Microsoft\\Windows\\Images\\CN=Microsoft_WindowsServerCore_10.0.10514.0"}]}

Looks like the virtual switch made for containers was removed at some point (it might have been when I installed Hyper-V, I’m not sure).



returns nothing.

So I installed VMware Workstation and made a Boot2Docker VM with both NAT and private networking – both VMware-based virtual networks continue to work when moving between wifi and ethernet.

So let’s see if we can make one in PowerShell, using the VMware NAT adaptor (see

PS C:\Users\sven\src\WindowsDocker> Get-NetAdapter

Name                      InterfaceDescription                    ifIndex Status       MacAddress             LinkSpeed
----                      --------------------                    ------- ------       ----------             ---------
VMware Network Adapte...8 VMware Virtual Ethernet Adapter for ...      28 Up           00-50-56-C0-00-08       100 Mbps
VMware Network Adapte...1 VMware Virtual Ethernet Adapter for ...      27 Up           00-50-56-C0-00-01       100 Mbps
Wi-Fi                     Intel(R) Dual Band Wireless-AC 7260           4 Disabled     5C-51-4F-BA-12-6F          0 bps
Ethernet                  Intel(R) Ethernet Connection I218-LM          3 Up           28-D2-44-4D-B6-64         1 Gbps

VMware helpfully provides a Virtual Network Editor, so I can see that "VMware Network Adapter VMnet8" (from the Get-NetAdapter output above) is the NAT one. I'm not sure if creating a Hyper-V External vswitch will make exclusive use of the adaptor, but if so, we can always create another :)

PS C:\Users\sven\src\WindowsDocker> New-VMSwitch  -Name "VMwareNat" -NetAdapterName "VMware Network Adapter VMnet8" -AllowManagementOS $true -Notes "Use VMnet8 to create a roamable Docker daemon network"

Name      SwitchType NetAdapterInterfaceDescription
----      ---------- ------------------------------
VMwareNat External   VMware Virtual Ethernet Adapter for VMnet8

Now to edit runDockerDaemon.cmd and restart the Docker daemon.

FAIL. The Docker containers still have no network. At this point I'm not sure if I've totally broken my Windows Docker networking; hopefully some more playing later will turn up something.

Playing some more, there seems to be a new switchtype Nat - see

So re-running the command they use when installing gets us something new to try:

PS C:\Users\sven\src\WindowsDocker> new-vmswitch -Name nat -SwitchType NAT -NatSubnetAddress ""

Name SwitchType NetAdapterInterfaceDescription
---- ---------- ------------------------------
nat  NAT

PS C:\Users\sven\src\WindowsDocker> Get-VMSwitch

Name      SwitchType NetAdapterInterfaceDescription
----      ---------- ------------------------------
VMwareNat External   VMware Virtual Ethernet Adapter for VMnet8
nat       NAT

It works when the ethernet is plugged in, but not on wifi.

yup - bleeding edge dev :)


August 21, 2015

What it’s like to Intern with CoreOS

We’ve been very fortunate to have three incredible interns join us for the summer months – Sara and Ahmad at our San Francisco headquarters, and Quentin in our New York City office. Over the last 10 weeks, they’ve not only become integral contributors to our ever-evolving open source projects, but they’ve also become a part of the CoreOS family.

The Intern Program

Interns with CoreOS have the opportunity to work in a fast-paced environment that is shaping the future of infrastructure based on containers and distributed systems. Every intern works closely with a senior-level employee who serves as their mentor and project team lead. With their guidance, our interns immediately begin contributing in ways that are not only meaningful to their overall career goals, but that are actively used by the CoreOS community – whether that be through open source or our proprietary products. This unique opportunity allows our interns to receive feedback from their mentors and the greater open source ecosystem. At CoreOS, our interns are regarded as full employees and participate in all company activities, from small team meetings, to all-hands meetings, to off-site adventures.

The 2015 Interns

This year’s interns came with diverse backgrounds and worked on different projects at CoreOS.

  • Ahmad (@Mohd_Ahmad17) is currently pursuing a doctorate in computer science at University of Illinois Urbana-Champaign (UIUC) working on system challenges with a focus on networking. While at CoreOS he worked on flannel, a virtual network for containers. You might recall seeing his blog post that introduced flannel 0.5.0 with AWS and GCE.
  • Quentin studied at Ecole Polytechnique de Tours in France. He is currently working on an independent security project with the team in NYC.
  • Sara (@qpezgi) is completing her bachelor’s degree in electrical and computer engineering at University of Illinois Urbana-Champaign (UIUC). She’s currently working on our OS team where she focuses on the loop device management utility.

Intern Week

We took the opportunity to honor our interns and thank them for all their hard work with the first-ever CoreOS Intern Week!

Quentin traveled to the CoreOS headquarters in SF on Monday morning and festivities were underway almost immediately. We kicked off intern week with a team lunch at one of our favorite local Thai food spots. Food, as is customary in SF, played a big role in the week’s events. Later that week we also went to a BBQ joint, which Quentin has since dubbed “the best meal he’s had in America.”

CoreOS 2015 Interns Lunch

Eating wasn’t all we did during Intern Week. Tuesday’s trip to the Peninsula and South Bay included a drive through the Google and Apple campuses, followed by an exclusive tour of a state-of-the-art data center. While we shopped for potential cabinet space, Sara, Ahmad and Quentin got to walk among enormous data halls, learn about cutting-edge data center design, and better understand where the world’s data “lives.”

After returning to SF, we decompressed in true CoreOS fashion – outdoor ping pong!

The culminating celebration of Intern Week was spent at the Academy of Sciences on Thursday, for NightLife. After a VIP cocktail hour and tour, we visited exhibits with live animals and attended a show at the planetarium. As a majority of the San Francisco team attended, it was an incredible showing of thanks to the interns for their time at CoreOS!

CoreOS 2015 Interns at Nightlife

Nightlife at the California Academy of Sciences

Could you be a future intern?

Every summer, thousands of students dedicate their time to internships. Many of them have the opportunity to work with big tech companies, like Apple, Google and Amazon. But a few lucky individuals take the path less traveled, and spend their time with a growing company like CoreOS. Our interns are an integral part of our company. They see their impact directly in the work they produce and in the projects to which they contribute. They are supported by their project team leads on a daily basis and form meaningful relationships with us all – including our executive team.

“My favorite thing about interning at CoreOS is the sheer vastness of topics I get to work on. I'm not confined or restricted at all when it comes to how I can contribute, and I’ve found I can help in a lot of ways. For instance, I reproduce user-reported bugs in CoreOS, and I also get to assist open source community members how to use CoreOS products and understand all the use cases of software I’m developing. I get to do literally everything.” - Sara

Are you looking to take the path less traveled? Are you passionate about open source and seeing your work make an impact? Then, reach out to us! Send your resume and cover letter to

Docker on Windows Server 2016 tech preview 3

First thing is to install Windows 2016 – I started in a VM, but I’m rapidly thinking I might try it on my notebook – Windows 10 is getting old already :)

Then go to . Note that the PowerShell script will download another 3GB.


And now – you can run `docker info` from either cmd.exe or PowerShell.

There’s only a limited set of images you can download from Microsoft – `docker search` seems to always reply with the same set:

PS C:\Users\Administrator> docker search anything
microsoft/iis Internet Information Services (IIS) instal... 1 [OK] [OK]
microsoft/dnx-clr .NET Execution Environment (DNX) installed... 1 [OK] [OK]
microsoft/ruby Ruby installed in a Windows Server Contain... 1 [OK]
microsoft/rubyonrails Ruby on Rails installed in a Windows Serve... 1 [OK]
microsoft/python Python installed in a Windows Server Conta... 1 [OK]
microsoft/go Go Programming Language installed in a Win... 1 [OK]
microsoft/mongodb MongoDB installed in a Windows Server Cont... 1 [OK]
microsoft/redis Redis installed in a Windows Server Contai... 1 [OK]
microsoft/sqlite SQLite installed in a Windows Server Conta... 1 [OK]

I downloaded two, and this shows they’re re-using the `windowsservercore` image as their common base image:

PS C:\Users\Administrator> docker images -a
microsoft/go latest 33cac80f92ea 2 days ago 10.09 GB
  8daec63ffb52 2 days ago 9.75 GB
  fbab9eccc1e7 2 days ago 9.697 GB
microsoft/dnx-clr latest 156a0b59c5a8 2 days ago 9.712 GB
  28473be483a9 2 days ago 9.707 GB
  56b7e372f76a 2 days ago 9.697 GB
windowsservercore 10.0.10514.0 0d53944cb84d 6 days ago 9.697 GB
windowsservercore latest 0d53944cb84d 6 days ago 9.697 GB

PS C:\Users\Administrator> docker history microsoft/dnx-clr
156a0b59c5a8 2 days ago cmd /S /C setx PATH "%PATH%;C:\dnx-clr-win-x6 5.558 MB
28473be483a9 2 days ago cmd /S /C REM (nop) ADD dir:729777dc7e07ff03f 9.962 MB
56b7e372f76a 2 days ago cmd /S /C REM (nop) LABEL Description=.NET Ex 41.41 kB
0d53944cb84d 6 days ago 9.697 GB
PS C:\Users\Administrator> docker history microsoft/go
33cac80f92ea 2 days ago cmd /S /C C:\build\install.cmd 335 MB
8daec63ffb52 2 days ago cmd /S /C REM (nop) ADD dir:898a4194b45d1cc66 53.7 MB
fbab9eccc1e7 2 days ago cmd /S /C REM (nop) LABEL Description=GO Prog 41.41 kB
0d53944cb84d 6 days ago 9.697 GB

And so the fun begins.

PS C:\Users\Administrator> docker run --rm -it windowsservercore cmd

gives you a containerized shell.

Let’s try to build an image that has the Chocolatey installer:

FROM windowsservercore

RUN @powershell -NoProfile -ExecutionPolicy Bypass -Command "iex ((new-object net.webclient).DownloadString(''))"

CMD powershell
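
Assuming that Dockerfile is saved in an otherwise empty directory, it can be built and tagged like so (the tag name chocolatey is an assumption, implied by the FROM line of the next Dockerfile):

docker build -t chocolatey .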

and then use that image to install… vim:

FROM chocolatey

RUN choco install -y vim
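
Again assuming the Dockerfile sits in its own directory, build it with a tag matching the docker run command below:

docker build -t vim .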

It works!

 docker run --rm -it vim cmd

and then run

C:\Program Files (x86)\vim\vim74\vim.exe

It’s not currently usable, I suspect because the ANSI terminal driver is really, really new code – but BOOM!

I haven’t worked out how to get the Dockerfile




to work with paths that have spaces – it doesn’t seem to support the array form yet…

I’m going to keep playing, and put the Dockerfiles into

Don’t forget to read the documentation at


August 19, 2015

The Purpose of a Code of Conduct

On a private mailing list there have been some recent discussions about a Code of Conduct which demonstrate some great misunderstandings. The misunderstandings don’t seem particular to that list so it’s worthy of a blog post. Also people tend to think more about what they do when their actions will be exposed to a wider audience so hopefully people who read this post will think before they respond.


The first discussion concerned the issue of making “jokes”. When dealing with the treatment of other people (particularly minority groups) the issue of “jokes” is a common one. It’s fairly common for people in positions of power to make “jokes” about people with less power and then complain if someone disapproves. The more extreme examples of this concern hate words which are strongly associated with violence; one of the most common is a word used to describe gay men which has often been associated with significant violence and murder. Men who are straight and who conform to the stereotypes of straight men don’t have much to fear from that word while men who aren’t straight will associate it with a death threat and tend not to find any amusement in it.

Most minority groups have words that are known to be associated with hate crimes. When such words are used they usually send a signal that the minority groups in question aren’t welcome. The exception is when the words are used by other members of the group in question. For example if I was walking past a biker bar and heard someone call out “geek” or “nerd” I would be a little nervous (even though geeks/nerds have faced much less violence than most minority groups). But at a Linux conference my reaction would be very different. As a general rule you shouldn’t use any word that has a history of being used to attack any minority group other than one that you are a member of, so black rappers get to use a word that was historically used by white slave-owners but because I’m white I don’t get to sing along to their music. As an aside we had a discussion about such rap lyrics on the Linux Users of Victoria mailing list some time ago; hopefully most people think I’m stating the obvious here but some people need a clear explanation.

One thing that people should consider regarding “jokes” is the issue of punching-down vs punching-up [1] (there are many posts about this topic, I linked to the first Google hit which seems quite good). The basic concept is that making jokes about more powerful people or organisations is brave while making “jokes” about less powerful people is cowardly and serves to continue the exclusion of marginalised people. When I raised this issue in the mailing list discussion a group of men immediately complained that they might be bullied by lots of less powerful people making jokes about them. One problem here is that powerful people tend to be very thin skinned due to the fact that people are usually nice to them. While the imaginary scenario of less powerful people making jokes about rich white men might be unpleasant if it happened in person, it wouldn’t compare to the experience of less powerful people who are the target of repeated “jokes” in addition to all manner of other bad treatment. Another problem is that the impact of a joke depends on the power of the person who makes it. EG if your boss makes a “joke” about you then you have to work on your CV, while if a colleague or subordinate makes a joke then you can often ignore it.

Who does a Code of Conduct Protect

One member of the mailing list wrote a long and very earnest message about his belief that the CoC was designed to protect him from off-topic discussions. He analysed the results of a CoC on that basis and determined that it had failed due to the number of off-topic messages on the mailing lists he subscribes to. Being so self-centered is strongly correlated with being in a position of power; he seems to sincerely believe that everything should be about him, that he is entitled to all manner of protection and that any rule which doesn’t protect him is worthless.

I believe that the purpose of all laws and regulations should be to protect those who are less powerful, the more powerful people can usually protect themselves. The benefit that powerful people receive from being part of a system that is based on rules is that organisations (clubs, societies, companies, governments, etc) can become larger and achieve greater things if people can trust in the system. When minority groups are discouraged from contributing and when people need to be concerned about protecting themselves from attack the scope of an organisation is reduced. When there is a certain minimum standard of treatment that people can expect then they will be more willing to contribute and more able to concentrate on their contributions when they don’t expect to be attacked.

The Public Interest

When an organisation declares itself to be acting in the public interest (EG by including “Public Interest” in the name of the organisation) I think that we should expect even better treatment of minority groups. One might argue that a corporation should protect members of minority groups for the sole purpose of making more money (it has been proven that more diverse groups produce better quality work). But an organisation that’s in the “Public Interest” should be expected to go way beyond that and protect members of minority groups as a matter of principle.

When an organisation is declared to be operating in the “Public Interest” I believe that anyone who’s so unable to control their bigotry that they can’t refrain from being bigoted on the mailing lists should not be a member.

August 18, 2015

Using Virtual Machines to Improve Container Security with rkt v0.8.0

Today we are releasing rkt v0.8.0. rkt is an application container runtime built to be efficient, secure and composable for production environments.

This release includes new security features, including initial support for user namespaces and enhanced container isolation using hardware virtualization. We have also introduced a number of improvements such as host journal integration, container socket activation, improved image caching, and speed enhancements.

Intel Contributes rkt stage1 with Virtualization

Intel and rkt

The modular design of rkt enables different execution engines and containerization systems to be built and plugged in. This is achieved using a staged architecture, where the second stage ("stage1") is responsible for creating and launching the container. When we launched rkt, it featured a single, default stage1 which leverages Linux cgroups and namespaces (a combination commonly referred to as "Linux containers").

With the help of engineers at Intel, we have added a new rkt stage1 runtime that utilizes virtualization technology. This means an application running under rkt using this new stage1 can be isolated from the host kernel using the same hardware features that are used in hypervisors like Linux KVM.

In May, Intel announced a proof-of-concept of this feature built on top of rkt, as part of their Intel® Clear Containers effort to utilize hardware-embedded virtualization technology features to better secure container runtimes and isolate applications. We were excited to see this work taking place and being prototyped on top of rkt as it validated some of the early design choices we made, such as the concepts of runtime stages and pods. Here is what Arjan van de Ven from Intel's Open Source Technology Center had to say:

"Thanks to rkt's stage-based architecture, the Intel®Clear Containers team was able to rapidly integrate our work to bring the enhanced security of Intel® Virtualization Technology (Intel® VT-x) to the container ecosystem. We are excited to continue working with the rkt community to realize our vision of how we can enhance container security with hardware-embedded technology, while delivering the deployment benefits of containerized apps.”

Since the prototype announcement in May we have worked closely with the team from Intel to ensure that features such as one IP-per-pod networking and volumes work in a similar way when using virtualization. Today's release of rkt sees this functionality fully integrated to make the lkvm backend a first-class stage1 experience. So, let's try it out!

In this example, we will first run a pod using the default cgroups/namespace-based stage1. Let's launch the container with systemd-run, which will construct a unit file on the fly and start it. Checking the status of this unit will show us what’s going on under the hood.

$ sudo systemd-run --uid=0 \
   ./rkt run \
   --private-net --port=client:2379 \
   --volume data-dir,kind=host,source=/tmp/etcd \,version=v2.2.0-alpha.0 \ 
   -- --advertise-client-urls="" \  
Running as unit run-1377.service.

$ systemctl status run-1377.service
● run-1377.service
   CGroup: /system.slice/run-1377.service
           ├─1378 stage1/rootfs/usr/bin/systemd-nspawn
           └─1425 /usr/lib/systemd/systemd
             ├─1430 /etcd
             └─1426 /usr/lib/systemd/systemd-journald

Notice that we can see the complete process hierarchy inside the pod, including a systemd instance and the etcd process.

Next, let's launch the same container under the new KVM-based stage1 by adding the --stage1-image flag:

$ sudo systemd-run -t --uid=0 \
  ./rkt run --stage1-image=sha512-c5b3b60ed4493fd77222afcb860543b9 \
  --private-net --port=client:2379 \
  --volume data-dir,kind=host,source=/tmp/etcd2 \,version=v2.2.0-alpha.0 \
  -- --advertise-client-urls="" \

$ systemctl status run-1505.service
● run-1505.service
   CGroup: /system.slice/run-1505.service
           └─1506 ./stage1/rootfs/lkvm

Notice that the process hierarchy now ends at lkvm. This is because the entire pod is being executed inside a KVM process, including the systemd process and the etcd process: to the host system, it simply looks like a single virtual machine process. By adding a single flag to our container invocation, we have taken advantage of the same KVM technology that public clouds use to isolate tenants, isolating our application container from the host and adding another layer of security.

Thank you to Piotr Skamruk, Paweł Pałucki, Dimitri John Ledkov, Arjan van de Ven from Intel for their support and contributions. For more details on this feature see the lkvm stage1 guide.

Seamless Integration With Host Level-Logging

On systemd hosts, the journal is the default log aggregation system. With the v0.8.0 release, rkt now automatically integrates with the host journal, if detected, to provide a systemd native log management experience. To explore the logs of a rkt pod, all you need to do is add a machine specifier like -M rkt-$UUID to a journalctl command on the host.

As a simple example, let's explore the logs of the etcd container we launched earlier. First we use machinectl to list the pods that rkt has registered with systemd:

$ machinectl list
MACHINE                                  CLASS     SERVICE
rkt-bccc16ea-3e63-4a1f-80aa-4358777ce473 container nspawn
rkt-c3a7fabc-9eb8-4e06-be1d-21d57cdaf682 container nspawn

2 machines listed.

We can see our etcd pod listed as the second machine known by systemd. Now we use the journal to directly access the logs of the pod:

$ sudo journalctl -M rkt-c3a7fabc-9eb8-4e06-be1d-21d57cdaf682
etcd[4]: 2015-08-18 07:04:24.362297 N | etcdserver: set the initial cluster version to 2.2.0

User Namespace Support

This release includes initial support for user namespaces to improve container isolation. By leveraging user namespaces, an application may run as the root user inside of the container but will be mapped to a non-root user outside of the container. This adds an extra layer of security by isolating containers from the real root user on the host. This early preview of the feature is experimental and uses privileged user namespaces, but future versions of rkt will improve on the foundation found in this release and offer more granular control.

To turn user namespaces on, two flags need to be added to our original example: --private-users and --no-overlay. The first turns on the user namespace feature and the second disables rkt's overlayfs subsystem, as it is not currently compatible with user namespaces:

$ ./rkt run --no-overlay --private-users \
  --private-net --port=client:2379 \
  --volume data-dir,kind=host,source=/tmp/etcd \,version=v2.2.0-alpha.0 \
  -- --advertise-client-urls="" \

We can confirm this is working by using curl to verify etcd's functionality and then checking the permissions on the etcd data directory, noting that from the host's perspective the etcd member directory is owned by a very high user id:

$ curl

$ ls -la /tmp/etcd
total 0
drwxrwxrwx  3 core       core        60 Aug 18 07:31 .
drwxrwxrwt 10 root       root       200 Aug 18 07:31 ..
drwx------  4 1037893632 1037893632  80 Aug 18 07:31 member

Adding user namespaces support is an important step towards our goal of making rkt the most secure container runtime, and we will be working hard to improve this feature in coming releases - you can see the roadmap in this issue.

Open Containers Initiative Progress

With rkt v0.8.0 we are furthering our efforts with security hardening and moving closer to a 1.0 stable and production-ready release. We are also dedicated to ensuring that the container ecosystem continues down a path that enables people publishing containers to “build once, sign once, and run anywhere.” Today rkt is an implementation of the App Container spec (appc), and in the future we hope to make rkt an implementation of the Open Container Initiative (OCI) specification. However, the OCI effort is still in its infancy and there is a lot of work left to do. To check on the progress of the effort to harmonize OCI and appc, you can read more about it on the OCI dev mailing list.

Contribute to rkt

One of the goals of rkt is to make it the most secure container runtime, and there is a lot of exciting work to be done as we move closer to 1.0. Join us on our mission: we welcome your involvement in the development of rkt, via discussion on the rkt-dev mailing list, filing GitHub issues, or contributing directly to the project.

BTRFS Training

Some years ago Barwon South Water gave LUV 3 old 1RU Sun servers for any use related to free software. We gave one of those servers to the Canberra makerlab, another is used as the server for the LUV mailing lists and web site, and the 3rd server was put aside for training. The servers have hot-swap 15,000rpm SAS disks – IE disks that have a replacement cost greater than the budget we have for hardware. As we were given a spare 70G disk (and a 140G disk can replace a 70G disk) the LUV server has 2*70G disks and the 140G disks (which can’t be replaced) are in the server for training.

On Saturday I ran a BTRFS and ZFS training session for the LUV Beginners’ SIG. This was inspired by the amount of discussion of those filesystems on the mailing list and the amount of interest when we have lectures on those topics.

The training went well, the meeting was better attended than most Beginners’ SIG meetings and the people who attended it seemed to enjoy it. One thing that I will do better in future is clearly documenting commands that are expected to fail and documenting how to login to the system. The users all logged in to accounts on a Xen server and then ssh’d to root at their DomU. I think that it would have saved a bit of time if I had aliased commands like “btrfs” to “echo you must login to your virtual server first” or made the shell prompt at the Dom0 include instructions to login to the DomU.
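
For example, something like the following in the Dom0’s shell profile would have made the mistake obvious (the exact wording is just a sketch):

  alias btrfs='echo You must ssh to root at your DomU before running btrfs commands'
  alias zfs='echo You must ssh to root at your DomU before running zfs commands'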

Each user or group had a virtual machine. The server has 32G of RAM and I ran 14 virtual servers that each had 2G of RAM. In retrospect I should have configured fewer servers and asked people to work in groups; that would allow more RAM for each virtual server and also more RAM for the Dom0. The Dom0 was running a BTRFS RAID-1 filesystem and each virtual machine had a snapshot of the block devices from my master image for the training. Performance was quite good initially as the OS image was shared and fit into cache. But when many users were corrupting and scrubbing filesystems performance became very poor. The disks performed well (sustaining over 100 writes per second) but that’s not much when shared between 14 active users.

The ZFS part of the tutorial was based on RAID-Z (I didn’t use RAID-5/6 in BTRFS because it’s not ready to use and didn’t use RAID-1 in ZFS because most people want RAID-Z). Each user had 5*4G virtual disks (2 for the OS and 3 for BTRFS and ZFS testing). By the end of the training session there was about 76G of storage used in the filesystem (including the space used by the OS for the Dom0), so each user had something like 5G of unique data.

We are now considering what other training we can run on that server. I’m thinking of running training on DNS and email. Suggestions for other topics would be appreciated. For training that’s not disk intensive we could run many more than 14 virtual machines, 60 or more should be possible.

Below are the notes from the BTRFS part of the training, anyone could do this on their own if they substitute 2 empty partitions for /dev/xvdd and /dev/xvde. On a Debian/Jessie system all that you need to do to get ready for this is to install the btrfs-tools package. Note that this does have some risk if you make a typo. An advantage of doing this sort of thing in a virtual machine is that there’s no possibility of breaking things that matter.

  1. Making the filesystem
    1. Make the filesystem; this makes a filesystem that spans 2 devices (note that you must use the -f option if there was already a filesystem on those devices):

      mkfs.btrfs /dev/xvdd /dev/xvde
    2. Use file(1) to see basic data from the superblocks:

      file -s /dev/xvdd /dev/xvde
    3. Mount the filesystem (you can mount either block device; the kernel knows they belong together):

      mount /dev/xvdd /mnt/tmp
    4. See a BTRFS df of the filesystem, shows what type of RAID is used:

      btrfs filesystem df /mnt/tmp
    5. See more information about FS device use:

      btrfs filesystem show /mnt/tmp
    6. Balance the filesystem to change it to RAID-1 and verify the change (note that some parts of the filesystem were single and RAID-0 before this change):

      btrfs balance start -dconvert=raid1 -mconvert=raid1 -sconvert=raid1 --force /mnt/tmp

      btrfs filesystem df /mnt/tmp
    7. See if there are any errors, shouldn’t be any (yet):

      btrfs device stats /mnt/tmp
    8. Copy some files to the filesystem:

      cp -r /usr /mnt/tmp
    9. Check the filesystem for basic consistency (only checks checksums):

      btrfs scrub start -B -d /mnt/tmp
  2. Online corruption
    1. Corrupt the filesystem:

      dd if=/dev/zero of=/dev/xvdd bs=1024k count=2000 seek=50
    2. Scrub again, should give a warning about errors:

      btrfs scrub start -B /mnt/tmp
    3. Check error count:

      btrfs device stats /mnt/tmp
    4. Corrupt it again:

      dd if=/dev/zero of=/dev/xvdd bs=1024k count=2000 seek=50
    5. Unmount it:

      umount /mnt/tmp
    6. In another terminal follow the kernel log:

      tail -f /var/log/kern.log
    7. Mount it again and observe it correcting errors on mount:

      mount /dev/xvdd /mnt/tmp
    8. Run a diff, observe kernel error messages and observe that diff reports no file differences:

      diff -ru /usr /mnt/tmp/usr/
    9. Run another scrub, this will probably correct some errors which weren’t discovered by diff:

      btrfs scrub start -B -d /mnt/tmp
  3. Offline corruption
    1. Unmount the filesystem, corrupt the start, then try mounting it again; the mount will fail because the superblocks were wiped:

      umount /mnt/tmp

      dd if=/dev/zero of=/dev/xvdd bs=1024k count=200

      mount /dev/xvdd /mnt/tmp

      mount /dev/xvde /mnt/tmp
    2. Note that the filesystem was not mountable due to a lack of a superblock. It might be possible to recover from this but that’s more advanced so we will restore the RAID.

      Mount the filesystem in degraded RAID mode; this allows full operation.

      mount /dev/xvde /mnt/tmp -o degraded
    3. Add /dev/xvdd back to the RAID:

      btrfs device add /dev/xvdd /mnt/tmp
    4. Show the filesystem devices, observe that xvdd is listed twice, the missing device and the one that was just added:

      btrfs filesystem show /mnt/tmp
    5. Remove the missing device and observe the change:

      btrfs device delete missing /mnt/tmp

      btrfs filesystem show /mnt/tmp
    6. Balance the filesystem; this may not be strictly necessary, but it’s good practice to do it when in doubt:

      btrfs balance start /mnt/tmp
    7. Unmount and remount it; note that the degraded option is no longer needed:

      umount /mnt/tmp

      mount /dev/xvdd /mnt/tmp
  4. Experiment
    1. Experiment with the “btrfs subvolume create” and “btrfs subvolume delete” commands (which act like mkdir and rmdir).
    2. Experiment with “btrfs subvolume snapshot SOURCE DEST” and “btrfs subvolume snapshot -r SOURCE DEST” for creating regular and read-only snapshots of other subvolumes (including the root). A short example sequence is shown below.
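One possible sequence for these experiments (the subvolume names are arbitrary examples):

      # create a subvolume and a writable snapshot of it
      btrfs subvolume create /mnt/tmp/vol1
      btrfs subvolume snapshot /mnt/tmp/vol1 /mnt/tmp/vol1-snap
      # create a read-only snapshot of the filesystem root
      btrfs subvolume snapshot -r /mnt/tmp /mnt/tmp/root-ro
      # list subvolumes, then remove the writable snapshot
      btrfs subvolume list /mnt/tmp
      btrfs subvolume delete /mnt/tmp/vol1-snap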

August 14, 2015

Introducing the Kubernetes kubelet in CoreOS Linux

This week we have added the kubelet, a central building block of Kubernetes, in the alpha channel for CoreOS Linux. The kubelet is responsible for maintaining a set of pods, which are composed of one or more containers, on a local system. Within a Kubernetes cluster, the kubelet functions as a local agent that watches for pod specs via the Kubernetes API server. The kubelet is also responsible for registering a node with a Kubernetes cluster, sending events and pod status, and reporting resource utilization.

While the kubelet plays an important role in a Kubernetes cluster, it also works well in standalone mode — outside of a Kubernetes cluster. The rest of this post will highlight some of the useful things you can do with the kubelet running in standalone mode such as running a single node Kubernetes cluster and monitoring container resource utilization with the built-in support for cAdvisor.

First we need to get the kubelet up and running. Be sure to follow this tutorial using CoreOS Linux 773.1.0 or greater.

Configuring the Kubelet with systemd

CoreOS Linux ships with reasonable defaults for the kubelet, which have been optimized for security and ease of use. However, we are going to loosen the security restrictions in order to enable support for privileged containers. This is required to run the proxy component in a single node Kubernetes cluster, which needs access to manipulate iptables to facilitate the Kubernetes service discovery model.

Create the kubelet systemd unit:

sudo vim /etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet

[Service]
ExecStartPre=/usr/bin/mkdir -p /etc/kubernetes/manifests
ExecStart=/usr/bin/kubelet \
  --api-servers= \
  --allow-privileged=true \
  --config=/etc/kubernetes/manifests

[Install]
WantedBy=multi-user.target


Start the kubelet service

With the systemd unit file in place start the kubelet using the systemctl command:

sudo systemctl daemon-reload
sudo systemctl start kubelet

To ensure the kubelet restarts after a reboot be sure to enable the service:

sudo systemctl enable kubelet

At this point you should have a running kubelet service. You can verify this using the systemctl status command:

sudo systemctl status kubelet
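If the service does not come up cleanly, the kubelet’s logs are available through journald:

sudo journalctl -u kubelet -f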

Bootstrapping a single node Kubernetes cluster

The kubelet provides a convenient interface for managing containers on a local system. The kubelet supports a manifest directory, which is monitored for pod manifests every 20 seconds by default. This directory, /etc/kubernetes/manifests, was configured earlier via the --config flag in the kubelet systemd unit.

Pod manifests are written in the JSON or YAML file formats and describe a set of volumes and one or more containers. We can deploy a single node Kubernetes cluster using a pod manifest placed in the manifest directory.
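For a feel of the format, here is a minimal, hypothetical pod manifest (the file name and nginx container are illustrative only, not part of this tutorial); copying a file like this into the manifest directory would cause the kubelet to start the pod:

cat > /tmp/example-pod.yaml <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: nginx-example
spec:
  containers:
    - name: nginx
      image: nginx
      ports:
        - containerPort: 80
EOF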

Download the Kubernetes pod manifest


Downloading a pod manifest over the Internet is a potential security risk, so be sure to review the contents of any pod manifest before running it on your system.

cat kubernetes.yaml

At this point we only need to copy the kubernetes.yaml pod manifest to the kubelet’s manifest directory in order to bootstrap a single node cluster.

sudo cp kubernetes.yaml /etc/kubernetes/manifests/

After the copy completes you can view the Docker images and containers being started with the standard Docker command line tools:

sudo docker images
sudo docker ps

After a few minutes you should have a running Kubernetes cluster. Next download the official Kubernetes client tool.

Download the Kubernetes client

kubectl is the official command line tool for interacting with a Kubernetes cluster. Each release of Kubernetes ships a matching kubectl version. Download it and make it executable:

chmod +x kubectl

kubectl can be used to get information about a running cluster:

./kubectl cluster-info
Kubernetes master is running at http://localhost:8080

kubectl can also be used to launch pods:

./kubectl run nginx --image=nginx

View the running pods using the get pods command:

./kubectl get pods

To learn more about Kubernetes check out the Kubernetes on CoreOS docs.

Monitoring Containers with cAdvisor

The kubelet ships with built-in support for cAdvisor, which collects, aggregates, processes and exports information about running containers on a given system. cAdvisor includes a built-in web interface available on port 4194.
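Besides the web UI, cAdvisor exposes machine and container stats as JSON. A quick sanity check from the node, assuming the v1.3 REST API paths provided by the bundled cAdvisor:

curl http://127.0.0.1:4194/api/v1.3/machine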


The cadvisor web interface.

The cAdvisor web UI provides a convenient way to view system wide resource utilization and process listings.

cadvisor gauges

System utilization information.

cAdvisor can also be used to monitor a specific container such as the kube-apiserver running in the Kubernetes pod:

cadvisor inspecting a container

Inspecting a container with cadvisor.

To learn more about cAdvisor check out the upstream docs.

More with CoreOS and Kubernetes

Adding the kubelet to the CoreOS Linux image demonstrates our commitment to Kubernetes and bringing the best of open source container technology to our users. With native support for the Kubernetes kubelet we hope to streamline Kubernetes deployments, and provide a robust interface for managing and monitoring containers on a CoreOS system.

If you’re interested in learning more about Kubernetes, be sure to attend one of our upcoming trainings on Kubernetes in your area. More dates will be added so keep checking back. If you want to request private on-site training, contact us.

August 12, 2015

Downgrade Quagga on Debian 8

The Quagga version in Debian 8 (v0.99.23.1) suffers from a bug in ospf6d that prevents IPv6 routes from being exchanged over point-to-point interfaces.

In order to work around this problem (and re-establish IPv6 connectivity), the quagga package can be downgraded.

For this we add the 'oldstable' entry to sources.list and pin the quagga package to the old version.

Entry to add to /etc/apt/sources.list:

deb oldstable main

Entry to add to /etc/apt/preferences:

Package: quagga
Pin: version 0.99.22.*
Pin-Priority: 1001

After the entries have been added, run apt-get update followed by apt-get install quagga to downgrade to the old quagga package.
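Spelled out as commands (run as root), with an optional check that the pin took effect:

apt-get update
apt-get install quagga
apt-cache policy quagga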

August 04, 2015

Meet the CoreOS team around the world in August

This month the CoreOS team will be speaking from locations along the Pacific Northwest in the US, to Austria, to Japan and China. August also begins our Kubernetes workshop series, brought to you by Google and Intel.

Wednesday, August 5, 2015 at 10:00 a.m. PST – Portland, OR

We kick off our Kubernetes training series in Portland with Kelsey Hightower (@kelseyhightower), product manager, developer and chief advocate at CoreOS. This hands-on workshop will teach you everything you need to know about Kubernetes, CoreOS and Docker. We are offering the workshop for only the cost of materials ($75) for a limited time so we encourage you to send any members of your team for this date. Register in advance to attend.

Friday, August 7, 2015 at 10:00 a.m. PST – Seattle, WA

Kelsey will provide the next Kubernetes training in Seattle, guiding your team through Kubernetes, CoreOS and Docker for only the cost of materials ($75) for a limited time. This event is sold out but we have several other trainings in other cities.

Friday, August 7, 2015 at 4:00 p.m. PST – Las Vegas, NV

Going to DEF CON 23 this year? Meet Brian “Redbeard” Harrington (@brianredbeard), who will speak in a Skytalk on container security and kernel namespaces on Friday, August 7.

In case you missed it, see Redbeard’s presentation on minimal containers from the CoreOS + Sysdig San Francisco July meetup.

Monday, August 10, 2015 at 10:00 a.m. PST – San Francisco, CA

Join us for a daylong Kubernetes training in San Francisco. Kelsey will walk you through Kubernetes, CoreOS and Docker. Seats are filling up quickly so register early to secure your spot.

Tuesday, August 11, 2015 at 6:30 p.m. BST – London, UK

Join Iago López (@iaguis), senior developer, for the Container Infrastructure Meetup at uSwitch in London. He’ll provide an overview and update on rkt, a container runtime designed to be composable, secure and fast.

Monday, August 17, 2015 at 2:20 p.m. PST – Seattle, WA

CoreOS will be at LinuxCon and ContainerCon for the week! Join us for a variety of talks in Seattle.

Wednesday, August 19, 2015 at 7:00 p.m. JST – Tokyo, Japan

Save the date! Kelsey Hightower will be speaking at a meetup in Tokyo. More details will be added – stay tuned on our community page for updates.

Wednesday, August 19, 2015 at 10:25 a.m. PST – Seattle, WA

More CoreOS talks at LinuxCon and ContainerCon include two speakers with expertise in security and networking.

Thursday, August 20, 2015 at 9:30 a.m. PST – Seattle, WA

From LinuxCon and ContainerCon, our team will also be speaking at Linux Plumber’s Conference in Seattle. Brandon Philips will kick off the event with a talk on Open Containers.

Friday, August 21, 2015 at 11:10 a.m. JST – Tokyo, Japan

Meet Kelsey Hightower in Tokyo at YAPC Asia. He’ll discuss managing containers at scale with CoreOS and Kubernetes.

Friday, August 21, 2015 at 9:00 a.m. PST – Seattle, WA

Linux Plumbers Conference attendees are welcome to join Matthew Garrett to learn about securing the entire boot chain.

MesosCon attendees should not miss Brandon Philips discussing rkt and more at 11:30 a.m. PT.

Tuesday, August 25, 2015 – Vienna, Austria

Jonathan Boulle will be giving a keynote at Virtualization in High-Performance Cloud Computing (VHPC ’15), held in conjunction with Euro-Par 2015, in Vienna, Austria. Jon will discuss the work behind designing an open standard for running applications in containers.

Wednesday, August 26, 2015 at 11:35 a.m. PST – Mountain View, CA

OpenStack Silicon Valley, hosted at the Computer History Museum, will feature Alex Polvi (@polvi), CEO of CoreOS. He’ll present Containers for the Enterprise: It's Not That Simple on August 26 at 11:35 a.m. PT.

Immediately following is a deep-dive session with Wall Street Journal technology reporter Shira Ovide (@ShiraOvide), joined by Alex, James Staten, chief strategist of the cloud and enterprise division at Microsoft, as well as Craig McLuckie (@cmcluck), senior product manager at Google. They will discuss practical choices facing enterprises moving to an IT resource equipped to support software developers in their work to help their companies compete.

Friday, August 28, 2015 – Beijing, China

At CNUT Con, presented by InfoQ in Beijing, Kelsey Hightower will give a keynote: From Theory to Production: Managing Applications at Scale.

To invite CoreOS to a meetup, training or conference in your area email us or tweet to us @CoreOSLinux!

July 24, 2015

Introducing etcd 2.1

After months of focused work, etcd 2.1 has been released. Since the etcd 2.0 release in January, the team has gathered a ton of valuable feedback from real-world environments. And based on that feedback, this release introduces: authentication/authorization APIs, new metric endpoints, improved transportation stability, increased performance between etcd servers, and enhanced cluster stability.

For a quick overview, etcd is an open source, distributed, consistent key value store for shared configuration, service discovery, and scheduler coordination. By using etcd, applications can ensure that even in the face of individual servers failing, the application will continue to work. etcd is a core component of CoreOS software that facilitates safe automatic updates, coordinating work being scheduled to hosts, and setting up overlay networking for containers.

If you want to skip the talk and get right to the code, you can find new binaries on GitHub. etcd 2.1.1 is available in CoreOS 752.1.0 (currently in the alpha channel), so feel free to take it for a spin.

Zero-Downtime Rolling Upgrade from 2.0

Upgrading from etcd 2.0 to etcd 2.1 is a zero-downtime rolling upgrade: you can upgrade the members of a cluster running etcd 2.0 one by one to etcd 2.1. For more details, please read the upgrade documentation. If you are running your cluster under etcd 0.4.x, please upgrade to etcd 2.0 first and then follow the rolling upgrade steps.

Also, with this release, etcd 2.1 is now the current stable etcd release; as such, all bug fixes will go into new etcd 2.1.x releases and won't be backported to etcd 2.0.x.

Auth API for Authentication and Authorization

A major feature in this release is the /v2/auth endpoint, which adds auth to the etcd key/value API. This API lets you manage authorization of key prefixes with users and roles and authenticate those users using HTTP basic authentication, enabling users to have more control within teams. This includes support in the etcd HTTP server, the command-line etcdctl client, and the Go etcd/client package. You can find full details in the authentication documentation. Please note that this is an experimental feature and will be improved based on user feedback. We think we got the details right but may adjust the API in a subsequent release.
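As a rough sketch of the workflow (the exact flags are in the authentication documentation; treat this as illustrative): create a root user, enable auth, then authenticate subsequent etcdctl calls:

# etcdctl prompts for the new password
etcdctl user add root
etcdctl auth enable
# once auth is enabled, pass credentials explicitly
etcdctl -u root ls /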

Improved Transport Stability

Many users of etcd have networks with inconsistent performance and latency. We can’t make etcd work perfectly in every difficult environment, but in this release we have optimized the way etcd uses the network so that it performs as well as possible.

First, to reduce the connection creation overhead and to make the consensus protocol (raft) communication more efficient and stable, etcd now maintains long running connections with other peers. Next, to reduce the raft command commit latency, each raft append message is now attached to a commit index. The commit latency is reduced from 100ms to 1ms under light load (<100 writes/second). And finally, etcd's raft implementation now provides better internal flow control, significantly reducing the possibility of raft message loss, and improving CPU and memory efficiency.

Functional Testing

For four months we have been running etcd against a fault-injecting and functional testing framework we built. Our goal is to ensure etcd is failure-resistant while under heavy usage; and in these months of testing, etcd has shown to be robust under many kinds of harsh failure scenarios. We will continue to run these tests as we iterate on the 2.1 releases.

Improved Logging

Leveled logging is now supported. Users can set an expected log level for etcd and its subpackages. At the same time, we have moved verbose, repeated logging to the DEBUG log level, so etcd’s default log output is significantly more readable. You can control leveled logging using the flags listed here.

New Metrics API

etcd 2.1 introduces a new metrics API endpoint that can be used for real-time monitoring and debugging. It exposes statistics about both client behaviors and resource usage. Like the auth API endpoint, this is an experimental feature which may be improved and changed based on user feedback.
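For example, assuming the endpoint is served at /metrics on the client port (check the release documentation for the exact path and format):

curl http://127.0.0.1:2379/metrics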

Get Involved and Get Started

We will continue to work to make etcd a fundamental building block for Google-like infrastructure that users can take off the shelf, build upon, and rely on. Get started with etcd, continue to share your feedback, and even help by contributing directly to the code.

July 21, 2015

CoreOS and Kubernetes 1.0

Today is a big day for Kubernetes, as it hits its 1.0 release milestone. Kubernetes provides a solid foundation for running container-based infrastructure providing API driven deployment, service discovery, monitoring and load balancing. It is exciting progress towards bringing industry-wide Google-like infrastructure for everyone else (GIFEE) through community-built open source software.

Kubernetes 1.0 on CoreOS Open Source Guides

The Kubernetes project has come a long way in just over a year, with many API improvements and more recently a focus on stability and scalability. If you haven’t tried Kubernetes recently, it is a worthwhile experience and can get you thinking about how containers can be more easily used in real-world deployments: whether it is doing your first rolling upgrade of your containerized app or using DNS service discovery between components.

For those that want to try Kubernetes 1.0 on CoreOS, we have put together some easy-to-read open source guides to run Kubernetes 1.0 on CoreOS. And as always if you need help try us on the #coreos irc channel or coreos-user mailing list.

CoreOS Joins Cloud Native Foundation

When we started building CoreOS Linux two years ago we wanted to encourage people to run infrastructure in a secure, distributed and consistent manner. This required many components along the way, including new datastores like etcd, container runtimes like Docker & rkt, and cluster wide application deployment, orchestration, and service discovery like Kubernetes. Today, CoreOS is joining a new foundation along with Google, Twitter, Huawei and other industry partners to collaborate and build the technologies that are changing how people are thinking about infrastructure software. This new foundation, the Cloud Native Foundation, is being launched in partnership with the Linux Foundation and will shepherd the industry collaboration around Kubernetes and other projects moving forward.

Tectonic Preview

For companies that want help building their infrastructure in this manner, we are also announcing that Tectonic is now in Preview. This includes 24x7 support, a friendly web-based console, and deployment guides for AWS and for your own hardware. We invite you to read more about Tectonic Preview on our Tectonic blog.

Kubernetes Training

Also today, we are launching Kubernetes Training. The first workshops will be delivered by Kelsey Hightower, product manager, developer and chief advocate at CoreOS, and will take place on August 5 in Portland, August 7 in Seattle and August 10 in San Francisco.

By joining these workshops, you will learn more about Kubernetes, CoreOS, Docker and rkt and leave knowing Kubernetes Core Concepts, how to enable and manage key cluster add-ons such as DNS, monitoring, and the UI, how to configure nodes for the Kubernetes networking model and how to manage applications with Kubernetes deployment patterns.

For a limited time, the workshops will be available at a special rate covering only the cost of materials. Sign up for a workshop in your area early; they will fill up fast.


The CoreOS team is at OSCON this week and you have three ways to find us:


July 17, 2015

Meet CoreOS at OSCON and more

Next week we are heading to Portland, Oregon for OSCON. We look forward to meeting fellow OSCON attendees and Portland friends at one of the events below, or at our booth (#900) on the OSCON show floor, July 21-24. If you have questions about Kubernetes, CoreOS, Docker or rkt, sign up for office hours at our booth and get one-on-one time with our team. Read on to see more about where we will be next week. See you then!

Sunday, July 19

Get revved up for OSCON and see Kelsey Hightower, product manager, developer and chief advocate at CoreOS, speak in a lightning talk at the NGINX Summit at 3 p.m. PT. Register here for your ticket.

Tuesday, July 21

CoreOS will be at the Kubernetes 1.0 event – be sure to get there in time for the keynote at 11 a.m. PT. Get your ticket before it sells out! If you can’t make it in person you can register for the live-stream. We’ll be there throughout the day and if you miss us at the event, connect with our team at the Kubernetes After Hours Party on Tuesday too.

At OSCON, Kelsey Hightower will deliver a much-requested 3.5-hour tutorial starting at 1:30 p.m. PT on taming microservices with CoreOS and Kubernetes.

The OSCON Expo Hours begin at 5 p.m. so meet us at our booth if you’re there early for the reception.

Wednesday, July 22 - Thursday, July 24

Our CoreOS booth will have expert engineers to answer your questions and get you started with Tectonic. Sign up for office hours and talk with a CoreOS expert to get all your Kubernetes, CoreOS, Docker and rkt questions answered. Visit us at booth 900 all day on Wednesday and Thursday and tweet to us @CoreOSLinux or @TectonicStack.

Wednesday, July 22

Join us for our second annual CoreOS Portland OSCON meetup starting at 6 p.m. PT at the Ecotrust Building. Brian “Redbeard” Harrington, principal architect at CoreOS, Brandon Philips, CTO of CoreOS, Kelsey Hightower, product manager at CoreOS, and Matthew Garrett, principal security engineer at CoreOS, will lead the talks of the evening. We thank our sponsors, Redapt and Couchbase for making the event possible and providing drinks and bites on the Ecotrust rooftop! RSVP here.

Thursday, July 23

After your day at OSCON, join us for a Birds of a Feather (BoF) session at 7 p.m. PT by Brian “Redbeard” Harrington, principal architect at CoreOS. He will have a lively interactive conversation with attendees and cover how to get started with CoreOS, CoreOS components and CoreOS best practices you want to learn about most.

Friday, July 24

At 11:10 a.m. PT Matthew Garrett will present building a trustworthy computer.

See you in Portland!

July 15, 2015

Announcing rkt v0.7.0, featuring a new build system, SELinux and more

Today we are announcing rkt v0.7.0. rkt is an app container runtime built to be efficient, secure and composable for production environments. This release includes new subcommands under rkt image for manipulating images in the local store, a new build system based on autotools and integration with SELinux. These new capabilities improve the user experience, make it easier to build future features and improve security isolation between containers.

Note on rkt and OCP

As you know, rkt is an implementation of the App Container (appc) spec and rkt is also targeted to be a future implementation of the Open Container Project (OCP) specification. The OCP development is still in its early days. Our plans with rkt are unchanged and the team is committed to the continued development of rkt. This is all a part of the goal to build rkt as a leading container runtime focused on security and composability for the most demanding production requirements.

Now, read on for details on the new features.

New Subcommands for rkt image

In this release all of the subcommands dealing with images in the local store can be found inside rkt image. Apart from the already existing subcommands rkt image list, rkt image rm and rkt image cat-manifest, this release adds three more:

rkt image export

This subcommand exports an ACI from the local store. This comes in handy when you want to copy an image to another machine, file server and so on.

$ rkt image export etcd.aci
$ tar xvf etcd.aci

Note that this command does not perform any network I/O so the image must be in the local store beforehand. Also, the exported ACI file might be different from the original imported to the store because rkt image export always returns uncompressed ACIs.

rkt image extract

For debugging or inspection you may want to extract an ACI to a directory on disk. You can get the full ACI or just its rootfs:

$ rkt image extract etcd-extracted
$ find etcd-extracted
$ rkt image extract --rootfs-only etcd-rootfs
$ find etcd-rootfs

As with rkt image export no network I/O will be performed.

rkt image render

While the previous command extracts an ACI to a directory, it doesn’t take into account image dependencies or pathWhitelists. To get an image rendered as it would look ready-to-run inside of the rkt stage2 you can run rkt image render:

$ rkt image render --rootfs-only etcd-rendered
$ find etcd-rendered

New Build System

In 0.7.0 we introduce a new build system based on autotools. Previous versions of rkt were built with a combination of shell scripts and ad-hoc Makefiles. As build complexity grew, more and more environment variables were added, which made build options less discoverable and complicated development.

The new build system based on autotools in 0.7.0 has more discoverable options and should make it easier to build future features like cross-compiling or a KVM-based stage1.

This is how you build rkt now:

$ ./autogen.sh

Initialized build system. For a common configuration please run:

./configure --with-stage1=coreos
$ ./configure --help
`configure' configures rkt 0.7.0+git to adapt to many kinds of systems.
Optional Features:
  --disable-option-checking  ignore unrecognized --enable/--with options
  --disable-FEATURE       do not include FEATURE (same as --enable-FEATURE=no)
  --enable-FEATURE[=ARG]  include FEATURE [ARG=yes]
                          enable functional tests on make check (linux only,
                          uses sudo, default: no, use auto to enable if

Optional Packages:
  --with-PACKAGE[=ARG]    use PACKAGE [ARG=yes]
  --without-PACKAGE       do not use PACKAGE (same as --with-PACKAGE=no)
  --with-stage1=type      type of stage1 build one of 'src', 'coreos', 'host',
                          'none', 'kvm' (default: 'coreos')
                          address to git repository of systemd, used in 'src'
                          build mode (default: '')
                          systemd version to build (default:
                          custom stage1 image path (default:
$ ./configure && make -j4

Note that all the build options are listed with a description that helps the user know what to write, instead of having to read the build scripts to figure out which environment variables to set.

SELinux Support

We also added support for running containers using SELinux SVirt, improving security isolation between containers. This means every rkt instance will run in a different SELinux context. Processes started in these contexts will be unable to interact with processes or files in any other instance’s context, even though they are running as the same user.

This feature depends on appropriate policy being provided by the underlying Linux distribution. If supported, a file called “lxc_contexts” will be present in the SELinux contexts directory under /etc/selinux. In the absence of appropriate support, SELinux SVirt will automatically be disabled at runtime.
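A quick way to check whether your distribution ships that policy file (the exact policy store layout varies by distribution):

find /etc/selinux -name lxc_contexts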

Other Features

  • rkt registers pods with the metadata service by default now. Ensure it is running before running pods (rkt metadata-service) or disable registration with rkt run --mds-register=false.
  • We started improving rkt UX by reducing stage1 verbosity and writing better and more consistent error messages. As we look towards the next rkt releases, we will be focusing more on UX improvements.

Get Involved

Be a part of the development of rkt or join the discussion through the rkt-dev mailing list or GitHub issues. We welcome you to contribute directly to the project.

July 14, 2015

Q&A with Sysdig on containers, monitoring and CoreOS

Today we congratulate Sysdig, the container visibility company, on its funding news and launch of its commercial offering, Sysdig Cloud. We interviewed Loris Degioanni, the creator and CEO of Sysdig, about the company, containers and how Sysdig works with CoreOS. He is a pioneer in the field of network analysis through his work on WinPcap and Wireshark, which are open source tools with millions of users worldwide.

Read on to dive in, and be sure to meet Sysdig and our team at our July 29 Meetup in San Francisco to learn more.

Q: In your own words, what is Sysdig? Why is it important in containerized environments?

Loris: Sysdig is an open source system visibility tool, designed to meet the needs of modern IT professionals. You can use it to monitor and troubleshoot things like system activity, network and file I/O, application requests and much more. Unique features include the ability to work with trace files (similar to tools such as Wireshark) and deep, native container support.

As for containerized environments: containers are an extremely interesting and powerful technology – I’m personally a big fan. But containers are also a relatively young technology (at least in their current form), and until now there has been a bit of a catch-22 in terms of container visibility. Either you monitor your containers from the outside, with inherently limited visibility given the opaque and self-contained nature of containers, or you install extra monitoring software inside the container, which largely undermines the benefits of using a container in the first place – performance, deployability, portability, dependency simplification, security, etc.

Sysdig is the first visibility tool designed specifically to support containers. And in order to truly support containers, we knew we had to solve the issue above. Sysdig’s instrumentation is based on a little kernel module that can capture information like system calls from “underneath” containers. This makes it possible to explore anything that’s happening inside containers, while running sysdig entirely on the host machine or inside another container. There is no need to instrument your containers, or install any agent inside them. In other words, Sysdig provides full visibility into all your containers, from outside of the containers themselves.

This tends to be quite a radical departure from what people are used to, and is also the basis of our commercial product, Sysdig Cloud. Based on this same open source technology, Sysdig Cloud offers a container-native monitoring solution, with distributed collection, discovery, dashboarding, alerting, and topology mapping.

Q: What lessons from contributing to Wireshark influence what you are doing today?

Loris: I spent my Ph.D. and the first 10 years of my career working on network monitoring. The lessons I learned during that time have highly influenced the architecture and underlying philosophies of sysdig.

Network monitoring as a whole offers a pretty elegant set of workflows. First, there is the fundamental ability to capture the information you need into trace files. These trace files are not only easily shared, but maybe even more importantly, they decouple the troubleshooting process from the issue itself. No longer are you working inside of a broken system, trying to fix a problem, as the problem is bringing down the system around you. Network monitoring workflows also include the ability to filter information with well known languages, and visualize your data with industry standard tools like Wireshark.

I believe these workflows are not only relevant in the context of network monitoring. Trace files, decoupled troubleshooting, natural filters, standardized visualizations: these are widely applicable concepts. With our work on sysdig, we are trying to bring these well-proven approaches from the world of network monitoring into the world of system, container and application visibility.

Q: How does Sysdig work with CoreOS environments? What types of information can Sysdig pull from a CoreOS host?

Loris: Sysdig fully supports CoreOS environments, and offers the same 100% visibility you would find in a non-containerized environment. Sysdig works with CoreOS by installing the container we provide, which contains all the required dependencies and offers an isolated execution environment. Since we provide a precompiled driver, installation is really easy – it is a single command line and takes 30 seconds.

Once installed, sysdig will be able to surface very rich information about your CoreOS environment: both the host OS and the containers you have running. This includes everything from top processes, to network connections, to top files and directories, to a list of executed commands for both the host OS and any of the running containers. And that’s just the tip of the iceberg. For some interesting use cases with sysdig running in CoreOS environments, you can refer to our two-part CoreOS blog series here and here.

Q: What is the memory and CPU overhead required by Sysdig?

Loris: Typically low, but it depends on what kind of activity is happening on the machine. Sysdig instruments the operating system’s kernel, and the overhead depends on how many events there are to be captured. On a machine with average load, the CPU occupation should be very low: a few percentage points. CPU occupation of sysdig can go higher on systems with a lot of I/O or network activity. The Sysdig Cloud agent, on the other hand, incorporates additional protective mechanisms, such as subsampling techniques, to ensure the CPU occupation always stays within an acceptable range of <5%.

Q: You presented the Dark Art of Container Monitoring at CoreOS Fest this year. Tell us more about what should be monitored.

Loris: In terms of what should be monitored, my answer is: everything! The really important question is: how should it be monitored? The same features that make containers so interesting and revolutionary (i.e. the fact that they are isolated, self-contained, simple and lightweight) make them a real challenge to monitor. In particular, the traditional approach of having an agent on any “entity” doesn’t work well with containers, because it’s too invasive and doesn’t scale.

This is the problem we’re trying to solve with sysdig and Sysdig Cloud. We’re excited about working on it because great visibility is a key requirement to adopt containers in production.

Q: Describe what Sysdig does with CoreOS Linux to help monitor system security.

Loris: Sysdig has powerful security-oriented features. Here are some examples of what CoreOS users can do with sysdig to monitor system security:

  • Show the directories that the user "root" visits
  • Observe ssh activity
  • Show every file open that happens in /etc
  • Show all the network connections established by a process

Now think about being able to obtain this information for any container running on a CoreOS host, but from outside the container, with no instrumentation and no dependencies.
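For readers who want to try the checks Loris lists above, they roughly correspond to sysdig invocations like the following (illustrative only; the sysdig documentation describes the exact filters and chisels):

# directories visited by the user root
sudo sysdig -p "%evt.arg.path" "evt.type=chdir and user.name=root"
# every file open under /etc
sudo sysdig "evt.type=open and fd.name contains /etc"
# watch interactive ssh activity
sudo sysdig -A -c echo_fds "fd.name=/dev/ptmx and proc.name=sshd"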

If you are curious to try sysdig out, installation on CoreOS is super easy and instructions can be found here. And don’t forget to let us know what you think on twitter or at!

Join CoreOS and Sysdig in San Francisco for the July Meetup

Attend this month’s CoreOS San Francisco Meetup that will feature the CoreOS team and Gianluca Borello, senior software engineer at Sysdig.

When: Wednesday, July 29, 2015 starting at 6 p.m. PT

Where: Okta, 301 Brannan Street, San Francisco, CA 94107


July 12, 2015

Scapy and IP Options

Create packets with custom IPv4 IP Option fields using Scapy:

>>> packet=IP(src="",dst="",options=[IPOption('%s%s'%('\x86\x28','a'*38))])
>>> ls(packet)
version    : BitField             = 4               (4)
ihl        : BitField             = None            (None)
tos        : XByteField           = 0               (0)
len        : ShortField           = None            (None)
id         : ShortField           = 1               (1)
flags      : FlagsField           = 0               (0)
frag       : BitField             = 0               (0)
ttl        : ByteField            = 64              (64)
proto      : ByteEnumField        = 0               (0)
chksum     : XShortField          = None            (None)
src        : Emph                 = ''   (None)
dst        : Emph                 = ''   ('')
options    : PacketListField      = [<IPOption  copy_flag=1L optclass=control option=commercial_security length=40 value='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' |>] ([])
>>> sr1(packet)

The above code results in the following packet (as seen by Wireshark):

Wireshark showing the packet with the custom IP Option

July 11, 2015

Upgrade to Debian 8 without systemd

To avoid the automatic installation/switch to systemd during the upgrade to Debian 8, it is enough to prevent the installation of the systemd-sysv package.

This can be done by creating a file /etc/apt/preferences.d/no-systemd-sysv with the following content:

Package: systemd-sysv
Pin: release o=Debian
Pin-Priority: -1
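Before starting the upgrade you can verify that the pin is in effect; apt-cache should show that systemd-sysv is no longer a candidate for installation:

apt-cache policy systemd-sysv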


July 10, 2015

OpenSSL has been Updated (CVE-2015-1793)

The Alternative Chains Certificate Forgery vulnerability in OpenSSL, as reported in CVE-2015-1793, has been patched in CoreOS Linux (Alpha, Beta and Stable channels). If automatic updates are enabled (default configuration), your server should be patched within the next several hours (if it hasn’t already received the update).

If automatic updates are disabled, you can force an update by running update_engine_client -check_for_update.

If you have any questions or concerns, please join us in IRC freenode/#coreos.

How to get involved with CoreOS projects

Today we’re excited to build and collaborate with our community at the inaugural CoreOS hackathon. Even if you can’t join us at GopherCon in Denver, there are numerous ways to get involved and contribute.

Every project on GitHub includes helpful information on contributing that can be found in the file. Be sure to look at it before you jump in and begin coding.


Serving as the backbone of many distributed systems, from Kubernetes to Pivotal’s Cloud Foundry and beyond, etcd is a Go codebase and key-value store where the state of the art in distributed systems comes together. If your interests lie in consensus protocols, APIs and clustering, etcd has a number of areas where contribution is more than welcome.

Hack on etcd.


CoreOS Linux is built for scale, and at scale, managing systemd can be a challenge. We created fleet to distribute init across the data center. fleet is used in nearly all CoreOS deployments to simplify administrative overhead. If you are interested in operational plumbing, there is no shortage of work to be done.

Hack on fleet.


If software defined networks and the low levels of data center connectivity are in your interests, you can help build flannel, the container-friendly software networking fabric.

Hack on flannel.


Jump in with the rkt team and help create a secure, composable and standards based container runtime starting with these requests. Help shape the future of the container ecosystem and stay up to date with our rkt mailing list too.

With the recent announcement to collaborate on the Open Container Project (OCP), stay tuned for updates on how we will work together and work with OCP and rkt.

GopherCon Hack Day on July 10

For any of you attending GopherCon, here are the details of the hack day:

When: Friday, July 10, 2015 from 10:00 a.m. - 5:00 p.m. MDT

Where: Room 403, Denver Convention Center


10:00 a.m. - 10:30 a.m. - Brandon Philips, CoreOS Two Years in

10:30 a.m. - 11:00 a.m. - Kelsey Hightower, Kubernetes talk

11 a.m. - 11:30 a.m. - Russell Haering, ScaleFT

11:30 a.m. - 12 p.m. - Micha Leuffen, Wercker


1:00 p.m. - 4:00 p.m. - Hack Day Competition

4:00 p.m. - 5:00 p.m. - Competition demos and winner announcement

We welcome your involvement and contributions to CoreOS projects. We wouldn’t be here without our contributors and there is much to be done!

July 08, 2015

OpenPower Firmware Stack

The OpenPower server platform comprises one or more Power8 processors, the latest of the IBM PowerPC family, and some kind of management controller to power on and monitor the state of the main processor(s). This post provides an overview of the different bits of open source firmware that are used to take the machine from power on all the way through to running your operating system.

Tyan Palmetto Motherboard

The Tyan GN70-BP010 is the first OpenPower machine to ship. Known also by its codename Palmetto, it contains a single Power8 processor and an Aspeed AST2400 ARM System on Chip which we refer to as the Baseboard Management Controller (BMC). The BMC is a typical embedded ARM system: u-boot, Linux kernel and stripped down userspace. This firmware is built by OpenPower Foundation member AMI.

P8 Boot

The BMC and the Power8 share a common memory mapped interface, called the LPC bus. This is the interface over which the Power8 accesses boot firmware, as well as boot time configuration, from a SPI attached PNOR flash, and speaks to the BMC’s IPMI stack over the BT interface.

Hostboot Starting

When it comes to starting the Power8 the BMC wiggles a pin to kick the SBE (Self Boot Engine) into gear. This tiny processor in the Power8 loads the first stage firmware, called Hostboot, from PNOR and configures one of the Power8 threads to execute it from L3 cache. Hostboot is responsible for bringing up the internal buses in the Power8, as well as the rest of the cores, SDRAM memory, and another on-CPU microcontroller called the OCC (On Chip Controller).

P8 Boot Flow

When Hostboot has finished these procedures it loads a payload from the PNOR. This payload is the second stage firmware, known as Skiboot. Skiboot synchronises the timebase between all the CPU threads, brings up the PCIe buses, communicates with the management controller, and provides the runtime OPAL (Open Power Abstraction Layer) interface for the operating system to call into. Skiboot is also responsible for loading the next stage bootloader, which in this case is a Linux kernel and root file system that provide the Petitboot loader environment.

Skiboot Starting

Petitboot Starting

Petitboot is a bootloader that discovers all the disks and network devices in the system, and presents a menu for the user to select which OS to run. Petitboot looks for PXE configuration information, as well as parsing GRUB configuration files found on local disks. Petitboot reads configuration information from the NVRAM partition on the PNOR, which means it can be configured to boot from a specific network interface or hard drive, or even to not boot at all and wait for user input. Once the boot OS has been selected, Petitboot uses the Linux kexec functionality to jump into the host kernel.

Petitboot menu

July 07, 2015

Happy 2nd Epoch CoreOS Linux

Today we rolled out the 735.0.0 release of CoreOS Linux to the alpha channel. Our CoreOS Linux version numbers are counted from our epoch on July 1, 2013, which means this month marks the end of our second year working on CoreOS Linux.

Two years ago we started this journey with a vision of improving the consistency, deployment speed and security of server infrastructure. In this time we have kicked off a rethinking of how server OSes are designed and used. In a recent article InfoWorld said:

CoreOS Linux “was the first influential micro operating system designed for today’s cloud environments.”

Last year, we celebrated our first stable channel release and since then we have been hard at work pushing important bug fixes and feature releases to that channel every 2.5 weeks on average.

CoreOS Year 1 Highlights

In the post for that first stable release we highlighted our progress to date:

  • CoreOS engineers contributed features and fixes to open source projects including Docker, the Linux kernel, networkd, systemd and more
  • Official CoreOS image added to Google Compute Engine, Rackspace, Amazon
  • Joined the Docker Governance Board as a Contributing Member
  • Today’s most respected technology companies and many Fortune 500 companies are using and testing CoreOS in their environments

CoreOS Year 2 Highlights

In the tradition of that post one year ago, let’s take a look at some of the highlights from the last year of CoreOS.

  • Announced Tectonic, a commercial Kubernetes platform that combines the CoreOS stack with Kubernetes to bring companies Google-style infrastructure
  • Worked with community partners to create App Container (appc), a specification defining a container image format, runtime environment and discovery protocol, to work towards the goal of a standard, portable shipping container for applications
  • Created rkt, a container runtime designed for composability, security and speed and the first implementation of appc
  • joined CoreOS to provide Enterprise Registry, delivering secured hosting of your container repositories behind the firewall
  • Released etcd 2.0, which powers important projects in the container and distributed systems ecosystem including the flannel, locksmith, fleet and Kubernetes projects. etcd also supports community projects like HashiCorp’s Vault and Docker 1.7’s networking backend.
  • Joined forces with industry leaders to launch the Open Container Project, chartered to establish common standards for software containers

Our ability to build and ship innovative and high-quality projects is due in large part to the feedback and interest from our community. Thank you for all of your help in contributing, bug testing, promoting and learning more about what we are doing here at CoreOS.

Celebrate With Us at GopherCon

We will be celebrating our second birthday with our friends at GopherCon in Denver. Swing by our booth to get a limited edition CoreOS GopherCon sticker. Or, join us at our birthday party, brought to you by our friends from Couchbase and on Thursday, July 9, at 8 p.m. MDT. Lastly, don’t miss our hack day on Friday, July 10, where you can work alongside a CoreOS engineer, learn about our open source projects and compete for prizes.

RSVP for our Second Birthday Party

Thursday, July 9 at 8 - 11 p.m. MDT

Pizza Republica in Denver, Colorado. Sponsored by Couchbase and

CoreOS Birthday Hack Day

Friday, July 10 at 10 a.m. - 5 p.m. MDT

Room 403, GopherCon in Denver, Colorado

July 06, 2015

Upcoming CoreOS Events in July

Need your CoreOS fix? Check out where you can find us this month!

Tuesday, July 7-Friday, July 10, 2015 - Denver, CO

We’re going to GopherCon! Be sure to stop by our booth to pick up some swag and say hello.

Tuesday, July 7, 2015 at 6:00 p.m. MDT - Denver, CO

Start your GopherCon experience the right way. Join Brian “Redbeard” Harrington and other awesome speakers at the GopherCon Kick off party!

Thursday, July 9, 2015 at 1:40 p.m. MDT - Denver, CO

Be sure to check out Barak Michener give a talk at GopherCon about Cayley and building a graph database.

Thursday, July 9, 2015 at 4:00 p.m. MDT - Denver, CO

Don’t miss Kelsey Hightower at GopherCon talking about betting the company on go and winning!

Thursday, July 9, 2015 at 8:00 p.m. MDT - Denver, CO

If you’re attending GopherCon, come celebrate our second birthday with us and our friends from Couchbase and at Pizza Republica! Pizza, beer and video games included. RSVP here!

Friday, July 10, 2015 - 11:00 a.m. BRT - Porto Alegre, Brazil

Meet Matthew Garrett and discuss Free Software communities at FISL in Brazil. He’ll present, Using DRM technologies to protect users.

Friday, July 10, 2015 at 10:00 a.m. MDT - Denver, CO

End GopherCon with a good time! Join us at our Hack Day in room 403. We’ll have speakers from CoreOS and the community, as well as a special Hack Day competition.

Tuesday, July 14, 2015 at 6:00 p.m. IDT - Tel Aviv, Israel

If you’re in Tel Aviv, swing by the Docker Tel Aviv Meetup to hear Joey Schorr discuss the container lifecycle.

Tuesday, July 14, 2015 at 5:00 p.m. EDT - Online

DataStax is hosting a webinar on leveraging Docker and CoreOS to provide always available Cassandra at Instaclustr. Register here!

Tuesday, July 21, 2015 at 10:00 a.m. PDT - Portland, OR

Join us as we celebrate Kubernetes 1.0! Come by in person at the event or after party if you’re at OSCON. If you can’t make it to Portland, not to worry. Register to watch the keynote here.

Tuesday, July 21, 2015 at 1:30 p.m. PDT - Portland, OR

Kelsey Hightower will be at OSCON giving a workshop on taming microservices with CoreOS and Kubernetes. Don’t miss it!

Thursday, July 23, 2015 at 7:00 p.m. BST - London, UK

We’re ending the month with our friends at the CoreOS London Meetup! Come hang out and learn more about Tectonic and how it combines Kubernetes and the CoreOS software portfolio.

Want more CoreOS in your city? Let us know! Email us at

July 05, 2015

July 01, 2015

Introducing flannel 0.5.0 with AWS and GCE

Last week we released flannel v0.5, a virtual network that gives a range of IP addresses to each host to use with container runtimes. We have been working hard to add features to flannel to enable a wider variety of use cases, such as taking advantage of cloud providers' networking capabilities, as part of the goal to enable containers to effectively communicate across networks and ensure they are easily portable across cloud providers.

With this in mind, flannel v0.5 includes the following new features:

  • support for Google Compute Engine (GCE),
  • a client/server mode, and
  • a multi-network mode.

Please refer to the readme for details on the client/server and the multi-network modes.

Try Out the New Release

In this post we will provide an overview of how to set up flannel with the Amazon Virtual Private Cloud (Amazon VPC) backend introduced in flannel v0.4 and with the newly added GCE backend.

When flannel runs the gce or aws-vpc backend, it does not create a separate interface as it does with the udp or vxlan backends.

This is because with the gce and aws-vpc backends there is no overlay or encapsulation; flannel simply manipulates IP routes to achieve maximum performance.
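Because those routes live with the cloud provider rather than on the host, the easiest way to inspect them is with the provider’s own tooling, for example (assuming the gcloud and aws CLIs are installed and configured):

# GCE: flannel adds one route per host via the Routes API
gcloud compute routes list
# AWS: flannel edits the VPC route table
aws ec2 describe-route-tables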

Let’s get started with setting up flannel on GCE instances.

GCE Backend

From the Developers Console, we start by creating a new network.

Configure the network name and address range. Then add firewall rules to allow etcd traffic (tcp/2379), SSH, and ICMP. That's it for the network configuration. Now it’s time to create an instance. Let's call it demo-instance-1. Under the "Management, disk, networking, access & security options" make the following changes:

  • Select the "Network" to be our newly created network
  • Enable IP forwarding
  • Under "Access and Security" set the compute permissions to "Read Write" and remember to add your SSH key

New GCE Instance
Booting a new GCE instance
Security settings for a new instance
Security settings for a new instance

With the permissions set, we can launch the instance!

The only remaining steps now are to start etcd, publish the network configuration and lastly, run the flannel daemon. SSH into demo-instance-1 and execute the following steps:

  • Start etcd:
$ etcd2 -advertise-client-urls http://$INTERNAL_IP:2379 -listen-client-urls
  • Publish configuration in etcd (ensure that the network range does not overlap with the one configured for the GCE network)
$ etcdctl set / '{"Network":"", "Backend": {"Type": "gce"}}'
  • Fetch the 0.5 release using wget from here
  • Run flannel daemon:
$ sudo ./flanneld --etcd-endpoints=
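
For reference, a hedged sketch of a complete sequence might look like the following; the etcd configuration key, the listen address and the example network range are assumptions based on flannel's documented defaults rather than values taken from this post:

$ etcd2 -advertise-client-urls http://$INTERNAL_IP:2379 -listen-client-urls http://0.0.0.0:2379   # listen address is an assumption
$ etcdctl set /coreos.com/network/config '{"Network": "10.50.0.0/16", "Backend": {"Type": "gce"}}'   # key and range are assumptions
$ wget <url-of-the-flannel-v0.5-release>   # placeholder for the release download link
$ sudo ./flanneld --etcd-endpoints=http://$INTERNAL_IP:2379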

Now make a clone of demo-instance-1 and SSH into it to run these steps:

  • Fetch the 0.5 release as before.
  • Run flannel with the --etcd-endpoints flag set to the internal IP of the instance running etcd

Check that the subnet lease acquired by each of the hosts has been added!

GCE Routes

It’s important to note that GCE currently limits the number of routes per project to 100.
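
If you have the Google Cloud SDK installed, a quick hedged way to see how many routes flannel has created against that limit is to list the project's routes:

$ gcloud compute routes list   # inspect the entries added for each subnet lease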

Amazon VPC Backend

In order to run flannel on AWS we need to first create an Amazon VPC. Amazon VPC enables us to launch EC2 instances into a virtual network, which we can configure via its route table.

From the VPC dashboard start out by running the "VPC Wizard":

  • Select "VPC with a Single Public Subnet"
  • Configure the network and the subnet address ranges

Creating a new Amazon VPC

Now that we have set up our VPC and subnet, let’s create an Identity and Access Management (IAM) role to grant the required permissions to our EC2 instances.

From the console, select Services -> Administration & Security -> IAM.

We first need to create a policy that we will later assign to an IAM role. Under "Create Policy" select the "Create Your Own Policy" option. The following permissions are required as shown below in the sample policy document.

  • ec2:CreateRoute
  • ec2:DeleteRoute
  • ec2:ReplaceRoute
  • ec2:DescribeRouteTables
  • ec2:DescribeInstances
    "Version": "2012-10-17",
    "Statement": [
            "Effect": "Allow",
            "Action": [
            "Resource": [
            "Effect": "Allow",
            "Action": [
            "Resource": "*"

Note that although the first three permissions could be tied to the route table resource of our subnet, the ec2:Describe* permissions cannot be limited to a particular resource. For simplicity, we leave the "Resource" as a wildcard in both statements.

With the policy added, let's attach it to a new IAM role by clicking the "Create New Role" button and setting the following options:

  • Role Name: demo-role
  • Role Type: "Amazon EC2"
  • Attach the policy we created earlier

We are now all set to launch an EC2 instance. In the launch wizard, choose the CoreOS-stable-681.2.0 image and under "Configure Instance Details" perform the following steps:

  • Change the "Network" to the VPC we just created
  • Enable "Auto-assign Public IP"
  • Select IAM demo-role

Configuring AWS EC2 instance details

Under the "Configure Security Group" tab add the rules to allow etcd traffic (tcp/2379), SSH and ICMP.

Go ahead and launch the instance!

Since our instance will be sending and receiving traffic for IPs other than the one assigned by our subnet, we need to disable source/destination checks.

Disable AWS Source/Dest Check

All that’s left now is to start etcd, publish the network configuration and run the flannel daemon. First, SSH into demo-instance-1:

  • Start etcd:
$ etcd2 -advertise-client-urls http://$INTERNAL_IP:2379 -listen-client-urls
  • Publish configuration in etcd (ensure that the network range does not overlap with the one configured for the VPC)
$ etcdctl set / '{"Network":"", "Backend": {"Type": "aws-vpc"}}'
  • Fetch the latest release using wget from here
  • Run flannel daemon:
$ sudo ./flanneld --etcd-endpoints=
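
As in the GCE section, the configuration command can be sketched out as follows; the etcd key and network range are assumptions based on flannel's documented defaults, with only the backend type changed to aws-vpc:

$ etcdctl set /coreos.com/network/config '{"Network": "10.50.0.0/16", "Backend": {"Type": "aws-vpc"}}'   # key and range are assumptions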

Next, create and connect to a clone of demo-instance-1. Run flannel with the --etcd-endpoints flag set to the internal IP of the instance running etcd.

Confirm that the subnet route table has entries for the lease acquired by each of the subnets.

AWS Routes

Keep in mind that the Amazon VPC limits the number of entries per route table to 50.
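
If you have the AWS CLI configured, one hedged way to check the route table entries (and how close you are to that 50-entry limit) from the command line is:

$ aws ec2 describe-route-tables --filters Name=vpc-id,Values=<your-vpc-id>   # the VPC ID is a placeholder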

Note that these are just sample configurations, so feel free to try it out and set up what works best for you!

June 29, 2015

In Practice, What is the C Language, Really?

The official definition of the C Language is the standard, but the standard doesn't actually compile any programs. One can argue that the actual implementations are the real definition of the C Language, although further thought along this line usually results in a much greater appreciation of the benefits of having standards. Nevertheless, the implementations usually win any conflicts with the standard, at least in the short term.

Another interesting source of definitions is the opinions of the developers who actually write C. And both the standards bodies and the various implementations do take these opinions into account at least some of the time. Differences of opinion within the standards bodies are sometimes settled by surveying existing usage, and implementations sometimes provide facilities outside the standard based on user requests. For example, relatively few compiler warnings are actually mandated by the standard.

Although one can argue that the standard is the end-all and be-all definition of the C Language, the fact remains that if none of the implementers provide a facility called out by the standard, the implementers win. Similarly, if nobody uses a facility that is called out by the standard, the users win—even if that facility is provided by each and every implementation. Of course, things get more interesting if the users want something not guaranteed by the standard.

Therefore, it is worth knowing what users expect, even if only to adjust their expectations, as John Regehr has done for a number of topics, perhaps most notably signed integer overflow. Some researchers have been taking a more proactive stance, with one example being Peter Sewell's group at the University of Cambridge. This group has put together a survey on padding bytes, pointer arithmetic, and unions. This survey is quite realistic, with “that would be crazy” being a valid answer to a number of the questions.

So, if you think you know a thing or two about C's handling of padding bytes, pointer arithmetic, and unions, take the survey!

June 28, 2015


One of my clients has a NAS device. Last week they tried to do what should have been a routine RAID operation: they added a new, larger disk as a hot-spare and told the RAID array to replace one of the active disks with the hot-spare. The aim was to replace the disks one at a time to grow the array. But one of the other disks had an error during the rebuild and things fell apart.

I was called in after the NAS had been rebooted when it was refusing to recognise the RAID. The first thing that occurred to me is that maybe RAID-5 isn’t a good choice for the RAID. While it’s theoretically possible for a RAID rebuild to not fail in such a situation (the data that couldn’t be read from the disk with an error could have been regenerated from the disk that was being replaced) it seems that the RAID implementation in question couldn’t do it. As the NAS is running Linux I presume that at least older versions of Linux have the same problem. Of course if you have a RAID array that has 7 disks running RAID-6 with a hot-spare then you only get the capacity of 4 disks. But RAID-6 with no hot-spare should be at least as reliable as RAID-5 with a hot-spare.

Whenever you recover from disk problems the first thing you want to do is to make a read-only copy of the data; then you can't make things worse. This is a problem when you are dealing with 7 disks; fortunately they were only 3TB disks and each had only 2TB in use. So I found some space on a ZFS pool and bought a few 6TB disks which I formatted as BTRFS filesystems. For this task I only wanted filesystems that support snapshots, so I could work on snapshots rather than on the original copy.

I expect that at some future time I will be called in when an array of 6+ disks of the largest available size fails. This will be a more difficult problem to solve as I don’t own any system that can handle so many disks.

I copied a few of the disks to a ZFS filesystem on a Dell PowerEdge T110 running kernel 3.2.68. Unfortunately that system seems to have a problem with USB: when copying from 4 disks at once each disk was reading about 10MB/s, and when copying from 3 disks each disk was reading about 13MB/s. It seems that the system has an aggregate USB bandwidth of 40MB/s – slightly greater than USB 2.0 speed. This made the process take longer than expected.

One of the disks had a read error; this was presumably the cause of the original RAID failure. dd has the option conv=noerror to make it continue after a read error. This initially seemed good, but the resulting file was smaller than the source partition. It seems that conv=noerror doesn't seek the output file to maintain input and output alignment. If I had a hard drive filled with plain ASCII that MIGHT even be useful, but for a filesystem image it's worse than useless. The only option was to repeatedly run dd with matching skip and seek options, incrementing by 1K until it had passed the section with errors.
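
A rough sketch of that approach is below; the device name, image file and offsets are examples, not values from this post. The matching skip and seek keep the input and output aligned, and conv=notrunc stops dd from truncating the partially written image.

for n in $(seq 123450 123460) ; do dd if=/dev/sdg1 of=sdg1.img bs=1K count=1 skip=$n seek=$n conv=noerror,notrunc ; done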

for n in /dev/loop[0-6] ; do echo $n ; mdadm --examine -v -v --scan $n | grep Events ; done

Once I had all the images I had to assemble them. The Linux Software RAID didn’t like the array because not all the devices had the same event count. The way Linux Software RAID (and probably most RAID implementations) work is that each member of the array has an event counter that is incremented when disks are added, removed, and when data is written. If there is an error then after a reboot only disks with matching event counts will be used. The above command shows the Events count for all the disks.

Fortunately different event numbers aren’t going to stop us. After assembling the array (which failed to run) I ran “mdadm -R /dev/md1” which kicked some members out. I then added them back manually and forced the array to run. Unfortunately attempts to write to the array failed (presumably due to mismatched event counts).
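
A hedged sketch of assembling from the loop devices holding the images and then marking the array read-only (device names are examples; --force and --readonly are standard mdadm options, though the exact incantation used here isn't recorded in the post):

mdadm --assemble --force /dev/md1 /dev/loop[0-6]
mdadm --readonly /dev/md1
cat /proc/mdstat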

Now my next problem is that I can make a 10TB degraded RAID-5 array which is read-only but I can’t mount the XFS filesystem because XFS wants to replay the journal. So my next step is to buy another 2*6TB disks to make a RAID-0 array to contain an image of that XFS filesystem.

Finally backups are a really good thing…

June 27, 2015 adventures

Over the past few months I started to notice occasional issues when cloning repositories (particularly nova) from the git servers.

It would fail with something like:

git clone -vvv git:// .
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed

The problem would occur sporadically during our 3rd party CI runs, causing them to fail. Initially these failures went somewhat ignored, as rechecks on the jobs would succeed and the world would be shiny again. However, as they became more prominent the issue needed to be addressed.

When a patch merges in gerrit it is replicated out to 5 different cgit backends (git0[1-5]), which are then balanced by two HAProxy frontends on a simple DNS round-robin.

[Diagram: a DNS lookup returns A records for two HAProxy frontends; each frontend balances the five cgit backends using the source algorithm.]

Reproducing the problem was difficult. At first I was unable to reproduce locally, or even on an isolated turbo-hipster run. Since the problem appeared to be specific to our 3rd party tests (little evidence of it in 1st party runs) I started by adding extra debugging output to git.

We were originally cloning repositories via the git:// protocol. The debugging information was unfortunately limited and provided no useful diagnosis. Switching to https allowed for more curl output (when using GIT_CURL_VERBOSE=1 and GIT_TRACE=1), but this in itself just created noise. It actually took me a few days to remember that the servers are running arbitrary code anyway (a side effect of testing) and therefore cloning over the potentially insecure http protocol didn't add any further risk.
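
For reference, the extra debugging amounted to cloning with git's trace variables enabled, roughly like this (the repository URL is omitted here, as it is elsewhere in this post):

GIT_TRACE=1 GIT_CURL_VERBOSE=1 git clone -vvv <repository-url> .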

Over http we got a little more information, but still nothing that was conclusive at this point:

git clone -vvv .

error: RPC failed; result=18, HTTP code = 200
fatal: The remote end hung up unexpectedly
fatal: protocol error: bad pack header

After a bit it became more apparent that the problems would occur mostly during high (patch) traffic times, that is, when a lot of tests need to be queued. This led me to think that either the network turbo-hipster was on was flaky when doing multiple git clones in parallel, or the git servers themselves were flaky. The lack of similar upstream failures led me to initially suspect the former. In order to reproduce it I decided to use Ansible to do multiple clones of repositories and see if that would uncover the problem. If needed I would have then extended this to orchestrating other parts of turbo-hipster in case the problem was a symptom of something else.

Firstly I needed to clone from a bunch of different servers at once to simulate the network failures more closely (rather than doing multiple clones on the one machine, or from the one IP in containers, for example). To simplify this I decided to learn some Ansible to launch a bunch of nodes on Rackspace (instead of doing it by hand).

Using the pyrax module I put together a crude playbook to launch a bunch of servers. There are likely much neater and better ways of doing this, but it suited my needs. The playbook takes care of placing appropriate SSH keys so I could continue to use them later.

    - name: Create VMs
      hosts: localhost
      vars:
        ssh_known_hosts_command: "ssh-keyscan -H -T 10"
        ssh_known_hosts_file: "/root/.ssh/known_hosts"
      tasks:
        - name: Provision a set of instances
          local_action:
            module: rax
            name: "josh-testing-ansible"
            flavor: "4"
            image: "Ubuntu 12.04 LTS (Precise Pangolin) (PVHVM)"
            region: "DFW"
            count: "15"
            group: "raxhosts"
            wait: yes
          register: raxcreate

        - name: Add the instances we created (by public IP) to the group 'raxhosts'
          local_action:
            module: add_host
            hostname: "{{ }}"
            ansible_ssh_host: "{{ item.rax_accessipv4 }}"
            ansible_ssh_pass: "{{ item.rax_adminpass }}"
            groupname: raxhosts
          with_items: raxcreate.success
          when: raxcreate.action == 'create'

        - name: Sleep to give time for the instances to start ssh
          # there is almost certainly a better way of doing this
          pause: seconds=30

        - name: Scan the host key
          shell: "{{ ssh_known_hosts_command }} {{ item.rax_accessipv4 }} >> {{ ssh_known_hosts_file }}"
          with_items: raxcreate.success
          when: raxcreate.action == 'create'

    - name: Set up sshkeys
      hosts: raxhosts
      tasks:
        - name: Push root's pubkey
          authorized_key: user=root key="{{ lookup('file', '/root/.ssh/') }}"

From here I can use Ansible to work on those servers using the rax inventory. This allows me to address any nodes within my tenant and then log into them with the seeded sshkey.

The next step of course was to run tests. Firstly I just wanted to reproduce the issue, so I crudely set up an environment that simply clones nova multiple times.

    - name: Prepare servers for git testing
      hosts: josh-testing-ansible*
      serial: "100%"
      tasks:
        - name: Install git
          apt: name=git state=present update_cache=yes
        - name: remove nova if it is already cloned
          shell: 'rm -rf nova'

    - name: Clone nova and monitor tcpdump
      hosts: josh-testing-ansible*
      serial: "100%"
      tasks:
        - name: Clone nova
          shell: "git clone"

By default Ansible runs with 5 forked processes, meaning that Ansible would work on 5 servers at a time. We want to exercise git heavily (in the same way turbo-hipster does), so we use the --forks parameter to run the clone on all the servers at once. The plan was to keep launching servers until the error reared its head from the load.
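
A hypothetical invocation of the two playbooks; the playbook file names and fork count are examples, and rax.py is Ansible's Rackspace dynamic inventory script:

ansible-playbook -i rax.py launch-nodes.yml
ansible-playbook -i rax.py clone-nova.yml --forks 50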

To my surprise this happened with very few nodes (less than 15, but I left that as my minimum testing). To confirm I also ran the tests after launching further nodes to see it fail at 50 and 100 concurrent clones. It turned out that the more I cloned the higher the failure rate percentage was.

Now that I had the problem reproducing, it was time to do some debugging. I modified the playbook to capture tcpdump information during the clone. Initially git was cloning over IPv6 so I turned that off on the nodes to force IPv4 (just in case it was a v6 issue, but the problem did present itself on both networks). I also locked to one IP rather than randomly hitting both front ends.

    - name: Prepare servers for git testing
      hosts: josh-testing-ansible*
      serial: "100%"
      tasks:
        - name: Install git
          apt: name=git state=present update_cache=yes
        - name: remove nova if it is already cloned
          shell: 'rm -rf nova'

    - name: Clone nova and monitor tcpdump
      hosts: josh-testing-ansible*
      serial: "100%"
      vars:
        cap_file: tcpdump_{{ ansible_hostname }}_{{ ansible_date_time['epoch'] }}.cap
      tasks:
        - name: Disable ipv6 1/3
          sysctl: name="net.ipv6.conf.all.disable_ipv6" value=1 sysctl_set=yes
        - name: Disable ipv6 2/3
          sysctl: name="net.ipv6.conf.default.disable_ipv6" value=1 sysctl_set=yes
        - name: Disable ipv6 3/3
          sysctl: name="net.ipv6.conf.lo.disable_ipv6" value=1 sysctl_set=yes
        - name: Restart networking
          service: name=networking state=restarted
        - name: Lock git.o.o to one host
          lineinfile: dest=/etc/hosts line='' state=present
        - name: start tcpdump
          command: "/usr/sbin/tcpdump -i eth0 -nnvvS -w /tmp/{{ cap_file }}"
          async: 6000000
          poll: 0
        - name: Clone nova
          shell: "git clone"
          #shell: "git clone"
          ignore_errors: yes
        - name: kill tcpdump
          command: "/usr/bin/pkill tcpdump"
        - name: compress capture file
          command: "gzip {{ cap_file }} chdir=/tmp"
        - name: grab captured file
          fetch: src=/tmp/{{ cap_file }}.gz dest=/var/www/ flat=yes

This gave us a bunch of compressed capture files that I could then ask my colleagues to help debug (particular thanks to Angus Lees). The results from an early run can be seen here:

Gus determined that the problem was due to a RST packet coming from the source at roughly 60 seconds. This indicated it was likely we were hitting a timeout at the server or a firewall during the git-upload-pack of the clone.

The solution turned out to be rather straightforward. The git-upload-pack had simply grown too large and would time out depending on the load on the servers. There were timeouts configured in apache as well as in the HAProxy config for both frontend and backend responsiveness. The relevant patches can be found at and

While upping the timeout avoids the problem, certain projects are clearly pushing the infrastructure to its limits. As such, a few changes were made by the infrastructure team (in particular James Blair) to improve the service's responsiveness.

Firstly, the frontend is now a single higher-performance (30GB) instance, a large step up from the (8GB) instances previously used as frontends. Moving to one frontend additionally meant the HAProxy algorithm could be changed to leastconn to help balance connections better.

[Diagram: a single HAProxy frontend balancing the five cgit backends using the leastconn algorithm.]

All that was left was to see if things had improved. I reran the test across 15, 30 and then 45 servers. These were all able to clone nova reliably where they had previously been failing. I then upped it to 100 servers, where the cloning began to fail again.

Post-fix logs for those interested:

At this point, however, I'm basically performing a distributed denial of service attack against the git servers. So while the servers aren't immune to a DDoS, the original problem appears to be fixed.

June 24, 2015

Smart Phones Should Measure Charge Speed

My first mobile phone lasted for days between charges. I never really found out how long its battery would last because there was no way I could use it enough to deplete the charge in the time I spent awake. Even if I had managed to run the battery out, the phone was designed to accept 4*AA batteries (its rechargeable battery pack was exactly that size) so I could buy spare batteries at any store.

Modern phones are quite different in physical design (phones that weigh less than 4*AA batteries aren't uncommon), functionality (fast CPUs and big screens suck power), and use (games really drain your phone battery). This requires much more effective chargers; when some phones are intensively used (e.g. playing an action game with Wifi enabled) they can't be charged because they use more power than the plug-pack supplies. I've previously blogged some calculations about resistance and thickness of wires for phone chargers [1]; it's obvious that there are some technical limitations to phone charging based on the decision to use a long cable at ~5V.

My calculations about phone charge rate were based on the theoretical resistance of wires based on their estimated cross-sectional area. One problem with such analysis is that it’s difficult to determine how thick the insulation is without destroying the wire. Another problem is that after repeated use of a charging cable some conductors break due to excessive bending. This can significantly increase the resistance and therefore increase the charging time. Recently a charging cable that used to be really good suddenly became almost useless. My Galaxy Note 2 would claim that it was being charged even though the reported level of charge in the battery was not increasing, it seems that the cable only supplied enough power to keep the phone running not enough to actually charge the battery.

I recently bought a USB current measurement device which is really useful. I have used it to diagnose power supplies and USB cables that didn’t work correctly. But one significant way in which it fails is in the case of problems with the USB connector. Sometimes a cable performs differently when connected via the USB current measurement device.

The CurrentWidget program [2] on my Galaxy Note 2 told me that all of the dedicated USB chargers (the 12V one in my car and all the mains powered ones) supply 1698mA (including the ones rated at 1A) while a PC USB port supplies ~400mA. I don't think that the Note 2 measurement is particularly reliable. On my Galaxy Note 3 it always says 0mA, I guess that feature isn't implemented. An old Galaxy S3 reports 999mA of charging even when the USB current measurement device says ~500mA. It seems to me that the method CurrentWidget uses to get the current isn't accurate, if it even works at all.

Android 5 on the Nexus 4/5 phones will tell you the amount of time until the phone is charged in some situations (on the Nexus 4 and Nexus 5 that I used for testing it didn't always display it and I don't know why). This is useful, but it's still not good enough.

I think that what we need is to have the phone measure the current that's being supplied and report it to the user. Then when a phone charges slowly because apps are using some power, that won't be mistaken for a phone charging slowly due to a defective cable or connector.

June 23, 2015

One Android Phone Per Child

I was asked for advice on whether children should have access to smart phones; it's an issue that many people are discussing and seems worthy of a blog post.

Claimed Problems with Smart Phones

The first thing that I think people should read is this XKCD post with quotes about the demise of letter writing from 99+ years ago [1]. Given the lack of evidence cited by people who oppose phone use I think we should consider to what extent the current concerns about smart phone use are just reactions to changes in society. I’ve done some web searching for reasons that people give for opposing smart phone use by kids and addressed the issues below.

Some people claim that children shouldn’t get a phone when they are so young that it will just be a toy. That’s interesting given the dramatic increase in the amount of money spent on toys for children in recent times. It’s particularly interesting when parents buy game consoles for their children but refuse mobile phone “toys” (I know someone who did this). I think this is more of a social issue regarding what is a suitable toy than any real objection to phones used as toys. Obviously the educational potential of a mobile phone is much greater than that of a game console.

It’s often claimed that kids should spend their time reading books instead of using phones. When visiting libraries I’ve observed kids using phones to store lists of books that they want to read, this seems to discredit that theory. Also some libraries have Android and iOS apps for searching their catalogs. There are a variety of apps for reading eBooks, some of which have access to many free books but I don’t expect many people to read novels on a phone.

Cyber-bullying is the subject of a lot of anxiety in the media. At least with cyber-bullying there’s an electronic trail, anyone who suspects that their child is being cyber-bullied can check that while old-fashioned bullying is more difficult to track down. Also while cyber-bullying can happen faster on smart phones the victim can also be harassed on a PC. I don’t think that waiting to use a PC and learn what nasty thing people are saying about you is going to be much better than getting an instant notification on a smart phone. It seems to me that the main disadvantage of smart phones in regard to cyber-bullying is that it’s easier for a child to participate in bullying if they have such a device. As most parents don’t seem concerned that their child might be a bully (unfortunately many parents think it’s a good thing) this doesn’t seem like a logical objection.

Fear of missing out (FOMO) is claimed to be a problem, apparently if a child has a phone then they will want to take it to bed with them and that would be a bad thing. But parents could have a policy about when phones may be used and insist that a phone not be taken into the bedroom. If it’s impossible for a child to own a phone without taking it to bed then the parents are probably dealing with other problems. I’m not convinced that a phone in bed is necessarily a bad thing anyway, a phone can be used as an alarm clock and instant-message notifications can be turned off at night. When I was young I used to wait until my parents were asleep before getting out of bed to use my PC, so if smart-phones were available when I was young it wouldn’t have changed my night-time computer use.

Some people complain that kids might use phones to play games too much or talk to their friends too much. What do people expect kids to do? In recent times the fear of abduction has led to children playing outside a lot less; it used to be that 6yos would play with other kids in their street and 9yos would be allowed to walk to the local park. Now people aren't allowing 14yo kids to walk to the nearest park alone. Playing games and socialising with other kids has to be done over the Internet because kids aren't often allowed out of the house. Play and socialising are important learning experiences that have to happen online if they can't happen offline.

Apps can be expensive. But it’s optional to sign up for a credit card with the Google Play store and the range of free apps is really good. Also the default configuration of the app store is to require a password entry before every purchase. Finally it is possible to give kids pre-paid credit cards and let them pay for their own stuff, such pre-paid cards are sold at Australian post offices and I’m sure that most first-world countries have similar facilities.

Electronic communication is claimed to be somehow different and lesser than old-fashioned communication. I presume that people made the same claims about the telephone when it first became popular. The only real difference between email and posted letters is that email tends to be shorter because the reply time is smaller, you can reply to any questions in the same day not wait a week for a response so it makes sense to expect questions rather than covering all possibilities in the first email. If it’s a good thing to have longer forms of communication then a smart phone with a big screen would be a better option than a “feature phone”, and if face to face communication is preferred then a smart phone with video-call access would be the way to go (better even than old fashioned telephony).

Real Problems with Smart Phones

The majority opinion among everyone who matters (parents, teachers, and police) seems to be that crime at school isn’t important. Many crimes that would result in jail sentences if committed by adults receive either no punishment or something trivial (such as lunchtime detention) if committed by school kids. Introducing items that are both intrinsically valuable and which have personal value due to the data storage into a typical school environment is probably going to increase the amount of crime. The best options to deal with this problem are to prevent kids from taking phones to school or to home-school kids. Fixing the crime problem at typical schools isn’t a viable option.

Bills can potentially be unexpectedly large due to kids’ inability to restrain their usage and telcos deliberately making their plans tricky to profit from excess usage fees. The solution is to only use pre-paid plans, fortunately many companies offer good deals for pre-paid use. In Australia Aldi sells pre-paid credit in $15 increments that lasts a year [2]. So it’s possible to pay $15 per year for a child’s phone use, have them use Wifi for data access and pay from their own money if they make excessive calls. For older kids who need data access when they aren’t at home or near their parents there are other pre-paid phone companies that offer good deals, I’ve previously compared prices of telcos in Australia, some of those telcos should do [3].

It’s expensive to buy phones. The solution to this is to not buy new phones for kids, give them an old phone that was used by an older relative or buy an old phone on ebay. Also let kids petition wealthy relatives for a phone as a birthday present. If grandparents want to buy the latest smart-phone for a 7yo then there’s no reason to stop them IMHO (this isn’t a hypothetical situation).

Kids can be irresponsible and lose or break their phone. But the way kids learn to act responsibly is by practice. If they break a good phone and get a lesser phone as a replacement or have to keep using a broken phone then it’s a learning experience. A friend’s son head-butted his phone and cracked the screen – he used it for 6 months after that, I think he learned from that experience. I think that kids should learn to be responsible with a phone several years before they are allowed to get a “learner’s permit” to drive a car on public roads, which means that they should have their own phone when they are 12.

I’ve seen an article about a school finding that tablets didn’t work as well as laptops which was touted as news. Laptops or desktop PCs obviously work best for typing. Tablets are for situations where a laptop isn’t convenient and when the usage involves mostly reading/watching, I’ve seen school kids using tablets on excursions which seems like a good use of them. Phones are even less suited to writing than tablets. This isn’t a problem for phone use, you just need to use the right device for each task.

Phones vs Tablets

Some people think that a tablet is somehow different from a phone. I’ve just read an article by a parent who proudly described their policy of buying “feature phones” for their children and tablets for them to do homework etc. Really a phone is just a smaller tablet, once you have decided to buy a tablet the choice to buy a smart phone is just about whether you want a smaller version of what you have already got.

The iPad doesn’t appear to be able to make phone calls (but it supports many different VOIP and video-conferencing apps) so that could technically be described as a difference. AFAIK all Android tablets that support 3G networking also support making and receiving phone calls if you have a SIM installed. It is awkward to use a tablet to make phone calls but most usage of a modern phone is as an ultra portable computer not as a telephone.

The phone vs tablet issue doesn’t seem to be about the capabilities of the device. It’s about how portable the device should be and the image of the device. I think that if a tablet is good then a more portable computing device can only be better (at least when you need greater portability).

Recently I’ve been carrying a 10″ tablet around a lot for work, sometimes a tablet will do for emergency work when a phone is too small and a laptop is too heavy. Even though tablets are thin and light it’s still inconvenient to carry, the issue of size and weight is a greater problem for kids. 7″ tablets are a lot smaller and lighter, but that’s getting close to a 5″ phone.

Benefits of Smart Phones

Using a smart phone is good for teaching children dexterity. It can also be used for teaching art in situations where more traditional art forms such as finger painting aren’t possible (I have met a professional artist who has used a Samsung Galaxy Note phone for creating art work).

There is a huge range of educational apps for smart phones.

The Wikireader (that I reviewed 4 years ago) [4] has obvious educational benefits. But a phone with Internet access (either 3G or Wifi) gives Wikipedia access including all pictures and is a better fit for most pockets.

There are lots of educational web sites and random web sites that can be used for education (Googling the answer to random questions).

When it comes to preparing kids for “the real world” or “the work environment” people often claim that kids need to use Microsoft software because most companies do (regardless of the fact that most companies will be using radically different versions of MS software by the time current school kids graduate from university). In my typical work environment I’m expected to be able to find the answer to all sorts of random work-related questions at any time and I think that many careers have similar expectations. Being able to quickly look things up on a phone is a real work skill, and a skill that’s going to last a lot longer than knowing today’s version of MS-Office.

There are a variety of apps for tracking phones. There are non-creepy ways of using such apps for monitoring kids. Also with two-way monitoring kids will know when their parents are about to collect them from an event and can stay inside until their parents are in the area. This combined with the phone/SMS functionality that is available on feature-phones provides some benefits for child safety.

iOS vs Android

Rumour has it that iOS is better than Android for kids diagnosed with Low Functioning Autism. There are apparently apps that help non-verbal kids communicate with icons and for arranging schedules for kids who have difficulty with changes to plans. I don’t know anyone who has a LFA child so I haven’t had any reason to investigate such things. Anyone can visit an Apple store and a Samsung Experience store as they have phones and tablets you can use to test out the apps (at least the ones with free versions). As an aside the money the Australian government provides to assist Autistic children can be used to purchase a phone or tablet if a registered therapist signs a document declaring that it has a therapeutic benefit.

I think that Android devices are generally better for educational purposes than iOS devices because Android is a less restrictive platform. On an Android device you can install apps downloaded from a web site or from a 3rd party app download service. Even if you stick to the Google Play store there’s a wider range of apps to choose from because Google is apparently less restrictive.

Android devices usually allow installation of a replacement OS. The Nexus devices are always unlocked and have a wide range of alternate OS images, and the other commonly used devices can usually have an alternate OS installed. This allows kids who have the interest and technical skill to extensively customise their device and learn all about its operation. iOS devices are designed to be sealed against the user. Admittedly there probably aren't many kids with the skill and desire to replace the OS on their phone, but I think it's good to have the option.

Android phones have a range of sizes and features while Apple only makes a few devices at any time and there’s usually only a couple of different phones on sale. iPhones are also a lot smaller than most Android phones, according to my previous estimates of hand size the iPhone 5 would be a good tablet for a 3yo or good for side-grasp phone use for a 10yo [5]. The main benefits of a phone are for things other than making phone calls so generally the biggest phone that will fit in a pocket is the best choice. The tiny iPhones don’t seem very suitable.

Also buying one of each is a viable option.


I think that mobile phone ownership is good for almost all kids even from a very young age (there are many reports of kids learning to use phones and tablets before they learn to read). There are no real down-sides that I can find.

I think that Android devices are generally a better option than iOS devices. But in the case of special needs kids there may be advantages to iOS.

June 22, 2015

App Container and the Open Container Project

Today we’re pleased to announce that CoreOS, Docker, and a large group of industry leaders are working together on a standard container format through the formation of the Open Container Project (OCP). OCP is housed under the Linux Foundation, and is chartered to establish common standards for software containers. This announcement means we are starting to see the concepts behind the App Container spec and Docker converge. This is a win for both users of containers and our industry at large.

In December 2014 we announced rkt, a new container runtime intended to address issues around security and composability in the container ecosystem. At the same time, we started App Container (appc), a specification defining a container image format, runtime environment and discovery protocol, to work towards the goal of a standard, portable shipping container for applications. We believe strongly that open standards are key to the success of the container ecosystem.

We created App Container to kickstart a movement toward a shared industry standard. With the announcement of the Open Container Project, Docker is showing the world that they are similarly committed to open standards. Today Docker is the de facto image format for containers, and therefore is a good place to start from in working towards a standard. We look forward to working with Docker, Google, Red Hat and many others in this effort to bring together the best ideas across the industry.

As we participate in OCP, our primary goals are as follows:

  • Users should be able to package their application once and have it work with any container runtime (like Docker, rkt, Kurma, or Jetpack)
  • The standard should fulfill the requirements of the most rigorous security and production environments
  • The standard should be vendor neutral and developed in the open

App Container

We believe most of the core concepts from App Container will form an important part of OCP. Our experience developing App Container will play a critical role as we begin collaboration on the OCP specification. We anticipate that much of App Container will be directly integrated into the OCP specification, with tweaks being made to provide greater compatibility with the existing Docker ecosystem. The end goal is to converge on a single unified specification of a standard container format, and the success of OCP will mean the major goals of App Container are satisfied. Existing appc maintainers Brandon Philips and Vincent Batts will be two of the initial maintainers of OCP and will work to harmonize the needs of both communities in the spirit of a combined standard. At the same time we will work hard to ensure that users of appc will have a smooth migration to the new standard.

Continuing work on rkt

CoreOS remains committed to the rkt project and will continue to invest in its development. Today rkt is a leading implementation of appc, and we plan on it becoming a leading implementation of OCP. Open standards only work if there are multiple implementations of the specification, and we will develop rkt into a leading container runtime around the new shared container format. Our goals for rkt are unchanged: a focus on security and composability for the most demanding production environments.

We are excited the industry is converging on a format that combines the best ideas from appc, rkt and Docker to achieve what we all need to succeed: a well-defined shared standard for containers.

For more information and to see the draft charter and founding formation of the OCP, go to

June 20, 2015

BTRFS Status June 2015

The version of btrfs-tools in Debian/Jessie is incapable of creating a filesystem that can be mounted by the kernel in Debian/Wheezy. If you want to use a BTRFS filesystem on Jessie and Wheezy (which isn’t uncommon with removable devices) the only options are to use the Wheezy version of mkfs.btrfs or to use a Jessie kernel on Wheezy. I recently got bitten by this issue when I created a BTRFS filesystem on a removable device with a lot of important data (which is why I wanted metadata duplication and checksums) and had to read it on a server running Wheezy. Fortunately KVM in Wheezy works really well so I created a virtual machine to read the disk. Setting up a new KVM isn’t that difficult, but it’s not something I want to do while a client is anxiously waiting for their data.

BTRFS has been working well for me apart from the Jessie/Wheezy compatibility issue (which was an annoyance but didn't stop me doing what I wanted). I haven't written a BTRFS status report for a while because everything has been OK and there has been nothing exciting to report.

I regularly get errors from the cron jobs that run a balance, supposedly because of running out of free space. I have the cron jobs because of past problems with BTRFS running out of metadata space. In spite of the jobs often failing, the systems keep working so I'm not too worried at the moment. I think this is a bug, but there are many more important bugs.

Linux kernel version 3.19 was the first version to have working support for RAID-5 recovery. This means version 3.19 was the first version to have usable RAID-5 (I think there is no point even having RAID-5 without recovery). It wouldn't be prudent to trust your important data to a new feature in a filesystem, so at this stage if I needed a very large scratch space then BTRFS RAID-5 might be a viable option, but for anything else I wouldn't use it. BTRFS still has had little performance optimisation; while this doesn't matter much for SSDs or single-disk filesystems, it would probably hurt a lot for a RAID-5 of hard drives. Maybe BTRFS RAID-5 would be good for a scratch array of SSDs. The reports of problems with RAID-5 don't surprise me at all.

I have a BTRFS RAID-1 filesystem on 2*4TB disks which is giving poor performance on metadata; simple operations like “ls -l” on a directory with ~200 subdirectories take many seconds to run. I suspect that part of the problem is due to the filesystem being written by cron jobs with files accumulating over more than a year. The “btrfs filesystem” command (see btrfs-filesystem(8)) allows defragmenting files and directory trees, but unfortunately it doesn't support recursively defragmenting directories while leaving files alone. I really wish there was a way to get BTRFS to put all metadata on SSD and all data on hard drives. Sander suggested the following command to defragment directories on the BTRFS mailing list:

find / -xdev -type d -execdir btrfs filesystem defrag -c {} +

Below is the output of “zfs list -t snapshot” on a server I run; it's often handy to know how much space is used by snapshots, but unfortunately BTRFS has no support for this.

hetz0/be0-mail@2015-03-10 2.88G 387G
hetz0/be0-mail@2015-03-11 1.12G 388G
hetz0/be0-mail@2015-03-12 1.11G 388G
hetz0/be0-mail@2015-03-13 1.19G 388G

Hugo pointed out on the BTRFS mailing list that the following command will give the amount of space used for snapshots. $SNAPSHOT is the name of a snapshot and $LASTGEN is the generation number of the previous snapshot you want to compare with.

btrfs subvolume find-new $SNAPSHOT $LASTGEN | awk '{total = total + $7}END{print total}'

One upside of the BTRFS implementation in this regard is that the above btrfs command without being piped through awk shows you the names of files that are being written and the amounts of data written to them. Through casually examining this output I discovered that the most written files in my home directory were under the “.cache” directory (which wasn’t exactly a surprise).

Now I am configuring workstations with a separate subvolume for ~/.cache for the main user. This means that ~/.cache changes don’t get stored in the hourly snapshots and less disk space is used for snapshots.
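
A minimal sketch of that setup, assuming a home directory at /home/user (the path is an example): the existing cache is moved aside, replaced with a subvolume, and copied back.

mv /home/user/.cache /home/user/.cache.old
btrfs subvolume create /home/user/.cache
cp -a /home/user/.cache.old/. /home/user/.cache/
rm -rf /home/user/.cache.old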


My observation is that things are going quite well with BTRFS. It’s more than 6 months since I had a noteworthy problem which is pretty good for a filesystem that’s still under active development. But there are still many systems I run which could benefit from the data integrity features of ZFS and BTRFS that don’t have the resources to run ZFS and need more reliability than I can expect from an unattended BTRFS system.

At this time the only servers I run with BTRFS are located within a reasonable drive from my home (not the servers in Germany and the US) and are easily accessible (not the embedded systems). ZFS is working well for some of the servers in Germany. Eventually I’ll probably run ZFS on all the hosted servers in Germany and the US, I expect that will happen before I’m comfortable running BTRFS on such systems. For the embedded systems I will just take the risk of data loss/corruption for the next few years.

June 18, 2015

Philippines is ready, set, go with CAP on a Map

The Philippine Atmospheric, Geophysical, and Astronomical Services Administration (PAGASA), the Philippine Institute of Volcanology and Seismology (PHIVOLCS), and the National Disaster Risk Reduction and Management Council (NDRRMC) are three agencies of foremost importance. Combined they are responsible for the monitoring, detecting, [Read the Rest...]

June 15, 2015

SahanaCamp Turkey

Turkey recently hosted the latest SahanaCamp, that magical blend of humanitarians and techie folks coming together to work on solving information management problems. Elvan Cantekin, General Manager at the MAG Foundation has been working on this for a couple of years and [Read the Rest...]

June 11, 2015

Technology Preview: CoreOS Linux and xhyve

Yesterday a new lightweight hypervisor for OS X was released called xhyve; if you are familiar with qemu-kvm on Linux, it provides a roughly similar experience. In this post we are going to show how to run CoreOS Linux under xhyve. While this is all very early and potentially buggy tech, we want to give you some tips on how to try CoreOS Linux with xhyve and run Docker or rkt on top.

xhyve is a port of bhyve, the FreeBSD hypervisor, to OS X. It is designed to run off-the-shelf Linux distros. We've made it possible to run CoreOS Linux under it, so you can get the benefits of a lightweight Linux OS running under a lightweight hypervisor on Macs. It is now possible to launch a full local development or testing environment with just a few shell commands.

A few ideas we are thinking about:

  • Single command to launch CoreOS Linux images.
  • Easily launch a Kubernetes cluster right on your laptop.
  • An OS X native version of rkt that can run Linux applications inside xhyve.

Keep in mind that xhyve is a very new project, so much work still needs to be done. You must be running OS X Yosemite for this to work. Check out this page for step-by-step instructions on how to try it out.

A Quick Example

Currently, you need to build xhyve yourself:

$ git clone
$ cd xhyve
$ make
$ sudo cp build/xhyve /usr/local/bin/

Now we can install the initial CoreOS tooling for xhyve:

$ git clone
$ cd coreos-xhyve
$ ./coreos-xhyve-fetch
$ sudo ./coreos-xhyve-run

Type ip a in the console of the virtual machine to get its IP address.

Let’s run a simple Docker container:

$ docker -H<ip-of-virtual-machine>:2375 run -it --rm busybox
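
Equivalently (as a sketch), you can export the endpoint once and drop the -H flag; the IP address below is a placeholder:

$ export DOCKER_HOST=tcp://<ip-of-virtual-machine>:2375
$ docker run -it --rm busybox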

Please open issues with ideas for enhancements or use cases. We welcome contributions to the code, so please open a pull request if you have code to share.

June 09, 2015

etcd2 in the CoreOS Linux Stable channel

This week marks a particularly special milestone for etcd2. Beginning today, etcd2 will be available in the CoreOS Linux Stable channel. This means that everyone will now be able to take advantage of etcd2, which we launched earlier this year.

etcd is an open source, distributed, consistent key-value store. It is a core component of CoreOS software that helps to facilitate safe automatic updates, coordinate work between hosts, and manage overlay networking for containers. To recap, new features and improvements in etcd2 include:

  • Reconfiguration protocol improvements, enabling more safeguards against accidental misconfiguration
  • A new raft implementation, providing improved cluster stability and predictability in massive server environments
  • On-disk safety improvements, in which CRC checksums and append-only log behavior allow etcd to detect external data corruption and avoid internal file misoperations

More details can be found in this post which first introduced etcd2. Give it a shot and let us know what you think!
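
As a quick hedged smoke test on a Stable host once etcd2 is running (the key and value here are arbitrary examples):

$ etcdctl set /message "hello etcd2"
hello etcd2
$ etcdctl get /message
hello etcd2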

A special thank-you to all of the contributors who made this possible. Join us in the continued development of etcd through the etcd-dev discussion mailing list, GitHub issues, or contributing directly to the project.

June 03, 2015

Building and deploying minimal containers on Kubernetes with and wercker

Today's guest post has been written by Micha "mies" Hernandez van Leuffen, the founder and CEO of wercker, a platform and tool for building, testing and deploying in the modern world of microservices, containers and clouds.

Edit: Added video to end of post.

The landscape of production has changed: monolithic is out, loosely coupled microservices are in. Modern applications consist of multiple moving parts, but most of the existing developer tooling we use was designed and built in the world of monolithic applications.

Working with microservices poses new challenges: your applications now consist of multiple processes, multiple configurations, multiple environments and more than one codebase.

Containers offer a way to isolate and package your applications along with their dependencies. Docker and rkt are popular container runtimes and allow for a simplified deployment model for your microservices. Wercker is a platform and command line tool built on Docker that enables developers to develop, test, build and deliver their applications in a containerized world. Each build artifact from a pipeline is a container, which gives you an immutable testable object linked to a commit.

In this tutorial, we will build and launch a containerized application on top of Kubernetes. Kubernetes is a cluster orchestration framework started by Google, specifically aimed at running container workloads. We will use from CoreOS for our container registry and wercker (of course!) to build the container and trigger deploys to Kubernetes.

The workflow we will create is depicted below:

Workflow from build to deployment.


This tutorial assumes you have the following set up:

  • A wercker account. You can sign up for free here.
  • An account on
  • A Kubernetes cluster. See the getting started section to set one up.
  • A fork of the application we will be building which you can find on GitHub.
  • You've added the above application to wercker and are using the Docker stack to build it.

Getting started

The application we will be developing is a simple API with one endpoint, which returns an array of cities in JSON. You can check out the source code for the API on GitHub. The web process listens on port 5000; we'll need this information later on.
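
Once the service is running you should be able to hit that endpoint with curl; the path and address below are assumptions based on the description above rather than values taken from the repository:

$ curl http://<service-ip>:5000/cities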

Now, let's create our Kubernetes service configuration and include it into our repository.

   "kind": "Service",
   "apiVersion": "v1beta3",
   "metadata": {
      "name": "cities",
      "labels": {
         "name": "cities"
      "createExternalLoadBalancer": true,
      "ports": [
           "port": 5000,
           "targetPort": "http-server",
           "protocol": "TCP"

We define the port that our application is listening on and use the public IP addresses that we got upon creating our Kubernetes cluster. We're using Google Container Engine, which allows for createExternalLoadBalancer. If you're using a platform which doesn't support createExternalLoadBalancer then you need to add the public IP addresses of the nodes to the publicIPs property.
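
If you are initializing the cluster by hand, creating the service from this definition (saved as cities-service.json, the file name used later in the build pipeline) is roughly:

$ kubectl create -f cities-service.json
$ kubectl get services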

Next, we're going to define our pipeline, which describes how wercker will build and deploy your application.

wercker.yml - build pipeline

On wercker, you structure your pipelines in a file called wercker.yml. It's where you define the actions (steps) and environment for your tasks (tests, builds, deploys). Pipelines can either pass or fail, depending on the results of the steps within. Steps come in three varieties: steps from the wercker step registry, inline script steps, and internal steps that run with extra privileges.

Pipelines also come with environment variables, some of which are set by default, others you can define yourself. Each pipeline can have its own base container (the main language environment of your application) and any number of services (databases, queues).

Now, let's have a look at our build pipeline for the application. You can check out the entire wercker.yml on GitHub.

build:
  box: google/golang
  steps:
    # Test the project
    - script:
        name: go test
        code: go test ./...

    # Statically build the project
    - script:
        name: go build
        code: CGO_ENABLED=0 go build -a -ldflags '-s' -installsuffix cgo -o app .

    # Create cities-controller.json only for initialization
    - script:
        name: create cities-controller.json
        code: ./

    # Copy binary to a location that gets passed along to the deploy pipeline
    - script:
        name: copy binary
        code: cp app cities-service.json cities-controller.json "$WERCKER_OUTPUT_DIR"

The box is the container and environment in which the build runs. Here we see that we're using the google/golang image as a base container for our build as it has the golang language and build tools installed in it. We also have a small unit test inside of our code base which we run first. Next we compile our code and build the app executable.

As we want to build a minimal container, we will statically compile our application. We disable cgo (the ability for Go packages to call C code) with CGO_ENABLED=0, force a rebuild of all dependencies with the -a flag, and strip debug information with -ldflags '-s', resulting in an even smaller binary.
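As a quick sanity check (not part of the pipeline itself), you can run the same build locally and confirm the binary really is static; this sketch assumes a working Go toolchain and the repository checked out locally:

# Build exactly as the pipeline does, then inspect the result.
CGO_ENABLED=0 go build -a -ldflags '-s' -installsuffix cgo -o app .
file app        # should report a statically linked executable
ldd app         # should print "not a dynamic executable"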

Next, we create our Kubernetes replication controller programmatically based on the git commit using a shell script. You can check out the shell script on GitHub.

The last step copies the executable and Kubernetes service definitions into the $WERCKER_OUTPUT_DIR folder, and the contents of this folder get passed along to the /pipeline/source/ folder within the deploy pipeline.

wercker.yml - push to Quay

We're now ready to set up our deploy pipelines and targets. We will create two deploy targets. The first will push our container to Quay; the second will perform the rolling update to Kubernetes. Deploy targets are created in the wercker web interface and reference the corresponding section in the wercker.yml.

wercker pipeline

Deploy targets in wercker.

In order to add any information such as usernames, passwords, or tokens that our deploy target might need, we define these as environment variables for each target. These environment variables will be injected when a pipeline is executed. Quay is a public and private registry for Docker image repositories. We will be using Quay to host the container image that is built by wercker.

    box: google/golang
     # Use the scratch step to build a container from scratch based on the files present
    - internal/docker-scratch-push:
        username: $QUAY_USERNAME
        password: $QUAY_PASSWORD
        cmd: ./app
        tag: $WERCKER_GIT_COMMIT
        ports: "5000"

The deploy section of our wercker.yml above consists of a single step. We use the internal/docker-scratch-push step to create a minimal container based on the files present in $WERCKER_ROOT (the folder containing our binary and source code) from the build, and push it to Quay. The $QUAY_USERNAME and $QUAY_PASSWORD parameters are environment variables that we have entered on the wercker web interface. For the tag, we use the git commit hash, so each container is versioned. This hash is available as an environment variable from within the wercker pipeline.

The cmd parameter is the command that we want to run on start-up of the container, which in our case is our application that we've built. We also need to define the port on which our application will be available, which should be the same port as in our Kubernetes service definition. Finally, we fill in the details of our repository and the URL of the registry.
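Before handing the image to Kubernetes you can smoke-test it straight from the registry; this is a hedged sketch where the repository path, tag and endpoint path are placeholders for your own values:

# Pull the freshly pushed image and run it locally on the same port.
docker pull quay.io/<your-user>/cities:<git-commit>
docker run -d -p 5000:5000 quay.io/<your-user>/cities:<git-commit>
curl -s http://localhost:5000/cities   # endpoint path is an assumption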

If you take a look at your Quay dashboard, you will see that the final container that was pushed is just 1.2MB!

wercker.yml - Kubernetes rolling update

For this tutorial, we assume you've already created a service with an accompanying replication controller. If not, you can do this via wercker as well. See the initialize section in the wercker.yml.

Let's proceed to do the rolling update on Kubernetes, replacing our pods one-by-one.

    - kubectl:
        server: $KUBERNETES_MASTER
        username: $KUBERNETES_USERNAME
        password: $KUBERNETES_PASSWORD
        insecure-skip-tls-verify: true
        command: rolling-update cities

The environment variables are again defined in the wercker web interface. The $KUBERNETES_MASTER environment variable contains the IP address of our Kubernetes master.

wercker pipeline

Kubernetes credentials defined in the pipeline.

We execute the rolling-update command and tell Kubernetes to use our Docker container from Quay via the image parameter. The tag we use for the container is the git commit hash.
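For reference, the single command that the kubectl step expands to would look roughly like the sketch below; the image path is a placeholder and the exact flags depend on your kubectl version:

kubectl rolling-update cities \
    --image=quay.io/<your-user>/cities:$WERCKER_GIT_COMMIT \
    --update-period=10s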


In this tutorial, we have showcased how to build minimal containers and use wercker as our assembly line. Our final container was just 1.2MB, making for low-cost deploys!

Though the Go programming language compiles to single binaries, making our lives easier, these lessons can be applied to other programming languages as well.

Using wercker's automated build process we've not only created a minimal container, but also linked our artifact versioning to git commits.

Pairing our versioned containers with Kubernetes' orchestration capabilities results in a radically simplified deployment process, especially with the power of rolling updates!

In short, the combination of Kubernetes, Quay and wercker is a powerful and disruptive way of building and deploying modern-day applications.

In this article we've just scratched the surface of developing container-based microservices. To learn more about Kubernetes check out the getting started guides. For more information on Quay, see the documentation site. You can sign up for wercker here, and more information and documentation are available at our dev center. The source code for our final application including its wercker.yml is available on GitHub.

June 02, 2015

Oh, the places we’ll be in June

We’re across the US and in the Netherlands this month. Check out where we’re speaking!

Couchbase Connect: Thursday, June 4 at 1:45 p.m. PDT – Santa Clara, CA

Brian Harrington (@brianredbeard), also known as Redbeard, principal architect at CoreOS, will be at Couchbase Connect and will join Traun Leyden from Couchbase to discuss Tectonic, provide a deep dive on the technology behind Kubernetes, and walk through the steps required to get Couchbase running on Kubernetes.

HP Discover: Thursday, June 4 at 3:30 p.m. PDT – Las Vegas, NV

At HP Discover in Las Vegas this week? Brandon Philips (@brandonphilips), CTO of CoreOS, Janne Heino of Nokia and Chris Grzegorczyk (@grze), chief architect at HP, will speak on Thursday, June 4 at 3:30 p.m. at Discover Theater 1 about Hybrid cloud and containers for modern application architectures. Join Nokia to walk through its global private cloud deployment of Helion Eucalyptus that also uses CoreOS’s container runtime, rkt.

ContainerDays Boston: Friday, June 5 at 3:40 p.m. EDT – Boston, MA

Barak Michener (@barakmich), software engineer and CoreOS developer advocate, will be at ContainerDays Boston and will discuss CoreOS: Building the Layers of the Cluster. Barak will also join Dave Nielsen (@davenielsen) from CloudCamp on Saturday at 12:55 p.m. EDT for a workshop that will help you get started with deploying your first container to CoreOS, Cloud Foundry, Azure and AWS.

QCon: Monday, June 8 at 9 a.m. EDT – New York, NY

Join Kelsey Hightower (@kelseyhightower), product manager, developer and chief advocate at CoreOS, for an in-depth, day-long tutorial at QCon New York on Kubernetes and CoreOS.

Cloud Expo East: Tuesday, June 9 at 1:55 p.m. EDT – New York, NY

Meet Jake Moshenko, product technical lead at CoreOS, who will speak at Cloud Expo East about Containers: New Ways to Deploy and Manage Applications at Scale.

Nutanix .NEXT: Tuesday, June 9-Wednesday, June 10 – Miami, FL

Kelsey Hightower (@kelseyhightower) and Alex Polvi (@polvi), CEO of CoreOS, will present at Nutanix .NEXT, the company’s first user conference. See Kelsey speak on Tuesday, June 9 at 3:30 p.m. EDT on Containers—What They Mean for the Future of Application Deployment. Alex will join Alan Cohen, chief commercial officer at Illumio, Dheeraj Pandey (@trailsfootmarks), CEO of Nutanix, and JR Rivers (@JRCumulus), CEO of Cumulus Networks, in the closing keynote panel: The New Enterprise IT Stack. Don’t miss it on Wednesday, June 10 at 12:15 p.m. EDT.

GoSV Meetup: Tuesday, June 9 at 6:30 p.m. PDT – San Mateo, CA

The CoreOS team will be talking with the Go Silicon Valley Meetup group this month in San Mateo at Collective Health. Register here.

NYLUG Meetup: Wednesday, June 17 at 6:30 p.m. EDT – New York, NY

The CoreOS New York team will be at the New York Linux Users Group (NYLUG) and will provide an overview of CoreOS. Sign-ups begin on June 3. Register to attend here.

GoSF Meetup: Wednesday, June 17 at 6:30 p.m. PDT – San Francisco, CA

See the CoreOS team at the GoSF Meetup to listen in on a talk about A Survey of RPC options in Go.

GOTO Amsterdam: Friday, June 19 – Amsterdam, The Netherlands

Kelsey Hightower (@kelseyhightower) will be at GOTO Amsterdam speaking on rkt and the App Container spec at 11:30 a.m. CEST and will join a panel at 3:50 p.m. CEST to discuss Docker predictions.

Pre-DockerCon panel: Sunday, June 21 – San Francisco, CA

Join Kelsey Hightower (@kelseyhightower) and other thought leaders that will be at DockerCon for a pre-event evening panel on conducting systems and services: an evening about orchestration.

DevOpsDays Amsterdam: Wednesday, June 24 – Amsterdam, The Netherlands

Learn about CoreOS at a DevOpsDays Amsterdam workshop presented by Chris Kühl (@blixtra) on June 24.

In case you missed it, check out the recordings of the CoreOS Fest talks that were held last month. More will be posted this month so stay tuned.

May 19, 2015

Dagstuhl Seminar: Compositional Verification Methods for Next-Generation Concurrency

Some time ago, I figured out that there are more than a billion instances of the Linux kernel in use, and this in turn led to the realization that a million-year RCU bug is happening about three times a day across the installed base. This realization has caused me to focus more heavily on RCU validation, which has uncovered a number of interesting bugs. I have also dabbled a bit in formal verification, which has not yet found a bug. However, formal verification might be getting there, and might some day be a useful addition to RCU's regression testing. I was therefore quite happy to be invited to this Dagstuhl Seminar. In what follows, I summarize a few of the presentations. See here for the rest of the presentations.

Viktor Vafeiadis presented his analysis of the C11 memory model, including some “interesting” consequences of data races, where a data race is defined as a situation involving multiple concurrent accesses to a non-atomic variable, at least one of which is a write. One such consequence involves a theoretically desirable “strengthening” property. For example, this property would mean that multiplexing two threads onto a single underlying thread would not introduce new behaviors. However, with C11, the undefined-behavior consequences of data races can actually cause new behaviors to appear with fewer threads, for example, see Slide 7. This suggests the option of doing away with the undefined behavior, which is exactly the option that LLVM has taken. However, this approach requires some care, as can be seen on Slide 19. Nevertheless, this approach seems promising. One important takeaway from this talk is that if you are worried about weak ordering, you need to pay careful attention to reining in the compiler's optimizations. If you are unconvinced, take a look at this! Jean Pichon-Pharabod, Kyndylan Nienhuis, and Mike Dodds presented on other aspects of the C11 memory model.

Martin T. Vechev apparently felt that the C11 memory model was too tame, and therefore focused on event-driven applications, specifically javascript running on Android. This presentation included some entertaining concurrency bugs and their effects on the browser's display. Martin also discussed formalizing javascript's memory model.

Hongjin Liang showed that ticket locks can provide starvation freedom given a minimally fair scheduler. This provides a proof point for Björn B. Brandenburg's dissertation, which analyzed the larger question of real-time response from lock-based code. It should also provide a helpful corrective to people who still believe that non-blocking synchronization is required.

Joseph Tassarotti presented a formal proof of the quiescent-state based reclamation (QSBR) variant of userspace RCU. In contrast to previous proofs, this proof did not rely on sequential consistency, but instead leveraged a release-acquire memory model. It is of course good to see researchers focusing their tools on RCU! That said, when a researcher asked me privately whether I felt that the proof incorporated realistic assumptions, I of course could not resist saying that since they didn't find any bugs, the assumptions clearly must have been unrealistic.

My first presentation covered what would be needed for me to be able to use formal verification as part of Linux-kernel RCU's regression testing. As shown on slide 34, these are:

  1. Either automatic translation or no translation required. After all, if I attempt to manually translate Linux-kernel RCU to some special-purpose language every release, human error will make its presence known.

  2. Correctly handle environment, including the memory model, which in turn includes compiler optimizations.

  3. Reasonable CPU and memory overhead. If these overheads are excessive, RCU is better served by simple stress testing.

  4. Map to source code lines containing the bug. After all, I already know that there are bugs—I need to know where they are.

  5. Modest input outside of source code under test. The sad fact is that a full specification of RCU would be at least as large as the implementation, and also at least as buggy.

  6. Find relevant bugs. To see why this is important, imagine that some tool finds 100 different million-year bugs and I fix them all. Because roughly one in six fixes introduces a bug, and because that bug is likely to reproduce in far less than a million years, this process has likely greatly reduced the robustness of the Linux kernel.

I was not surprised to get some “frank and honest” feedback, but I was quite surprised (but not at all displeased) to learn that some of the feedback was of the form “we want to see more C code.” After some discussion, I provided just that.

CoreOS Linux is in the OpenStack App Marketplace

Today at the OpenStack Summit in Vancouver, we are pleased to announce that CoreOS Linux – the lightweight operating system that provides stable, reliable updates to all machines connected to the update service – is included in the OpenStack Community App Catalog.

CoreOS Linux is now available in the Community App Catalog alongside ActiveState Stackato, Apcera, Cloud Foundry, Kubernetes, MySQL, Oracle Database 12c and Oracle Multitenant, Postgres, Project Atomic, Rally, Redis, Tomcat and Wordpress. The Community App Catalog is where community members can share apps and tools designed to integrate with OpenStack Clouds.

With the ability to use CoreOS directly from the catalog, it will be easier to use CoreOS Linux on OpenStack. CoreOS Linux delivers automatic updates that are critical to keeping a system secure. CoreOS Linux’s continuous stream of updates minimizes the complexity of each update, and engineering teams have the flexibility to select specific release channels to deploy and to control how clusters apply updates. Get started with CoreOS on OpenStack here.
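As a concrete illustration of channel selection (a sketch based on the standard CoreOS update configuration, not specific to OpenStack), switching a running machine to a different channel is a one-file change:

# Pin this machine to the beta channel and let update-engine pick it up.
sudo tee /etc/coreos/update.conf <<'EOF'
GROUP=beta
EOF
sudo systemctl restart update-engine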

At the intersection of open source technologies, we are excited to continue helping users succeed with containers in the OpenStack ecosystem. If you are at the OpenStack Summit this week, stop by to meet us and see our talk today at 2 p.m., Dream Stack, CoreOS + OpenStack + Kubernetes.

May 18, 2015

CoreOS at OpenStack Summit 2015

CoreOS is in Vancouver this week. Not only are we excited to see where OpenStack is taking containers; we’re also pumped for 24-hour poutine.

There are CoreOS-focused events on the first three days of the conference! Speakers on Monday and Tuesday, plus a deep dive into CoreOS all afternoon on Wednesday.

Monday, May 18, 2015 at 2:00 p.m.

Don’t miss our very own Matthew Garrett talking about how we can secure cloud infrastructure using TPMs today at 2 p.m.

Tuesday, May 19, 2015 at 2:00 p.m.

Next up on Tuesday, May 19 at 2 p.m., we have Brian “Redbeard” Harrington from CoreOS telling us all about the Dream Stack, CoreOS + OpenStack + Kubernetes.

Wednesday, May 20, 2015 at 1:50-6:00 p.m.

To dive deeper into CoreOS, our Collaboration Day event is on Wednesday at 1:50-6 p.m. Brian Harrington and Brian Waldon will be there to answer all of your CoreOS questions. Here is the schedule:

Time Topic
1:50 - 2:30 CoreOS as a building block for OpenStack Ironic
2:40 - 3:20 Managing CoreOS Images effectively with Glance (Dos and Don'ts)
3:30 - 4:10 CoreOS Developer AMA (Ask Me Anything)
4:10 - 4:30 20 minute break
4:30 - 5:10 Administrative/Firmware containers - going beyond your web applications
5:20 - 6:00 Building minimal application containers from scratch

Be sure to stop by our 3D-printing booth right near registration! That’s right. We’ve teamed up with Codame to immortalize your time here at OpenStack Summit. Try it alone, or bring a friend. You’re not going to want to miss this!

Meet our team and tweet to us @CoreOSLinux!

CoreOS at OpenStack Summit

May 14, 2015

New Functional Testing in etcd

Today we are discussing the new fault-injecting, functional testing framework built to test etcd, which can deploy a cluster, inject failures, and check the cluster for correctness continuously.

For context, etcd is an open source, distributed, consistent key-value store. It is a core component of CoreOS software that facilitates safe automatic updates, coordinates work scheduled to hosts, and sets up overlay networking for containers. Because of its core position in the stack, its correctness and availability are critical, which is why the etcd team built this functional testing framework.

Since writing the framework, we have run it continuously for the last two months, and etcd has proven robust under many kinds of harsh failure scenarios. This framework has also helped us identify a few potential bugs and improvements that we’ve fixed in newer releases — read on for more info.

Functional Testing

etcd’s functional test suite tests the functionality of an etcd cluster with a focus on failure-resistance under heavy usage.

The main workflow of the functional test suite is straightforward:

  1. It sets up a new etcd cluster and injects a failure into the cluster. A failure is some unexpected situation that may happen in the cluster, e.g., a machine stops working or the network goes down.
  2. It repairs the failure and expects the etcd cluster to recover within a short amount of time (usually one minute).
  3. It waits for the etcd cluster to be fully consistent and making progress.
  4. It starts the next round of failure injection.

Meanwhile, the framework makes continuous write requests to the etcd cluster to simulate heavy workloads. As a result, there are constantly hundreds of write requests queued, intentionally causing a heavy burden on the etcd cluster.
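To get a feel for the kind of load involved, here is a hedged sketch of a similar write stress expressed against etcd's v2 HTTP API with curl; the real tester issues its requests from Go, and the endpoint below is a placeholder:

ENDPOINT=http://10.0.0.1:2379   # placeholder etcd client URL
# Write many 100-byte values under a stress/ prefix, like the tester's workload.
for i in $(seq 1 1000); do
  curl -fsS -XPUT "$ENDPOINT/v2/keys/stress/key$i" \
       -d value="$(printf 'x%.0s' {1..100})" > /dev/null
done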

If the running cluster cannot recover from failure, the functional testing framework archives the cluster state and does the next round of testing on a new etcd cluster. When archiving, process logs and data directories for each etcd member are saved into a separate directory, which can be viewed and debugged in the future.

Basic Architecture

etcd's functional test suite has two components: etcd-agent and etcd-tester. etcd-agent runs on every etcd node and etcd-tester is a single controller of the test.

etcd-agent is a daemon on each machine. It can start, stop, restart, isolate and terminate an etcd process. The agent exposes these functionalities via RPC.

etcd-tester utilizes all etcd-agents to control the cluster and simulate various test cases. For example, it starts a three-member cluster by sending three start-RPC calls to three different etcd-agents. It then forces one of the members to fail by sending a stop-RPC call to the member’s etcd-agent.

etcd functional testing

While etcd-tester uses etcd-agent to control etcd externally, it also directly connects to etcd members to make simulated HTTP requests, including setting a range of keys and checking member health.
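Those direct checks can be reproduced by hand; a minimal sketch against the etcd 2.x client API (the member address is a placeholder) looks like this:

# Ask a member whether it considers itself healthy, then read back a stressed key.
curl -fsS http://10.0.0.1:2379/health
curl -fsS http://10.0.0.1:2379/v2/keys/stress/key1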

Internal Testing Suite

The internal functional testing suite runs on four n1-highcpu-2 virtual machines on Google Compute Engine. Each machine has 2 virtual cores, 1.8 GB of memory and a 200 GB standard persistent disk. Three machines have etcd-agent running as a daemon, while the fourth machine runs etcd-tester as the controller.

Currently we simulate six major failures, covering the most common cases etcd may encounter in real life:

  1. kill all members
    • the whole data center experiences an outage, and the etcd cluster in the data center is killed
  2. kill the majority of the cluster
    • part of the data center experiences an outage, and the etcd cluster loses quorum
  3. kill one member
    • a single machine needs to be upgraded or maintained
  4. kill one member for a significant time and expect it to recover from an incoming snapshot
    • a single machine is down due to hardware failure, and requires manual repair
  5. isolate one member
    • the network interface on a single machine is broken
  6. isolate all members
    • the router or switch in the data center is broken

Meanwhile, 250k 100-byte keys are written into the etcd cluster continuously, which means we’re storing about 25MB of data in the cluster.

Discovering Potential Bugs

This test suite has helped us to discover potential bugs and areas to improve. In one discovery, we found that when a leader is helping the follower catch up with the progress of the cluster, there was a slight possibility that memory and CPU usage could explode without bound. After digging into the log, it turned out that the leader was repeatedly sending 50MB-size snapshot messages and overloaded its transport module. To fix the issue, we designed a message flow control for snapshot messages that solved the resource explosion.

Another example is the automatic WAL repair feature added in 2.1.0. To protect data integrity, etcd intentionally refuses to restart if the last entry in the underlying WAL was half-written, which may happen if the process is killed or the disk is full. We've found this happens occasionally (about once per hundred rounds) in functional testing, and it is safe to remove the broken entry automatically and let the member recover from the cluster, which simplifies recovery for the administrator. This functionality has been merged into the master branch and will be released in v2.1.0.

After several weeks of running and debugging, the etcd cluster has survived several thousand consecutive rounds of all six failures. Having survived this serious testing, etcd has proven robust and is working quite well.

Diving into the Code

Build and Run

etcd-agent can be built via

$ go build

and etcd-tester via

$ go build

Run etcd-agent binary on machine{1,2,3}:

$ ./etcd-agent --etcd-path=$ETCD_BIN_PATH

Run etcd-tester binary on machine4:

$ ./etcd-tester -agent-endpoints="$MACHINE1_IP:9027,$MACHINE2_IP:9027,$MACHINE3_IP:9027" -limit=3 -stress-key-count=250000 -stress-key-size=100

etcd-tester starts running, and makes 3 rounds of all failures on a 3-member cluster in machines 1, 2, and 3.

Add a new failure

Let us go through the process to add failureKillOne, which kills one member and recovers it afterwards. First, write how to inject and recover from failure:

type failureKillOne struct {
  description string
}

func newFailureKillOne() *failureKillOne {
  return &failureKillOne{
    // detailed description of the failure
    description: "kill one member",
  }
}

func (f *failureKillOne) Inject(c *cluster, round int) error {
  // round robin on all members
  i := round % c.Size
  // ask its agent to stop etcd
  return c.Agents[i].Stop()
}

func (f *failureKillOne) Recover(c *cluster, round int) error {
  i := round % c.Size
  // ask its agent to restart etcd
  if _, err := c.Agents[i].Restart(); err != nil {
    return err
  }
  // wait for recovery done
  return c.WaitHealth()
}

Then we add it into failure lists:

  t := &tester{
    failures: []failure{
      newFailureKillOne(),
    },
    cluster: c,
    limit:   *limit,
  }

As you see, the framework is simple but already fairly powerful. We are looking forward to having you join the etcd test party!

Future Plans

The framework is still under active development, and more failure cases and checks will be added.

Random network partitions, network delays and runtime reconfigurations are some classic failure cases that the framework does not yet cover. Another interesting idea we plan to explore is a cascading failure case that injects multiple failure cases at the same time.

On the recovery side, more checks against consistent views of the keyspace on all members are a good starting point for further exploration.

The internal testing cluster runs 24/7, and our etcd cluster works perfectly under the current failure set. The etcd team is making its best effort to guarantee etcd’s correctness, and hopes to provide users with the most robust consensus store possible.

Follow-up plans for more specific and harsher tests are on our TODO list. This framework is good at imitating real-life scenarios, but it cannot exercise fine-grained control over lower-level system and hardware behaviors. Future testing approaches may use simulated networks and disks to tackle those failure simulations.

We will keep enhancing the testing strength and coverage by adding more failure cases and checks into the framework. Pull requests to the framework are welcomed!


We are running our testing cluster on GCE. Thanks to Google for providing the testing environment.

May 13, 2015

Upcoming CoreOS Events in May

We kicked off May by hosting our first ever CoreOS Fest, and it was a blast! We’re sad to see it go, but we’re excited about all of the other events we’ll be speaking at and attending this month.

Wednesday, May 13, 2015 at 2:00 p.m. EDT

What could be better than listening to Kelsey Hightower give a talk! Listen in from anywhere in the world to hear Kelsey discuss how to get started with containers and microservices during the Logentries Webinar. Register now!

Wednesday, May 13, 2015 at 6:00 p.m. PDT - San Francisco, CA

Alex Crawford from CoreOS will be giving an overview of CoreOS at the SF DevOps Meetup group. Thanks to Teespring for hosting the event at its SOMA office. Be sure not to miss it!

Tuesday, May 19, 2015 at 2:00 p.m. PDT - Vancouver, BC Canada

If you find yourself at OpenStack Summit Vancouver, be sure to check out Brian ‘Redbeard’ Harrington talking about modern practices for building a private cloud that runs containers at scale. We’ll also have our team there, so please stop by our area and meet us. We even have a Collaboration Day session for attendees on Wednesday, May 20 from 1:50 p.m. to 6 p.m.

Wednesday, May 20, 2015 at 9:10 a.m. EDT - Seven Springs, PA

CoreOS CEO Alex Polvi will be keynoting WHD.usa this year by talking about building distributed systems and securing the backend of the internet.

Wednesday, May 20, 2015 at 6:30 p.m. EDT - Atlanta, GA

Join Brian Akins from CoreOS in Atlanta at the DevOps ATL Meetup, where he’ll be discussing new ways to deploy and manage applications at scale. Thanks to MailChimp for hosting this meetup at their Ponce City Market office.

Wednesday, May 20, 2015 at 11:05 a.m. MDT - Denver, CO

Don’t miss Kelsey Hightower at GlueCon 2015 where he’ll give an overview of key technologies at CoreOS and how you can use these new technologies to build performant, reliable, large distributed systems.

Thursday, May 21 2015 at 11:20 a.m. MDT - Denver, CO

CoreOS CTO Brandon Philips will be at GlueCon 2015 discussing how to create a Google-like infrastructure. It will cover everything you need to know from the OS to the scheduler.

Thursday, May 21 2015 at 2:40 p.m. EDT - Charleston, SC

You can find Kelsey Hightower at CodeShow SE 2015 explaining how to manage containers at scale with CoreOS and Kubernetes.

Thursday, May 21, 2015 at 7:30 p.m. CEST - Madrid, Spain

Iago Lopez Galeiras will be joining the Madrid DevOps Meetup this month to give a talk on rkt and the App Container spec.

Tuesday, May 26, 2015 at 6:00 p.m. EDT - Charlottesville, VA

Don’t miss Brian Akins from CoreOS give an introduction to building large reliable systems at the DevOps Charlottesville Meetup group.

Friday, May 29, 2015 at 11:50 a.m. PDT - Santa Clara, CA

Be sure to check out Kelsey Hightower at Velocity where his talk will examine all the major components of CoreOS including etcd, fleet, docker, and systemd; and how these components work together.

CoreOS Fest Recap

Check out some of the best moments from CoreOS Fest 2015!

Join us at an event in your area! If you would like our help putting together a CoreOS meetup, or would like to speak at one of our upcoming meetups, please contact us at

May 05, 2015

CoreOS State of the Union at CoreOS Fest

At CoreOS Fest we have much to celebrate with the open source community. Today over 800 people contribute to CoreOS projects and we want to thank all of you for being a part of our community.

We want to take this opportunity to reflect on where we started from with CoreOS Linux. Below, we go into depth about each project, but first, a few highlights:

  • We've now shipped CoreOS Linux images for nearly 674 days, since the CoreOS epoch on July 1, 2013.
  • We've rolled out 13 major releases of the Linux kernel from 3.8.0, released in February 2013, to the 4.0 release in April 2015.
  • In that time, we have tagged 329 releases of our images.
  • We have 500+ projects on GitHub that mention etcd, including major projects like Kubernetes, using etcd.

CoreOS Linux

Our namesake project, CoreOS Linux, started with the idea of continuous delivery of a Linux operating system. Best practice in the industry is to ship applications regularly to get the latest security fixes and newest features to users – we think an operating system can be shipped in a similar way. And for nearly two years, since the CoreOS epoch on July 1, 2013, we have been shipping regular updates to CoreOS Linux machines.

In a way, CoreOS Linux is a kernel delivery system. The alpha channel has rolled through 13 major releases of the Linux kernel from 3.8.0 in February 2013 to the recent 4.0 release in April 2015. This doesn’t include all of the minor patch releases we have bumped through as well. In that time we have tagged 329 releases of our images. To achieve this goal, CoreOS uses a transactional system so upgrades can happen automatically.

CoreOS Linux stats and community

CoreOS Linux stats shared at CoreOS Fest

Community feedback has been incredibly important throughout this journey: users help us track down bugs in upstream projects like the Linux kernel, give us feedback on new features, and flag regressions that are missed by our testing.

A wide variety of companies are building their products and infrastructure on top of CoreOS Linux, including many participants at CoreOS Fest:

  • Deis, a project recently acquired by Engine Yard, spoke yesterday on "Lessons Learned From Building Platforms on Top of CoreOS"
  • Mesosphere DCOS uses CoreOS by default, and we are happy to have them sponsor CoreOS Fest
  • Salesforce spoke today on how they are using distributed systems and application containers
  • Coinbase presented a talk today on "Container Management & Analytics"


etcd

We built CoreOS Linux with just a single-host use case in mind, but wanted people to trust and use CoreOS to update their entire fleet of machines. To solve this problem of automated yet controlled updates across a distributed set of systems, we built etcd.

etcd was initially created to provide an API-driven distributed "reboot lock" to a cluster of hosts, and it has been very successful serving this basic purpose. But over the last two years, adoption and usage of etcd has exploded: today it is being used as a key part of projects like Google's Kubernetes, Cloud Foundry's Diego, Mailgun's Vulcan and many more custom service discovery and master election systems.

At CoreOS Fest we have seen demonstrations of a PostgreSQL master election system, a MySQL master election system built by HP, and a discussion by Yodlr about how they use etcd for their internal microservice infrastructure. With feedback from all of these etcd users, we are planning an advanced V3 API, a next-generation disk-backed store, and new punishing long-running tests to ensure etcd remains a highly reliable component of distributed infrastructure.

CoreOS' etcd stats and community

etcd stats shared at CoreOS Fest

fleet on top of etcd

After etcd, we built fleet, a scheduler system that ties together systemd and etcd into a distributed init system. fleet can be thought of as a logical extension of systemd that operates at the cluster level instead of the machine level.

The fleet project is low level and designed as a foundation for higher order orchestration: its goal is to be a simple and resilient init system for your cluster. It can be used to run containers directly and also as a tool to bootstrap higher-level software like Kubernetes, Mesos, Deis and others.

For more on fleet, see the documentation on launching containers with fleet.

CoreOS' fleet stats and community

fleet stats shared at CoreOS Fest


rkt

The youngest CoreOS project is rkt, a container runtime, which was launched in December. rkt has security as a core focus and was designed to fit into the existing Unix process model so that it integrates well with tools like systemd and Kubernetes. rkt was also built to support the concept of pods: a container composed of multiple processes that share resources like the local network and IPC.

Where is rkt today? At CoreOS Fest we discussed how rkt was integrated into Kubernetes, and showed this functionality in a demo yesterday. rkt is also used in Tectonic, our new integrated container platform. Looking forward, we are planning improved UX around trust and image handling tools, advanced networking capabilities, and splitting the stage1 out from rkt to support other isolation mechanisms like KVM.

CoreOS' rkt stats and community

rkt stats shared at CoreOS Fest

Container networking

Containers are most useful when they can interact with other systems over the network. Today in the container ecosystem we have some fairly basic patterns for network configuration, but over time we will need to give users the ability to configure more complex topologies. CNI (Container Network Interface) defines the API between a runtime like rkt and the external plugins that actually join a container to a network. Our intention with CNI is to develop a generic networking solution supporting a variety of tools, with reusable plugins for different backend technologies like macvlan, ipvlan, Open vSwitch and more.

flannel is another important and useful component in container network environments. In our future work with flannel, we’d like to introduce a flannel server, integrate it into Kubernetes and add generic UDP encapsulation support.

Ignition: Machine Configuration

Ignition is a new utility for configuring machines on first boot. This utility provides mechanisms similar to coreos-cloudinit, but can configure a machine before its first boot. By configuring the system early, problems like ordering around network configuration are more easily solved. Just like coreos-cloudinit, Ignition will also have the ability to mark services to start on boot and configure user accounts.

Ignition is still under heavy development, but we are hoping to be able to start shipping it in CoreOS in the next couple of months.


We encourage all of you, as users of our systems, to continue having conversations with us. Please share ideas and tell us what is working well, what may not be working well, and how we can continue to have a useful feedback loop. In the GitHub repos for each of these projects, you can find documents that outline how to get started and where the projects are going. Thank you to our contributors!

We will also have the replays of the talks available at a later date, which will include a demo of Ignition and more.

May 04, 2015

App Container spec gains new support as a community-led effort

Today is the inaugural CoreOS Fest, the community event for distributed systems and application containers. We at CoreOS are here to celebrate you – those who want to join us on a journey to secure the backend of the Internet and build distributed systems technologies to bring web scale architecture to any organization. We've come a long way since releasing our first namesake project, CoreOS Linux, in 2013, and as a company we now foster dozens of open source projects as we work together with the community to create the components necessary for this new paradigm in production infrastructure.

An important part of working with this community has been the development of the App Container spec (appc), which provides a definition on how to build and run containerized applications. Announced in December, the appc spec emphasizes security, portability and modularity in application container execution. rkt, a container runtime developed by CoreOS, is the first implementation of appc.

As security and portability between stacks becomes central to the successful adoption of application containers, today appc has gained support from various companies in the community:

  • Google has furthered its support of appc by implementing rkt into Kubernetes and joining as a maintainer of appc
  • Apcera has announced an additional appc implementation called Kurma
  • Red Hat has assigned an engineer to participate as a maintainer of appc
  • VMware recently announced how they will contribute to appc and shipped rkt in Project Photon

In order to ensure the specification remains a community-led effort, the appc project has established a governance policy and elected several new community maintainers unaffiliated with CoreOS: initially, Vincent Batts of Red Hat, Tim Hockin of Google and Charles Aylward of Twitter. These new maintainers each bring their own unique point of view and allow appc to be a truly collaborative effort. Two of the initial developers of the spec from CoreOS, Brandon Philips and Jonathan Boulle, remain as maintainers, but now are proud to have the collective help of others to make the spec what it is intended to be: open, well-specified and developed by a community.

In the months after the launch of appc, we have seen the adoption and support behind a common application container specification grow quickly. These companies and individuals are coming together to ensure there is a well defined specification for application containers, providing guidelines to ensure security, openness and modularity between stacks.

Google furthers its support of appc by integrating rkt into Kubernetes

Today also marks support for appc arriving in the Kubernetes project, via the integration of rkt as a configurable container runtime for Kubernetes clusters.

"The first implementation of the appc specification into Kubernetes, through the support of CoreOS rkt, is an important milestone for the Kubernetes project," said Craig McLuckie, product manager and Kubernetes co-founder at Google. "Designed with cluster first management in mind, appc support enables developers to use their preferred container image through the same Google infrastructure inspired orchestration framework."

Kubernetes is an open source project introduced by Google to help organizations run their infrastructure in a similar manner to the internal infrastructure that runs Google Search, Gmail and other Google services. Today's announcement of rkt being integrated directly into Kubernetes means that users will have the ability to run ACIs, the image format defined in the App Container spec, and take advantage of rkt’s first-class support for pods. rkt’s native support for running Docker images means they can also continue to use their existing images.

Apcera’s new implementation of appc, Kurma

Also announced today is Kurma, a new implementation of appc by Apcera. Kurma is an execution environment for running applications in containers. Kurma provides a framework that allows containers to be managed and orchestrated by systems beyond Kurma itself. Kurma joins a variety of implementations of the appc spec that have emerged in the last six months, such as Jetpack, an App Container runtime for FreeBSD, and libappc, a C++ library for working with containerized applications.

"Apcera has long been invested in secure container technology to power our core platform," said Derek Collison, founder and CEO of Apcera. "We are excited to bring our technology to the open source community and to partner with CoreOS on the future of appc."

Red Hat involvement as a maintainer of appc

Red Hat recently assigned an engineer to participate as a maintainer of appc. With years of experience in container development and leadership in Docker, Kubernetes and the Linux community as a whole, they bring a unique skillset to the effort.

“The adoption of container technology is an exciting trend and one that we believe can have significant customer benefit,” said Matt Hicks, senior director, engineering, Red Hat. “But at the same time, fragmentation of approaches and formats runs the risk of undercutting the momentum. We are excited to be included as maintainers and will work to not only innovate, but also to help create stability for our customers that adopt containers.”

VMware’s continued support of appc

In April, VMware announced support for appc and shipped rkt in Project Photon™, making rkt available to VMware vSphere® and VMware vCloud® Air™ customers. VMware has been an early proponent of appc and is working closely with the community to push forward the spec.

Today VMware reaffirmed their commitment to appc, showing its importance as a community-wide specification.

“VMware supports appc today offering rkt to our customers as a container runtime engine,” said Kit Colbert, vice president and CTO, Cloud-Native Apps, VMware. “We will work with the appc community to address portability and security across platforms – topics that are top of mind for enterprises seeking to support application containers in their IT environments.”

Join the appc community effort

We welcome these new companies into the community and invite others to join the movement to bring forward a secure and portable container standard. Get involved by joining the appc mailing list and discussion on GitHub. We welcome the continued independent implementations of tools to be able to run the same container consistently.

Thank you to all who are coming out to CoreOS Fest. Please follow along with the event on Twitter @CoreOSFest and #CoreOSFest. For those who aren't able to make it in person, the talks will be recorded and available at a later date.

May 01, 2015

Sahana Nepal Earthquake SitRep 3

The Sahana Software Foundation has deployed an instance of the Sahana Open Source Disaster Management Software server to provide a flexible solution for organizations and communities to respond to the Nepal Earthquake: Please contact with questions or to request support [Read the Rest...]

April 29, 2015

CoreOS Fest 2015 Guide

CoreOS Fest 2015 is in less than a week, and we want to make sure that you’re ready! To ensure that you have everything you need in order to have the best two days, we’ve put together a CoreOS Fest Guide.

If you haven’t gotten a ticket, but plan on joining us, there are only a few remaining tickets so be sure to register now while they are available.

We wouldn’t be here today without the help from our wonderful sponsors. Thank you to Intel, Google, VMware, AWS, Rackspace, Chef, Project Calico, Sysdig, Mesosphere and Giant Swarm.


CoreOS Fest is located at The Village at 969 Market St. (between 5th and 6th St.) in downtown San Francisco, right by the Powell St. BART station. For local parking options, please check here.

Badge Pick-Up Times

The registration desk is located on the top floor of The Village. When you walk in, head straight up the stairs to pick up your badge.

Monday, May 4:

8:00 a.m. - end of day

At the registration desk on the top floor

Tuesday, May 5:

8:00 a.m. - end of day

At the registration desk on the top floor

Breakfast and Lunch Details

Breakfast and lunch will be held on the top floor each day. Dietary restrictions? We’ve accommodated most diets, but if you’re concerned that we won’t have something for your specific diet, we recommend packing a lunch.

Monday, May 4:

Breakfast: 8 a.m. - 9 a.m.

Lunch: 11:45 a.m. - 1:00 p.m.

Tuesday, May 5:

Breakfast: 8:30 a.m. - 9:30 a.m.

Lunch: 11:45 a.m. - 1:00 p.m.

After Party Details

Join us May 4 for our After Party on the top floor of The Village from 5:45 p.m. to 8:00 p.m. We’ll also share a drink and a goodbye on May 5 at 5 p.m. - 6 p.m. at the AWS Pop-Up Loft next door, at 925 Market St.

CoreOS Office Hours

Attendees may sign up for office hours through a link you’ll get in your attendee email. Since there is a limited number of spots, please look at the conference schedule before getting your office hours tickets. Paper office hours tickets will not need to be shown at any time during CoreOS Fest as long as you have your badge.


Have questions or need help the day of the event? You can email us at

A Few Things to Keep in Mind

Be on time

Unlike CS101, this is something you’ll want to wake up for. We promise to have breakfast — and more importantly, coffee — waiting for you.

Talks will be recorded

All talks will be recorded, so if you miss one, don’t worry! All videos will be posted on the Fest ‘15 site after the event.

Bring a bag

Here at CoreOS, we believe that there is such a thing as having too many conference tote bags. If you’ll need a bag, make sure to bring your own, and we'll spare you the bagception dilemma.

Charging and Wi-Fi

Wi-Fi is available at the venue, along with charging stations and outlets.

Come with questions

Some of the most influential developers in infrastructure will be there to tell stories of their successes, missteps and lessons learned. They’re here to answer your questions, so bring on the tough ones!

Follow #CoreOSFest on Twitter

Make sure that you follow @CoreOSFest and #CoreOSFest on Twitter for live schedule updates, recorded talks and news.

We’re only a few days away from CoreOS Fest and we’re excited to see you all there!

Sahana Nepal Earthquake SitRep 2

We have been stepping up our coordination efforts and engaging with folks in Nepal and from around the world who are interested in using Sahana to support the response to this devastating earthquake. Arun Pyasi is currently in Nepal and [Read the Rest...]

April 28, 2015

Slim application containers (using Docker)

Another talk I gave was about making slim containers (youtube) – ones that contain only the barest essentials needed to run an application.

And I thought I’d do it from source, as most “Built from source” images also contain the tools used to build the software.

1. Make the Docker base image you’re going to use to build the software

In January 2015, the main base images and their sizes looked like:

scratch             latest              511136ea3c5a        19 months ago       0 B
busybox             latest              4986bf8c1536        10 days ago         2.433 MB
debian              7.7                 479215127fa7        10 days ago         85.1 MB
ubuntu              15.04               b12dbb6f7084        10 days ago         117.2 MB
centos              centos7             acc1b23376ec        10 days ago         224 MB
fedora              21                  834629358fe2        10 days ago         250.2 MB
crux                3.1                 7a73a3cc03b3        10 days ago         313.5 MB

I’ll pick Debian, as I know it, and it has the fewest restrictions on what contents you’re permitted to redistribute (and because bootstrapping busybox would be an amazing talk on its own).

Because I’m experimenting, I’m starting by seeing how small I can make a new Debian base image –  starting with:

FROM debian:7.7

RUN rm -r /usr/share/doc /usr/share/doc-base \
          /usr/share/man /usr/share/locale /usr/share/zoneinfo

CMD ["/bin/sh"]

Then make a new single layer (squashed image) by running `docker export` and `docker import`:

REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
debian              7.7                 479215127fa7        10 days ago         85.1 MB
our/debian:jessie   latest              cba1d00c3dc0        1 seconds ago       46.6 MB

Ok, not quite half, but you get the idea.
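For completeness, one way to do the squash (a sketch; the image and tag names are placeholders for whatever you built above):

# Create a stopped container from the trimmed image, export its filesystem,
# and import it back as a fresh single-layer image.
CID=$(docker create trimmed-debian /bin/sh)
docker export "$CID" | docker import - our/debian:jessie
docker rm "$CID"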

It's well worth continuing this exercise, using things like `dpkg --get-selections` to remove anything else you won’t need.
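Continuing the trimming exercise, a sketch of the approach (the packages named here are only examples of likely candidates, not a tested list):

# See what's installed, then purge what the application doesn't need.
dpkg --get-selections | awk '$2 == "install" {print $1}'
apt-get purge -y manpages man-db
apt-get autoremove -y && apt-get clean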

Importantly, once you’ve made your smaller base image, you should use it consistently for ALL the containers you use. This means that whenever there are important security fixes, that base image will be downloadable as quickly as possible –  and all your related images can be restarted quickly.

This also means that you do NOT want to squish your images to one or two layers, but rather into some logical set of layers that match your deployment update risks –  a common root base, and then layers based on common infrastructure, and lastly application and customisation layers.

2. Build static binaries –  or not

Building a static binary of your application (in typical `Go` style) makes some things simpler –  but in the end, I’m not really convinced it makes a useful difference.

But in my talk, I did it anyway.

Make a Dockerfile that installs all the tools needed, builds nginx, and then outputs a tar file that is a new build context for another Docker image (and contains the libraries ldd tells us we need):

cat | docker build -t build-nginx.static -
docker run --rm build-nginx.static cat /opt/nginx.tar > nginx.tar
cat nginx.tar | docker import - micronginx
docker run --rm -it -p 80:80 micronginx /opt/nginx/sbin/nginx -g "daemon off;"
nginx: [emerg] getpwnam("nobody") failed (2: No such file or directory)

oh. I need more than just libraries?

3. Use inotify to find out what files nginx actually needs!

Use the same image, but start it with Bash –  use that to install and run inotify, and then use `docker exec` to start nginx:

docker run --rm build-nginx.static bash
$ apt-get install -yq inotify-tools iwatch
# inotifywait -rm /etc /lib /usr/lib /var
Setting up watches.  Beware: since -r was given, this may take a while!
Watches established.
/lib/x86_64-linux-gnu/ CLOSE_NOWRITE,CLOSE
/lib/x86_64-linux-gnu/ CLOSE_NOWRITE,CLOSE
/lib/x86_64-linux-gnu/ CLOSE_NOWRITE,CLOSE
/lib/x86_64-linux-gnu/ CLOSE_NOWRITE,CLOSE
/lib/x86_64-linux-gnu/ CLOSE_NOWRITE,CLOSE
/lib/x86_64-linux-gnu/ CLOSE_NOWRITE,CLOSE
/etc/ OPEN passwd
/etc/ OPEN group
/etc/ ACCESS passwd
/etc/ ACCESS group
/etc/ OPEN localtime
/etc/ ACCESS localtime
/etc/ CLOSE_NOWRITE,CLOSE localtime

Perhaps it shouldn’t be too surprising that nginx expects to rifle through your user password files when it starts :(

4. Generate a new minimal Dockerfile and tar file Docker build context, and pass that to a new `docker build`

The trick is that the build container Dockerfile can generate the minimal Dockerfile and tar context, which can then be used to build a new minimal Docker image.

The excerpt from the Dockerfile that does it looks like:

# Add a Dockerfile to the tar file
RUN echo "FROM busybox" > /Dockerfile \
    && echo "ADD * /" >> /Dockerfile \
    && echo "EXPOSE 80 443" >> /Dockerfile \
    && echo 'CMD ["/opt/nginx/sbin/nginx", "-g", "daemon off;"]' >> /Dockerfile

RUN tar cf /opt/nginx.tar \
           /Dockerfile \
           /opt/nginx \
           /etc/passwd /etc/group /etc/localtime /etc/nsswitch.conf /etc/ \

This tar file can then be passed on using

cat nginx.tar | docker build -t busyboxnginx .


Comparing the sizes, our build container is about 1.4GB, the Official nginx image about 100MB, and our minimal nginx container, 21MB to 24MB –  depending if we add busybox to it or not:

REPOSITORY          TAG            IMAGE ID            CREATED              VIRTUAL SIZE
micronginx          latest         52ec332b65fc        53 seconds ago       21.13 MB
nginxbusybox        latest         80a526b043fd        About a minute ago   23.56 MB
build-nginx.static  latest         4ecdd6aabaee        About a minute ago   1.392 GB
nginx               latest         1822529acbbf        8 days ago           91.75 MB

It's interesting to remember how heavily we rely on `I know this, it's a UNIX system` – application services can have all sorts of hidden assumptions that won’t be revealed without putting them into more constrained environments.

In the same way that we don’t ship the VM / filesystem of our build server, you should not be shipping the container you’re building from source.

This analysis doesn’t try to restrict nginx to only opening certain network ports, devices, or IPC mechanisms – so there’s more to be done…


April 27, 2015

Announcing GovCloud support on AWS

Today we are happy to announce CoreOS Linux now supports Amazon Web Services GovCloud (US). AWS GovCloud is an isolated AWS Region for US government agencies and customers to move sensitive workloads into the AWS cloud by addressing their specific regulatory and compliance requirements. With this, automatic updates are now stable and available to all government agencies using the cloud.

CoreOS Linux customers will also benefit from the security assurances of FedRAMP, a US government program that provides a standardized approach to security assessment, authorization and continuous monitoring for cloud products and services.

For more details, see the documentation on Running CoreOS on EC2.

New gst-rpicamsrc features

I’ve pushed some new changes to my Raspberry Pi camera GStreamer wrapper, gst-rpicamsrc.

These bring the GStreamer element up to date with new features added to raspivid since I first started the project, such as adding text annotations to the video, support for the 2nd camera on the compute module, intra-refresh and others.

Where possible, you can now dynamically update any of the properties – where the firmware supports it. So you can implement digital zoom by adjusting the region-of-interest (roi) properties on the fly, or update the annotation or change video effects and colour balance, for example.

The timestamps produced are now based on the internal STC of the Raspberry Pi, so the audio video sync is tighter. Although it was never terrible, it’s now more correct and slightly less jittery.

The one major feature I haven’t enabled as yet is stereoscopic handling. Stereoscopic capture requires 2 cameras attached to a Raspberry Pi Compute Module, so at the moment I have no way to test it works.

I’m also working on GStreamer stereoscopic handling in general (which is coming along). I look forward to releasing some of that code soon.


Sahana Nepal Earthquake SitRep 1

As you are probably aware a 7.8 magnitude earthquake has struck Nepal on 25th April causing 2,288 deaths and injuring over 5,500 people [1]. Sahana is already being used in Nepal by both the Nepal Red Cross Society and the National Emergency Operation Center [Read the Rest...]

April 26, 2015

Anti-Systemd People

For the Technical People

This post isn’t really about technology. I’ll cover the technology briefly; skip to the next section if you aren’t interested in Linux programming or system administration.

I’ve been using the Systemd init system for a long time; I first tested it in 2010 [1]. I use Systemd on most of my systems that run Debian/Wheezy (which means most of the Linux systems I run that aren’t embedded systems). Currently the only systems where I’m not running Systemd are some systems on which I don’t have console access; while Systemd works reasonably well, it wasn’t a standard init system for Debian/Wheezy, so I don’t run it everywhere. That said, I haven’t had any problems with Systemd in Wheezy, so I might have been too paranoid.

I recently wrote a blog post about systemd, just some basic information on how to use it and why it’s not a big deal [2]. I’ve been playing with Systemd for almost 5 years and using it in production for almost 2 years and it’s performed well. The most serious bug I’ve found in systemd is Bug #774153 which causes a Wheezy->Jessie upgrade to hang until you run “systemctl daemon-reexec” [3].

I know that some people have had problems with systemd, but any piece of significant software will cause problems for some people; there are bugs in all software that is complex enough to be useful. However, the fact that it has worked so well for me on so many systems suggests that it’s not going to cause huge problems, and it should be covered in the routine testing that is needed for a significant deployment of any new version of a distribution.

I’ve been using Debian for a long time. The transitions from libc4 to libc5 and then libc6 were complex but didn’t break much. The use of devfs in Debian caused some issues and then the removal of devfs caused other issues. The introduction of udev probably caused problems for some people too. Doing major updates to Debian systems isn’t something that is new or which will necessarily cause significant problems; I don’t think that the change to systemd by default compares to changing from a.out binaries to ELF binaries (which required replacing all shared objects and executables).

The Social Issue of the Default Init

Recently the Debian technical committee determined that Systemd was the best choice for the default init system in Debian/Jessie (the next release of Debian which will come out soon). Decisions about which programs should be in the default install are made periodically and it’s usually not a big deal. Even when the choice is between options that directly involve the user (such as the KDE and GNOME desktop environments) it’s not really a big deal because you can just install a non-default option.

One of the strengths of Debian has always been the fact that any Debian Developer (DD) can just add any new package to the archive if they maintain it to a suitable technical standard and if copyright and all other relevant laws are respected. Any DD who doesn’t like any of the current init systems can just package a new one and upload it. Obviously the default option will get more testing, so the non-default options will need more testing by the maintainer. This is particularly difficult for programs that have significant interaction with other parts of the system, I’ve had difficulties with this over the course of 14 years of SE Linux development but I’ve also found that it’s not an impossible problem to solve.

It’s generally accepted that making demands of other people’s volunteer work is a bad thing, which to some extent is a reasonable position. There is a problem when this is taken to extremes: Debian has over 1000 developers who have to work together, so sometimes it’s a question of who gets to do the extra work to make the parts of the distribution fit together. The issue of who gets to do the work is often based on what parts are the defaults or most commonly used options. For my work on SE Linux I often have to do a lot of extra work because it’s not part of the default install, and I have to make my requests for changes to other packages as small and simple as possible.

So part of the decision to make Systemd be the default init is essentially a decision to impose slightly more development effort on the people who maintain SysVInit if they are to provide the same level of support – of course given the lack of overall development on SysVInit the level of support provided may decrease. It also means slightly less development effort for the people who maintain Systemd as developers of daemon packages MUST make them work with it. Another part of this issue is the fact that DDs who maintain daemon packages need to maintain init.d scripts (for SysVInit) and systemd scripts, presumably most DDs will have a preference for one init system and do less testing for the other one. Therefore the choice of systemd as the default means that slightly less developer effort will go into init.d scripts. On average this will slightly increase the amount of sysadmin effort that will be required to run systems with SysVInit as the scripts will on average be less well tested. This isn’t going to be a problem in the short term as the current scripts are working reasonably well, but over the course of years bugs may creep in and a proposed solution to this is to have SysVInit scripts generated from systemd config files.

We did have a long debate within Debian about the issue of default init systems and many Debian Developers disagree about this. But there is a big difference between volunteers debating about their work and external people who don’t contribute but believe that they are entitled to tell us what to do. Especially when the non-contributors abuse the people who do the work.

The Crowd Reaction

In a world filled with reasonable people who aren’t assholes there wouldn’t be any more reaction to this than there has been to decisions such as which desktop environment should be the default (which has caused some debate but nothing serious). The issue of which desktop environment (or which version of a desktop environment) to support has a significant effect on users that can’t be avoided, so I could understand people being a little upset about that. But the init system isn’t something that most users will notice – apart from the boot time.

For some reason the men in the Linux community who hate women the most seem to have taken a dislike to systemd. I understand that being “conservative” might mean not wanting changes to software as well as not wanting changes to inequality in society but even so this surprised me. My last blog post about systemd has probably set a personal record for the amount of misogynistic and homophobic abuse I received in the comments. More gender and sexuality related abuse than I usually receive when posting about the issues of gender and sexuality in the context of the FOSS community! For the record this doesn’t bother me, when I get such abuse I’m just going to write more about the topic in question.

While the issue of which init system to use by default in Debian was being discussed we had a lot of hostility from unimportant people who for some reason thought that they might get their way by being abusive and threatening people. As expected that didn’t give the result they desired, but it did result in a small trend towards people who are less concerned about the reactions of users taking on development work related to init systems.

The next thing that they did was to announce a “fork” of Debian. Forking software means maintaining a separate version due to a serious disagreement about how it should be maintained. Doing that requires a significant amount of work in compiling all the source code and testing the results. The sensible option would be to just maintain a separate repository of modified packages as has been done many times before. One of the most well known repositories was the Debian Multimedia repository; it was controversial due to flouting legal issues (the developer produced code that was legal where they lived) and due to confusion among users. But it demonstrated that you can make a repository containing many modified packages. In my work on SE Linux I’ve always had a repository of packages containing changes that haven’t been accepted into Debian, which included changes to SysVInit in about 2001.

The latest news on the fork-Debian front seems to be the call for donations [4]. Apparently most of the money that was spent went to accounting fees and buying a laptop for a developer. The amount of money involved is fairly small; Forbes has an article about how awful people can use “controversy” to get crowd-funding windfalls [5].

MikeeUSA is an evil person who hates systemd [6]. This isn’t any sort of evidence that systemd is great (I’m sure that evil people make reasonable choices about software on occasion). But it is a significant factor in support for non-systemd variants of Debian (and other Linux distributions). Decent people don’t want to be associated with people like MikeeUSA, the fact that the anti-systemd people seem happy to associate with him isn’t going to help their cause.


Forking Debian is not the correct technical solution to any problem you might have with a few packages. Filing bug reports and possibly forking those packages in an external repository is the right thing to do.

Sending homophobic and sexist abuse is going to make you as popular as the GamerGate crowd. It’s not going to convince anyone to change their mind about technical decisions.

Abusing volunteers who might consider donating some of their time to projects that you like is generally a bad idea. If you abuse them enough you might get them to volunteer less of their time, but the most likely result is that they just don’t volunteer on anything associated with you.

Abusing people who write technical blog posts isn’t going to convince them that they made an error. Abuse is evidence of the absence of technical errors.

April 24, 2015

rkt 0.5.4, featuring repository authentication, port forwarding and more

Since the last rkt release a few weeks ago, development has continued apace, and today we're happy to announce rkt v0.5.4. This release includes a number of new features and improvements across the board, including authentication for image fetching, per-application arguments, running from pod manifests, and port forwarding support – check below the break for more details.

rkt, a container runtime for application containers, is under heavy development but making rapid progress towards a 1.0 release. Earlier this week, VMware announced support for rkt and the emerging App Container (appc) specification. appc is an open specification defining how applications can be run in containers, and rkt is the first implementation of the spec. With increasing industry commitment and involvement in appc, it is quickly fulfilling its promise of becoming a standard for how applications should be deployed in containers.

VMware released a short demo about how its new Project Photon works with rkt via Vagrant and VMware Fusion.

Read on below for more about the latest features in rkt 0.5.4.

Authentication for image fetching

rkt now supports HTTP Basic and OAuth Bearer Token authentication when retrieving remote images from HTTP endpoints and Docker registries. To facilitate this, we've introduced a flexible configuration system, allowing vendors to ship default configurations and then systems administrators to supplement or override configuration locally. Configuration is fully versioned to support forwards and backwards compatibility – check out the rkt documentation for more details.

Here's a simple example of fetching an image from a private Docker registry (note that Docker registries support only Basic authentication):

$ sudo cat /etc/rkt/auth.d/myuser.json
{
    "rktKind": "dockerAuth",
    "rktVersion": "v1",
    "registries": [""],
    "credentials": {
        "user": "myuser",
        "password": "sekr3tstuff"
    }
}
$ sudo /rkt --insecure-skip-verify fetch docker://
rkt: fetching image from docker://
Downloading layer: cf2616975b4a3cba083ca99bc3f0bf25f5f528c3c52be1596b30f60b0b1c37ff
Downloading layer: 6ce2e90b0bc7224de3db1f0d646fe8e2c4dd37f1793928287f6074bc451a57ea

Per-application arguments and image signature verification for local images

The flag parsing in rkt run has been reworked to support per-app flags when running a pod with multiple images. Furthermore, in keeping with our philosophy of "secure by default", rkt will now attempt signature verification even when referencing local image files (during rkt fetch or rkt run commands). In this case, rkt expects to find the signature file alongside the referenced image – for example:

 $ rkt run imgs/pauser.aci
     error opening signature file: open /home/coreos/rkt/imgs/pauser.aci.asc: no such file or directory
 $ gpg2 --armor --detach-sign imgs/pauser.aci
 $ rkt run imgs/pauser.aci
     rkt: signature verified:
       Irma Bot (ACI Signing Key)
     ^]^]^]Container rootfs terminated by signal KILL.

Specific signatures can be provided with the --signature flag, which also applies per-app in the case of multiple references. In this example, we import two local images into the rkt CAS, specifying image signatures for both:

     $ rkt fetch   \
        imgs/pauser.aci --signature ./present.asc  \
        imgs/bash.aci --signature foo.asc
      rkt: signature verified:
        Joe Packager (CoreOS)

Running from pod manifests

In previous versions of rkt, the arguments passed to rkt run (or rkt prepare) would be used to internally generate a Pod Manifest which is executed by later stages of rkt. This release introduces a new flag, --pod-manifest, to both rkt prepare and rkt run, to supply a pre-created pod manifest to rkt.

A pod manifest completely defines the execution environment of the pod to be run, such as volume mounts, port mappings, isolators, etc. This allows users complete control over all of these parameters in a well-defined way, without the need of a complicated rkt command-line invocation. For example, when integrating rkt as a container runtime for a cluster orchestration system like Kubernetes, the system can now programmatically generate a pod manifest instead of feeding a complicated series of arguments to the rkt CLI.

In this first implementation — and following the prescriptions of the upstream appc spec — the pod manifest is treated as the definitive record of the desired execution state: anything specified in the app fields will override what is in the original image, such as exec parameters, volumes mounts, port mappings, etc. This allows the operator to completely control what will be executed by rkt. Since the pod manifest is treated as a complete source of truth — and expected to be generated by orchestration tools with complete knowledge of the execution environment – --pod-manifest is initially considered mutually exclusive with other flags, such as --volumes and --port. See rkt run --help for more details.
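
As a rough sketch of the invocation (the manifest file name here is hypothetical; the manifest body follows the appc pod manifest schema):

sudo rkt run --pod-manifest=./pod-manifest.json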

Port forwarding

rkt now supports forwarding ports from the host to pods when using private networking.

As a simple example, given an app with the following ports entry in its Image Manifest:

    "name": "http",
    "port": 80,
    "protocol": "tcp"

the following rkt run command can be used to forward traffic from the host's TCP port 8888 to port 80 inside the pod:

rkt run --private-net --port=http:8888 myapp.aci
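
With the pod running, traffic sent to the host on TCP port 8888 should reach the app listening on port 80 inside the pod. A quick check from another machine (the host address below is a placeholder):

curl http://<host-address>:8888/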

Whenever possible, it is more convenient to use an SDN solution like flannel to assign routable IPs to rkt pods. However, when such an option is not available, or for "edge" apps that require straddling both SDN and external networks (such as a load balancer), port forwarding can be used to expose select ports of the pod.

Testing, forward-compatibility, and more

There's plenty more under the hood in this release, including an extensive functional test harness, a new database schema migration process, and various internal improvements to the codebase. As we've talked about previously, rkt is a young project and we aren't yet able to guarantee API/ABI stability between releases, but forward-compatibility is a top priority for the forthcoming 0.6 release, and these changes are important steps towards this goal.

For full details of all the changes in this release, check out the release on GitHub.

Get involved!

We're on a journey to create an efficient, secure and composable application container runtime for production environments, and we want you to join us. Take part in the discussion through the rkt-dev mailing list or GitHub issues — and for those eager to get stuck in, contribute directly to the project. Are you doing interesting things with rkt or appc and want to share it with the world? Contact our marketing team at

CAP on a Map project kickoff in the Maldives

A workshop and set of meetings (April 15 & 16, 2015) took place in Male, the capital city of the Maldives, to kick off the CAP on a Map project. The project aims to improve [Read the Rest...]

April 23, 2015

Verification Challenge 5: Uses of RCU

This is another self-directed verification challenge, this time to validate uses of RCU instead of validating the RCU implementations as in earlier posts. As you can see from Verification Challenge 4, the logic expression corresponding even to the simplest Linux-kernel RCU implementation is quite large, weighing in at tens of thousands of variables and hundreds of thousands of clauses. It is therefore worthwhile to look into the possibility of a trivial model of RCU that could be used for verification.

Because logic expressions do not care about cache locality, memory contention, energy efficiency, CPU hotplug, and a host of other complications that a Linux-kernel implementation must deal with, we can start with extreme simplicity. For example:

static int rcu_read_nesting_global;

static void rcu_read_lock(void)
{
  (void)__sync_fetch_and_add(&rcu_read_nesting_global, 2);
}

static void rcu_read_unlock(void)
{
  (void)__sync_fetch_and_add(&rcu_read_nesting_global, -2);
}

static inline void assert_no_rcu_read_lock(void)
{
  BUG_ON(rcu_read_nesting_global >= 2);
}

static void synchronize_rcu(void)
{
  if (__sync_fetch_and_xor(&rcu_read_nesting_global, 1) < 2)
    return;
  SET_ASSERT();  /* a reader is still present: suppress future assertions (see note below) */
  return;
}

The idea is to reject any execution in which synchronize_rcu() does not wait for all readers to be done. As before, SET_ASSERT() sets a variable that suppresses all future assertions.

Please note that this model of RCU has some shortcomings:

  1. There is no diagnosis of rcu_read_lock()/rcu_read_unlock() misnesting. (A later version of the model provides limited diagnosis, but under #ifdef CBMC_PROVE_RCU.)

  2. The heavyweight operations in rcu_read_lock() and rcu_read_unlock() result in artificial ordering constraints. Even in TSO systems such as x86 or s390, a store in a prior RCU read-side critical section might be reordered with loads in later critical sections, but this model will act as if such reordering was prohibited.

  3. Although synchronize_rcu() is permitted to complete once all pre-existing readers are done, in this model it will instead wait until a point in time at which there are absolutely no readers, whether pre-existing or new. Therefore, this model's idea of an RCU grace period is even heavier weight than in real life.

Nevertheless, this approach will allow us to find at least some RCU-usage bugs, and it fits in well with cbmc's default fully-ordered settings. For example, we can use it to verify a variant of the simple litmus test used previously:

int r_x;
int r_y;

int x;
int y;

void *thread_reader(void *arg)
{
  rcu_read_lock();
  r_x = x;
#ifdef FORCE_FAILURE_READER  /* placeholder macro name for one of the injected-error switches mentioned below */
  rcu_read_unlock();
  rcu_read_lock();
#endif
  r_y = y;
  rcu_read_unlock();
  return NULL;
}

void *thread_update(void *arg)
{
  x = 1;
#ifndef FORCE_FAILURE_GP  /* placeholder macro name for another injected-error switch */
  synchronize_rcu();
#endif
  y = 1;
  return NULL;
}

int main(int argc, char *argv[])
{
  pthread_t tr;

  if (pthread_create(&tr, NULL, thread_reader, NULL))
    abort();
  (void)thread_update(NULL);
  if (pthread_join(tr, NULL))
    abort();

  BUG_ON(r_y != 0 && r_x != 1);
  return 0;
}

This model has only 3,032 variables and 8,844 clauses, more than an order of magnitude smaller than for the Tiny RCU verification. Verification takes about half a second, which is almost two orders of magnitude faster than the 30-second verification time for Tiny RCU. In addition, the model successfully flags several injected errors. We have therefore succeeded in producing a simpler and faster model approximating RCU, and that can handle multi-threaded litmus tests.
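
For reference, a minimal invocation might look like the following; the file names are hypothetical and the exact options depend on the cbmc version in use:

cbmc rcu_model.c rcu_litmus.c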

A natural next step would be to move to litmus tests involving linked lists. Unfortunately, there appear to be problems with cbmc's handling of pointers in multithreaded situations. On the other hand, cbmc's multithreaded support is quite new, so hopefully there will be fixes for these problems in the near future. After fixes appear, I will give the linked-list litmus tests another try.

In the meantime, the full source code for these models may be found here.

Dockerising Puppet

Learn how to use Puppet to manage Docker containers. This post contains complementary technical details to the talk given on the 23rd of April at Puppet Camp in Sydney.

Manageacloud is a company that specialises in multi-cloud orchestration. Please contact us if you want to know more.



The goal is to manage the configuration of Docker containers using existing puppet modules and Puppet Enterprise. We will use the example of a Wordpress application and two different approaches:

  • Fat containers: treating the container as a virtual machine
  • Microservices: one process per container, as originally recommended by Docker


Docker Workflow



1 - Dockerfile

Dockerfile is the "source code" of the container image:

  • It uses imperative programming, which means we need to specify every command, tailored to the target distribution, to achieve the desired state.
  • It is very similar to bash; if you know bash, you know how to use a Dockerfile.
  • In large and complex architectures, the goal of the Dockerfile is to hook in a configuration management system like Puppet to install the required software and configure the container.

For example, this is a Dockerfile that will create a container image with Apache2 installed in Ubuntu:

FROM ubuntu
MAINTAINER Ruben Rubio Rey <>
RUN apt-get update
RUN apt-get install -y apache2


2 - Container Image

The container image is generated from the Dockerfile using docker build:

docker build -t <image_name> <directory_path_to_Dockerfile>
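
For instance, to build the Apache image from the Dockerfile in the previous section (the directory and image names here are hypothetical):

docker build -t ubuntu_apache2 ./apache2-image/
docker images | grep ubuntu_apache2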


3 - Registry

An analogy for the Registry is that it works like a git repository. It allows you to push and pull container images, and container images can have different versions.

The Registry is the central point to distribute Docker containers. It does not matter if you use Kubernetes, CoreOS Fleet, Docker Swarm, Mesos or you are just orchestrating in a Docker host.

For example, if you are the DevOps person within your organization, you may decide that the developers (who are already developing on Linux) will use containers instead of virtual machines for the development environment. The DevOps person is responsible for creating the Dockerfile, building the container image and pushing it to the registry. All developers within your organization can then pull the latest version of the development environment from the registry and use it.
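
As a sketch of that workflow, assuming a registry reachable at localhost:5000 and a hypothetical image name:

docker tag my_dev_env localhost:5000/my_dev_env:latest
docker push localhost:5000/my_dev_env:latest
# on each developer's machine:
docker pull localhost:5000/my_dev_env:latest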


4 - Development Environment

Docker containers can be used in a development environment. You can make developers more comfortable with the transition to containers by using the controversial "Fat Containers" approach.


5 - Production Environment

You can orchestrate Docker containers in production for two different purposes:

  • Docker Host: Using containers as a way to distribute the configuration. This post focuses on using containers in Docker Hosts.
  • Cluster Management: Mesos, Kubernetes, Docker Swarm and CoreOS Fleet are used to manage containerised applications in clustered environments. These aim to create a layer on top of the different available virtual machines, allowing you to manage all resources as one unified whole. Those technologies are very likely to evolve significantly over the next 12 months.


Fat Containers vs Microservices

When you are creating containers, there are two different approaches:

  • Microservices: running one single process per container.
  • Fat containers: running many processes and services in a container. In fact, you are treating the container as a virtual machine.

The problem with the microservices approach is that Linux is not really designed for single-process use. If you have several processes running in a container and one of them is detached from its parent, it is the responsibility of the init process to recycle its resources. If those resources are not recycled, it becomes a zombie process.

Some Linux applications are not designed for single process systems either:

  • Many Linux applications are designed to have a crontab daemon to run periodical tasks.
  • Many Linux applications write vital information directly to the syslog. If the syslog daemon is not running, you might never notice those messages.

In order to use multiple processes in a container, you need to use an init process or something similar. There are base images with an init process built in, for example for Ubuntu and Debian.

What to use? My advice is to be pragmatic; no one size fits all. Your goal is to solve business problems without creating technical debt. If fat containers better suit your business need, use them. If microservices fit better, use that instead. Ideally, you should know how to use both, and analyse the case in point to decide what is best for your company. There are no technical reasons to use one over the other.



Managing Docker Containers with Puppet

When we use Puppet (or any other configuration management system) to manage Docker containers, there are two sets of tasks: container creation and container orchestration.


Container Creation

  1. The Dockerfile installs the puppet client and invokes the puppet master to retrieve the container's configuration
  2. The new image is pushed to the registry


Container Orchestration

  1. The Docker host's puppet agent invokes the puppet master to get the configuration
  2. The puppet agent identifies a set of containers that must be pulled from the Docker registry
  3. The puppet agent pulls, configures and starts the Docker containers on the Docker host


Puppet Master Configuration

For this configuration, we are assuming that Puppet Master is running in a private network, where all the clients are secure. This allows us to use the configuration setting autosign = true in the master's puppet.conf.
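
For example, one minimal way to set this on the master (assuming a reasonably recent Puppet, and that the private-network caveat above applies):

puppet config set autosign true --section master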


Docker Registry

The Docker registry is like a "git repository" for containers. You can push and pull containers. Containers might have a version number. You can use a provider for the Docker registry or you can install one yourself. For this example we will use the module garethr/docker from the PuppetForge to create our docker-registry puppet manifest:

class docker-registry {

    include 'docker'

    docker::run { 'local-registry':

        # Name of the image in Docker Hub
        image   => 'registry',

        # We are mapping a port from the Docker host to the container.
        # If you don't do that you cannot have access
        # to the services available in the container.
        ports   => ['5000:5000'],

        # We send the configuration parameters that are required to
        # configure an insecure version of a local registry.
        env     => ['SETTINGS_FLAVOR=dev', 'STORAGE_PATH=/var/docker-registry/local-registry'],

        # Containers are stateless. If you modify the filesystem
        # you are creating a new container.
        # If we want to push containers, we need a
        # persistent layer somewhere.
        # For this case, in order to have a persistent layer,
        # we are mapping a folder on the host to a folder in the container.
        volumes => ['/var/docker-registry:/var/docker-registry'],
    }
}



Please note that this installs an insecure Docker registry for testing purposes only.


Fat Containers Approach

For this example, I am using a fat container because I am targeting the development environment for the developers within my organization. Fat containers work very much like virtual machines, so the learning curve will be close to zero. If the developers are already using Linux, using containers will remove the overhead of the hypervisor and their computers will immediately be faster.

This fat container will contain the following services:

  • Provided by the base image:
    • init
    • syslog
    • crontab
    • ssh
  • Provided by Puppet:
    • mysql
    • apache2 (along with Wordpress codebase)

The Dockerfile will create the Wordpress fat container image. This is its content:

FROM phusion/baseimage
MAINTAINER Ruben Rubio Rey  ""

# Activate AU mirrors
COPY files/ /etc/apt/sources.list

# Install puppet client using Puppet Enterprise
RUN curl -k | bash

# Configure puppet client (just removed the last line for the "certname")
COPY files/puppet.conf /etc/puppetlabs/puppet/puppet.conf

# Apply puppet changes. Note the certname: we are using "wordpress-image-"
# and three random characters.
#  - "wordpress-image-" allows Puppet Enterprise
#    to identify which classes must be applied
#  - The three random characters are used to
#    avoid conflict with the node certificates
RUN puppet agent --debug --verbose --no-daemonize --onetime --certname wordpress-image-`date +%s | sha256sum | head -c 3; echo `

# Enable SSH - as this is meant to be a development environment,
# SSH might be useful to the developer.
# This is needed for phusion/baseimage only.
RUN rm -f /etc/service/sshd/down

# Change root password - even if we use key authentication,
# knowing the root's password is useful for developers
RUN echo "root:mypassword" | chpasswd

# We enable the services that puppet is installing
COPY files/init /etc/my_init.d/10_init_services
RUN chmod +x /etc/my_init.d/10_init_services

When we build the Docker container image, it will request its configuration from the Puppet Master using the certname "wordpress-image-XXX", where XXX is three random characters.

The Puppet Master returns the following manifest:

class wordpress-all-in-one {

  # Problems using the official mysql module from Puppet Forge:
  # if you try to install mysql using package { "mysql": ensure => installed }
  # it crashes. It tries to do something with the init process
  # and this container does not have a
  # fully featured init process. "mysql-noinit" installs
  # mysql without any init dependency.
  # Note that although we cannot use the mysql Puppet Forge
  # module to install the software, we can use
  # its types to create the database, create the user
  # and grant permissions.
  include "mysql-noinit"

  # Fix unsatisfied requirements in the Wordpress class.
  # The hunner/wordpress module assumes that
  # wget is installed in the system. However,
  # containers by default have minimal software
  # installed.
  package { "wget": ensure => latest }

  # hunner/wordpress,
  # removing any task related to
  # the database (it will crash when
  # checking if the mysql package is installed)
  class { 'wordpress':
    install_dir    => '/var/www/wordpress',
    db_user        => 'wp_user',
    db_password    => 'password',
    create_db      => false,
    create_db_user => false,
  }

  # Ad-hoc apache configuration:
  # installs apache, php and adds the
  # virtual server wordpress.conf
  include "apache-wordpress"
}

Build the container image:

docker build -t puppet_wordpress_all_in_one /path/to/Dockerfile_folder/

Push the image to the registry (the registry host and port below are placeholders):

docker tag puppet_wordpress_all_in_one <registry_host>:5000/puppet_wordpress_all_in_one
docker push <registry_host>:5000/puppet_wordpress_all_in_one

Orchestrate the container

To orchestrate the fat container in a Docker host:

class container-wordpress-all-in-one {

    class { 'docker':
        extra_parameters => ['--insecure-registry'],
    }

    docker::run { 'wordpress-all-in-one':

        # The image is fetched from the Registry
        image => '',

        # The fat container maps port 80 on the docker host to
        # the container's port 80
        ports => ['80:80'],
    }
}


Microservices Approach

Now we are going to reuse as much of the existing code as possible using the microservices approach. For this approach we will have two containers: a DB container running MySQL and a WEB container running Apache2.


1 - MySQL (DB) Microservice Container

As usual, we use the Dockerfile to build the Docker image.

Dockerfiles are very similar. I will highlight the changes.

# This time we are using the official Docker Ubuntu image (no init process)
FROM ubuntu
MAINTAINER Ruben Rubio Rey ""

# Activate AU mirrors
COPY files/ /etc/apt/sources.list

# This base image does not have curl installed
RUN apt-get update && apt-get install -y curl

# Install puppet client
RUN curl -k | bash

# Configure puppet client
COPY files/puppet.conf /etc/puppetlabs/puppet/puppet.conf

# Apply puppet changes. We change the certname
# so the Puppet Master knows which configuration to retrieve.
RUN puppet agent --debug --verbose --no-daemonize --onetime --certname ms-mysql-image-`date +%s | sha256sum | head -c 3; echo `

# Expose MySQL to the Docker network.
# We are notifying the Docker network that there is a container
# that has a service (on port 3306) that other containers might need.
EXPOSE 3306


The class returned by the Puppet Master is wordpress-mysql-ms. You will notice that this class is exactly the same as the fat container's class, but anything that is not related to the database is commented out.

class wordpress-mysql-ms {

    # Install MySQL
    include "mysql-noinit"

    # Unsatisfied requirements in the wordpress class
    # package { "wget": ensure => latest }

    # Puppet Forge wordpress class, removing mysql
    # class { 'wordpress':
    #     install_dir => '/var/www/wordpress',
    #     db_user     => 'wp_user',
    #     db_password => 'password',
    # }

    # Apache configuration not needed
    # include "apache-wordpress"
}

Build the container

docker build -t puppet_ms_mysql .

Push the container image to the registry (again, the registry host and port are placeholders):

docker tag puppet_ms_mysql <registry_host>:5000/puppet_ms_mysql
sudo docker push <registry_host>:5000/puppet_ms_mysql


2 - Apache (WEB) Microservice Container

Once more, we use the Dockerfile to build the image. The file is almost exactly the same as the MySQL one, except for a few lines.

FROM ubuntu
MAINTAINER Ruben Rubio Rey ""

# Activate AU mirrors
COPY files/ /etc/apt/sources.list

# Install CURL
RUN apt-get update && apt-get install -y curl

# Install puppet client
RUN curl -k | bash

# Configure puppet client
COPY files/puppet.conf /etc/puppetlabs/puppet/puppet.conf

# Apply puppet changes
RUN puppet agent --debug --verbose --no-daemonize --onetime --certname ms-apache-image-`date +%s | sha256sum | head -c 3; echo `

# Apply a patch to link the containers.
# We have to tell Wordpress where the mysql service is running,
# using a system environment variable (explanation in the next section).
# If we are using Puppet for microservices
# we should update the Wordpress module
# to set this environment variable.
# In this case, I am exposing the changes so
# it is easier to see what is changing.
RUN apt-get install patch -y
COPY files/wp-config.patch /var/www/wordpress/wp-config.patch
RUN cd /var/www/wordpress && patch wp-config.php < wp-config.patch

# We configure PHP to read system environment variables
COPY files/90-env.ini /etc/php5/apache2/conf.d/90-env.ini

The class returned by the Puppet Master is wordpress-apache-ms. You will notice that it is very similar to wordpress-mysql-ms and to the fat container class wordpress-all-in-one. The difference is that everything related to mysql is commented out and everything related to wordpress and apache is executed.

class wordpress-apache-ms {

    # MySQL won't be installed here
    # include "mysql-noinit"

    # Unsatisfied requirements in the wordpress class
    package { "wget": ensure => latest }

    # Puppet Forge wordpress class, removing mysql
    class { 'wordpress':
        install_dir    => '/var/www/wordpress',
        db_user        => 'wp_user',
        db_password    => 'password',
        create_db      => false,
        create_db_user => false,
    }

    # Ad-hoc apache configuration
    include "apache-wordpress"
}


3 - Orchestrating Web and DB Microservice

The Puppet class that orchestrates both microservices is called container-wordpress-ms:

class container-wordpress-ms {

    # Make sure that Docker is installed
    # and that it can get images from our insecure registry
    class { 'docker':
        extra_parameters => ['--insecure-registry'],
    }

    # Container DB will run MySQL
    docker::run { 'db':
        # The image is taken from the registry
        image    => '',
        command  => '/usr/sbin/mysqld --bind-address=',
        use_name => true,
    }

    # Container WEB will run Apache
    docker::run { 'web':
        # The image is taken from the Registry
        image    => '',
        command  => '/usr/sbin/apache2ctl -D FOREGROUND',

        # We are mapping a port between the Docker host and the Apache container.
        ports    => ['80:80'],

        # We link the WEB container to the DB container. This allows WEB to access the
        # services exposed by the DB container (in this case 3306).
        links    => ['db:db'],
        use_name => true,

        # We need the DB container up and running before running WEB.
        depends  => ['db'],
    }
}


APPENDIX I: Linking containers

When we link containers in the microservices approach we are performing the following tasks:


Starting "db" container:

This starts the puppet_ms_mysql image as a container named db. Please note that puppet_ms_mysql exposes port 3306, which notifies Docker that this container has a service that might be useful to other containers.

docker run --name db -d puppet_ms_mysql /usr/sbin/mysqld --bind-address=


Starting "web" container

Now we want to start the container puppet_ms_apache, named web.

If we link the containers and execute the command env, the following environment variables are created in the web container:

docker run --name web -p 1800:80 --link db:db puppet_ms_apache env
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=8d48e28094e3
DB_PORT=tcp://
DB_PORT_3306_TCP=tcp://
DB_PORT_3306_TCP_ADDR=
DB_PORT_3306_TCP_PORT=3306
DB_PORT_3306_TCP_PROTO=tcp
DB_NAME=/web/db
HOME=/root

These variables point out where the mysql database is. Thus, the application should use the environment variable DB_PORT_3306_TCP_ADDR to connect to the database.

  • DB is the name of the container we are linking to
  • 3306 is the port exposed in the Dockerfile of the db container
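
As a quick sanity check once both containers are running under the Puppet-managed names used above (this assumes docker exec is available on the host), you can read the linked address from inside the web container:

docker exec web printenv DB_PORT_3306_TCP_ADDR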


APPENDIX II: Docker Compose

When working with microservices, you want to avoid long commands. Docker Compose makes the management of long Docker commands a lot easier. For example, this is how the Microservices approach would look with Docker Compose:

file docker-compose.yml


web:
  image: puppet_ms_apache
  command: /usr/sbin/apache2ctl -D FOREGROUND
  links:
   - db:db
  ports:
   - "80:80"

db:
  image: puppet_ms_mysql
  command: /usr/sbin/mysqld --bind-address=


and you can start both containers with the command docker-compose up

April 20, 2015

VMware Ships rkt and Supports App Container Spec

Today VMware shipped rkt, the application container runtime, and made it available to VMware customers in Project Photon. VMware also announced their support of the App Container spec, of which rkt is the first implementation.

“VMware is happy to provide rkt to offer our customers application container choice. rkt is the first implementation of the App Container spec (appc), and we look forward to contributing to the appc community to advance security and portability between platforms.”

— Kit Colbert, vice president and CTO, Cloud-Native Apps, VMware

We are thrilled to welcome VMware into the appc and rkt communities. The appc spec was created to establish an industry standard for how applications should be deployed in containers, with a focus on portability, composability, and security. rkt is a project originated by CoreOS to provide a production-ready Linux implementation of the specification.

VMware's extensive experience with running applications at scale in enterprise environments will be incredibly valuable as we work together with the community towards a 1.0 release of the appc specification and the rkt project.

Join us on our mission to create a secure, composable and standards-based container runtime. We welcome your involvement and contributions to rkt and appc:

April 16, 2015

etcd 2.0 in CoreOS Alpha Image

Today we are pleased to announce that the first CoreOS image to have an etcd v2.0 release is now available in CoreOS alpha channel. etcd v2.0 marks a milestone in the evolution of etcd and includes many new features and improvements over etcd 0.4 including:

  • Reconfiguration protocol improvements: guards against accidental misconfiguration
  • New raft implementation: provides improved cluster stability
  • On-disk safety improvements: utilizes CRC checksums and append-only log behavior

etcd is an open source, distributed, consistent key-value store. It is a core component of CoreOS software that facilitates safe automatic updates, coordinates work scheduled to hosts, and sets up overlay networking for containers. Check out the etcd v2.0 announcement for more details on etcd and the new features.

We’ve been using etcd v2.0 in production behind our own services for a few months now and it has proven to be stable in these use cases. All existing applications that use the etcd API should work against this new version of etcd. We have tested etcd v2.0 with applications like fleet, locksmith and flannel. The user facing API to etcd should provide the same features it had in the past; if you find issues please report them on GitHub.

Setup Using cloud-init

If you want to dive right in and try out bootstrapping a new cluster, the cloud-init docs have full details on all of the parameters. To support the new features of etcd v2.0, such as multiple listen addresses and proxy modes, a new cloud-init section named etcd2 is used. With a few lines of configuration and a new discovery token, you can take etcd v2.0 for a spin on your cluster.

IANA Ports

With the release of etcd2, we’ve taken the opportunity to begin the transition to our IANA-assigned port numbers: 2379 and 2380. For backward compatibility, etcd2 is configured to listen on both the new and old port numbers (4001 and 7001) by default, but this can always be further restricted as desired.
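
As a quick illustration, both endpoints serve the same keyspace, so either of the following should work against a default etcd2 member (the key and value here are arbitrary examples):

etcdctl --peers http://127.0.0.1:2379 set /example/key somevalue
etcdctl --peers http://127.0.0.1:4001 get /example/key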

Migration and Changes

Existing clusters running etcd 0.4 will not automatically migrate to etcd v2.0. As there are semantic changes in how etcd clusters are managed between the two versions, we have decided to include both. There are documented methods to migrate to etcd v2.0 and you may do this at your own pace. We encourage users to use etcd v2.0 for all new clusters to take advantage of the large number of stability and performance improvements over the older series.

In this process, we have had to break backward compatibility in two cases in order to support this change:

  1. Starting fleet.service without explicitly starting etcd.service or etcd2.service will no longer work. If you are using fleet and need a local etcd endpoint, you will need to also start etcd.service or etcd2.service.

  2. Starting flannel.service without explicitly starting etcd.service or etcd2.service will no longer work. If you are using flannel and need a local etcd endpoint, you will need to also start etcd.service or etcd2.service.

We have discouraged the use of this implicit dependency via our documentation but you can check if you will be affected. Make sure that etcd.service or etcd2.service are enabled or started in your cloud-config.
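
One quick way to check a host (assuming you are using the etcd2 unit) is:

systemctl is-enabled etcd2.service
systemctl status etcd2.service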

Looking Forward

As we look forward to etcd v2.1.0 and beyond, there are a number of exciting things shaping up inside of etcd. In the near future, new features such as the authorization and authentication API will make it safer to operate multiple applications on a single cluster. The team has also been operating ongoing test environments that introduce regular partitions and crashes, and making practical benchmarks available. In the last few days there has also been an active discussion on how to evolve the etcd APIs to better support the applications using etcd for coordination and scheduling today.

We welcome your involvement in the development of etcd - via the etcd-dev discussion mailing list, GitHub issues, or contributing directly to the project.

April 14, 2015

CoreOS on ARM64

This is a guest post from CoreOS contributor, Geoff Levand, Linux Architect, Huawei America Software Lab. He has started work on an ARM64 port of CoreOS. Here is the current state of the project, followed by how you can help.

Recent patches that I've contributed to CoreOS have added basic support for a new target board named arm64-usr. There is currently a single generic ARM64 little endian Linux profile. This profile should work with any ARM64 platform currently supported by the mainline Linux kernel, so the ARM V8 Foundation Model, the ARM FVP_VE Fast Model, the ARM FVP_BASE Fast Model, and recent qemu-system-aarch64. I hope to add other profiles to support an ARM64 big endian build, and also to get the eight-core HiSilicon 6220 based HiKey developer board supported.

ARM64 porting work is still in progress, so please consider what is done so far as experimental. Some initial work I did along with Michael Marineau of CoreOS was to clean up parts of the CoreOS build system to simplify the way architectures are defined, and also to make the generic build infrastructure completely architecture agnostic. The resulting system should make it quite straight forward to add additional architecture support to CoreOS.

The ARM64 architecture is a relatively new one, so many upstream software packages have either only recently been updated to support ARM64, or have not yet been. Much of my CoreOS porting work so far has been going through the packages which don't build and figuring out how to get them to build. Sometimes a package can be updated to the latest upstream, sometimes a package keyword can be set, sometimes a modification to the ebuild in coreos-overlay will work, and other times a combination of these is needed. This process is still ongoing, and some difficult packages still lie ahead. The resulting arm64-usr build is experimental and all the work to bring it up will need testing and review in the future.

There is still a lot of work to be done. Many more packages need to be brought up, and as I mentioned, this involves working at a low level with the package ebuild files and the CoreOS build system. At another level, all the CoreOS features will need to be exercised and verified as needed to bring up the stability and confidence of the port. There are going to be multi-arch clusters, so ARM64 and x86_64 nodes are going to need to work together -- it sounds pretty cool. Someone will need to get in there and make that happen. If you have any interest in the ARM64 port I encourage you to get involved and help out.

For general info about the port you can look at my Github site. For those who would like to investigate more, or even help with the effort, see my CoreOS ARM64 HOWTO document.

Continue the discussion with Geoff at CoreOS Fest and on freenode in #coreos as geoff-

April 13, 2015

Counting Down to CoreOS Fest on May 4 and 5

As we count down to the inaugural CoreOS Fest in just three weeks, we are thrilled to announce additional speakers and the agenda! CoreOS Fest will be May 4-5 at The Village at 969 Market Street in San Francisco and we hope you will join us.

CoreOS Fest is a two-day event about the tools and best practices used to build modern infrastructure stacks. CoreOS Fest connects people from all levels of the community with future-thinking industry veterans to learn how to build distributed systems that support application containers. This May’s festival is brought to you by our premier sponsor Intel, and additional sponsors Sysdig, Chef, Mesosphere, Metaswitch Networks and Giant Swarm.

CoreOS Fest will include speakers from Google, Intel, Salesforce, HP, and more, including:

  • Brendan Burns, software engineer at Google and founder of Kubernetes, will provide a technical overview of Kubernetes

  • Diego Ongaro, creator of Raft, will discuss the Raft Consensus Algorithm

  • Lennart Poettering, creator of systemd, will talk about systemd at the Core of the OS

  • Nicholas Weaver, director of SDI-X at Intel, will demonstrate how we can optimize container architectures for the next level of scale

  • Prakash Rudraraju, manager of technical operations at Salesforce, will join Brian Harrington, principal architect at CoreOS, for a fireside chat on how Salesforce is thinking about distributed systems and application containers

  • Yazz Atlas, HPCS principal engineer with Hewlett-Packard Advanced Technology Group, will give a presentation on automated MySQL Cluster Failover using Galera Cluster on CoreOS Linux

  • Loris Degioanni, CEO and founder of Sysdig and co-creator of Wireshark, will present the dark art of container monitoring

  • Gabriel Monroy, CTO at OpDemand/Deis, will discuss lessons learned from building platforms on top of CoreOS

  • Spencer Kimball, founder of Cockroach Labs, will talk about CockroachDB

  • Chris Winslett, product manager at, will present an etcd-based PostgreSQL HA Cluster

  • Timo Derstappen, co-founder of Giant Swarm, will present Containers on the Autobahn

More speakers will be added at

As a part of today's schedule announcement, we are offering 10 percent off the regular ticket price until tomorrow, April 14, at 10 a.m. PT. Use this link to reserve your 10 percent off ticket. Tickets are selling fast so get them before we sell out!

Once again, CoreOS Fest thanks its top level sponsor Intel and additional sponsors, including Sysdig, Chef, Mesosphere, Metaswitch Networks and Giant Swarm. If you’re interested in participating at CoreOS Fest as a sponsor, contact

For more CoreOS Fest news, follow along @coreoslinux or #CoreOSFest

April 08, 2015

Upcoming CoreOS Events in April

Supplied with fresh CoreOS t-shirts and half our weight in airport Cinnabons, we’ve made sure that you’ll be seeing a lot of us this April.

Wednesday, April 8, 2015 at 10:15 a.m. EDT - Philadelphia, PA

Don’t miss Kelsey Hightower (@kelseyhightower), developer advocate and toolsmith at CoreOS, kicking off our April events by speaking at the ETE Conference. He’ll be discussing managing containers at scale with CoreOS and Kubernetes.

Thursday, April 16, 2015 at 7:00 p.m. CET - Amsterdam, Netherlands

Kelsey Hightower will be giving an introduction to fleet, CoreOS and building large reliable systems at the Docker Randstad Meetup.

Thursday, April 16, 2015 at 6:00 p.m. PDT - San Francisco, CA

Brian Harrington will be giving an overview of CoreOS at CloudCamp. This is an unconference dedicated to all things containers.

Friday, April 17, 2015 - San Francisco, CA

Joined by a few of our very own, CoreOS CTO Brandon Philips (@BrandonPhilips) will be speaking at Container Camp. This event focuses on the latest developments in software virtualization. Get your tickets here.

Tuesday April 21 - Saturday, April 25, 2015 - Berlin, Germany

This year we’ll be attending Open Source Data Center Conference (OSDC) where Kelsey Hightower will be talking on building distributed systems with CoreOS.

Wednesday, April 22 at 6:30p.m. CET - Berlin, Germany

If you’re in Berlin, be sure to check out Kelsey Hightower talking about managing containers at scale with CoreOS and Kubernetes.

In case you missed it

In case you missed it, check out Chris Winslett's talk about an etcd-based PostgreSQL HA Cluster:

CoreOS Fest

Don’t forget that CoreOS Fest is happening the following month on May 4 and 5! We’ve released a tentative schedule and our first round of speakers. Keep checking back for more updates as the event gets closer.

April 07, 2015

Sahana Participates for GCI 2014

The Sahana Software Foundation has actively taken part in the Google Code-In programme since its inception in 2010, and 2014's programme was no exception as Sahana was once again among the 12 open source organizations selected to mentor students for Code-In. [Read the Rest...]

April 06, 2015

Announcing Tectonic: The Commercial Kubernetes Platform

CoreOS Tech Stack + Kubernetes

Our technology is often characterized as “Google’s infrastructure for everyone else.” Today we are excited to make this idea a reality by announcing Tectonic, a commercial Kubernetes platform. Tectonic provides the combined power of the CoreOS portfolio and the Kubernetes project to any cloud or on-premise environment.

Why we are building Tectonic

Our users want to securely run containers at scale in a distributed environment. We help companies do this by building open source tools which allow teams to create this type of infrastructure. With Tectonic, we now have an option for companies that want a preassembled and enterprise-ready distribution of these tools, allowing them to quickly see the benefits of modern container infrastructure.

What is Tectonic?

Tectonic is a platform combining Kubernetes and the CoreOS stack. Tectonic pre-packages all of the components required to build Google-style infrastructure and adds additional commercial features, such as a management console for workflows and dashboards, an integrated registry to build and share Linux containers, and additional tools to automate deployment and customize rolling updates.

Tectonic is available today to a select number of early customers. Head over to to sign up for the waitlist if your company is interested in participating.

What is Kubernetes?

Kubernetes is an open source project introduced by Google to help organizations run their infrastructure in a similar manner to the internal infrastructure that runs Google Search, Gmail, and other Google services. The concepts and workflows in Kubernetes are designed to help engineers focus on their application instead of infrastructure and build for high availability of services. With the Kubernetes APIs, users can manage application infrastructure - such as load balancing, service discovery, and rollout of new versions - in a way that is consistent and fault-tolerant.

Tectonic and CoreOS

Tectonic is a commercial product, and with this release, we have decided to launch our commercial products under a new brand, separate from the CoreOS name. We want our open source components - like etcd, rkt, flannel, and CoreOS Linux - to always be freely available for everyone under their respective open source licenses. We think open source development works best when it is community-supported infrastructure that we all share and build with few direct commercial motives. To that end, we want to keep CoreOS focused on building completely open source components.

To get access to an early release of Tectonic or to learn more, visit To contribute and learn more about our open source projects visit

Google Ventures Funding

In addition to introducing Tectonic, today we are announcing an investment in CoreOS, Inc. led by Google Ventures. It is great to have the support and backing of Google Ventures as we bring the Kubernetes platform to market. The investment will help us accelerate our efforts to secure the backend of the Internet and deliver Google-like infrastructure to everyone else.


Q: What does this change about CoreOS Linux and other open source projects like rkt, etcd, fleet, flannel, etc?

A: Nothing: development will continue, and we want to see all of the open source projects continue to thrive as independent components. CoreOS Linux will remain the same carefully maintained, open source, and container-focused OS it has always been. Tectonic uses many of these projects internally - including rkt, etcd, flannel, and fleet - and runs on top of the same CoreOS Linux operating system as any other application would.

Q: I am using Apache Mesos, Deis, or another application on top of CoreOS Linux: does anything change for me?

A: No, this announcement doesn't change anything about the CoreOS Linux project or software. Tectonic is simply another container-delivered application that runs on top of CoreOS Linux.

Q: What does this change for existing Enterprise Registry, Managed Linux, or customers?

A: Everything will remain the same for existing customers. All of these components are utilized in the Tectonic stack and we continue to offer support, fix bugs and add features to these products.

Follow @TectonicStack on Twitter

Go to to join an early release or to stay up to date on Tectonic news

Visit us in person at CoreOS Fest in San Francisco May 4-5, to learn more about CoreOS, Tectonic and all things distributed systems

April 01, 2015

Announcing rkt v0.5, featuring pods, overlayfs, and more

rkt is a new container runtime for applications, intended to meet the most demanding production requirements of security, efficiency and composability. rkt is also an implementation of the emerging Application Container (appc) specification, an open specification defining how applications can be run in containers. Today we are announcing the next major release of rkt, v0.5, with a number of new features that bring us closer to these goals, and want to give an update on the upcoming roadmap for the rkt project.

appc v0.5 - introducing pods

This release of rkt updates to the latest version of the appc spec, which introduces pods. Pods encapsulate a group of Application Container Images and describe their runtime environment, serving as a first-class unit for application container execution.

Pods are a concept recently popularised by Google's Kubernetes project. The idea emerged from the recognition of a powerful, pervasive pattern in deploying applications in containers, particularly at scale. The key insight is that, while one of the main value propositions of containers is for applications to run in isolated and self-contained environments, it is often useful to co-locate certain "helper" applications within a container. These applications have an intimate knowledge of each other - they are designed and developed to work co-operatively - and hence can share the container environment without conflict, yet still be isolated from interfering with other application containers on the same system.

A classic example of a pod is service discovery using the sidekick model, wherein the main application process serves traffic, and the sidekick process uses its knowledge of the pod environment to register the application in the discovery service. The pod links together the lifecycle of the two processes and ensures they can be jointly deployed and constrained in the cluster.

Another simple example is a database co-located with a backup worker. In this case, the backup worker could be isolated from interfering with the database's work - through memory, I/O and CPU limits applied to the process - but when the database process is shut down the backup process will terminate too. By making the backup worker an independent application container, and making pods the unit of deployment, we can reuse the worker for backing up data from a variety of applications: SQL databases, file stores or simple log files.

This is the power that pods provide: they encapsulate a self-contained, deployable unit that still provides granularity (for example, per-process isolators) and facilitates advanced use cases. Bringing pods to rkt enables it to natively model a huge variety of application use cases, and integrate tightly with cluster-level orchestration systems like Kubernetes.

For more information on pods, including the technical definition, check out the appc spec or the Kubernetes documentation.

overlayfs support

On modern Linux systems, rkt now uses overlayfs by default when running application containers. This provides immense benefits to performance and efficiency: start times for large containers will be much faster, and multiple pods using the same images will consume less disk space and can share page cache entries.

If overlayfs is not supported on the host operating system, rkt gracefully degrades back to the previous behaviour of extracting each image at runtime - this behaviour can also be triggered with the new --no-overlay flag to rkt run.
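As a quick sketch (the image file name here is purely illustrative):

# Run an image using the default overlayfs-backed filesystem
$ rkt run myapp-1.0.0-linux-amd64.aci

# Fall back to extracting the image at runtime on hosts without overlayfs support
$ rkt run --no-overlay myapp-1.0.0-linux-amd64.aci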

Another improvement behind the scenes is the introduction of a tree cache for rkt's local image storage. When storing ACIs in its local database (for example, after pulling them from a remote repository using rkt fetch), rkt will now store the expanded root filesystem of the image on disk. This means that when pods that reference this image are subsequently started (via rkt run), the pod filesystem can be created almost instantaneously in the case of overlayfs - or, without overlayfs, by using a simple copy instead of needing to expand the image again from its compressed format.

To facilitate simultaneous use of the tree store by multiple rkt invocations, file-based locking has been added to ensure images that are in use cannot be removed. Future versions of rkt will expose more advanced capabilities to manage images in the store.

stage1 from source

When executing application containers, rkt uses a modular approach (described in the architecture documentation) to support swappable, alternative execution environments. The default stage1 that we develop with rkt itself is based on systemd, but alternative implementations can leverage different technologies like KVM-based virtual machines to execute applications.

In earlier versions of rkt, the pre-bundled stage1 was assembled from a copy of the CoreOS Linux distribution image. We have been working hard to decouple this process to make it easier to package rkt for different operating systems and in different build environments. In rkt 0.5, the default stage1 is now constructed from source code, and over the next few releases we will make it easier to build alternative stage1 images by documenting and stabilizing the ABI.

"Rocket", "rocket", "rkt"?

This release also sees us standardizing on a single name for all areas of the project - the command-line tool, filesystem names and Unix groups, and the title of the project itself. Instead of "rocket", "Rocket", or "rock't", we now simply use "rkt".

rkt logo

Looking forward

rkt is a young project and the last few months have seen rapid changes to the codebase. As we look towards rkt 0.6 and beyond, we will be focusing on making it possible to depend on rkt to roll-forward from version to version without breaking working setups. There are several areas that are needed to make this happen, including reaching the initial stable version (1.0) of the appc spec, implementing functional testing, stabilizing the on-disk formats, and implementing schema upgrades for the store. We realize that stability is vital for people considering using rkt in production environments, and this will be a priority in the next few releases. The goal is to make it possible for a user that was happily using rkt 0.6 to upgrade to rkt 0.7 without having to remove their downloaded ACIs or configuration files.

We welcome your involvement in the development of rkt - via the rkt-dev discussion mailing list, GitHub issues, or contributing directly to the project.

March 27, 2015

CoreOS Fest 2015 First Round of Speakers Announced

As you might already know, we’re launching our first ever CoreOS Fest this May 4th and 5th in San Francisco! We’ve been hard at work making sure that this event is two days filled with all things distributed, and all things awesome.

In addition to many CoreOS project leads taking the stage, we are excited to announce a sneak peek at some of our community speakers. Join us at CoreOS Fest and you’ll hear from some of the most influential people in distributed systems today: Brendan Burns, one of the founders of Kubernetes; Diego Ongaro, the creator of Raft; Gabriel Monroy, the creator of Deis; Spencer Kimball, CEO of Cockroach Labs; Loris Degioanni, CEO of Sysdig; and many more!

We are still accepting submissions for speakers through March 31st, so we encourage you to submit your talk in our Call for Papers portal.

While the schedule will be live in the coming weeks, here's a high level overview:

We’ll kick off day one at 9 AM PDT (with registration and breakfast beforehand) with a single track of speakers, followed by lunch, then afternoon panels and breakouts. You’ll have lots of opportunities to connect and talk with fellow attendees, especially at an evening reception on the first day. Day two will include breakfast, single-track talks, lunch, panels and more.

Confirmed Speakers

See more about our first round of speakers:

Brendan Burns
Software Engineer at Google and a founder of the Kubernetes project

Brendan works on the Google Cloud Platform, leading engineering efforts to make it the best place to run containers. He has also managed several other cloud teams, including the Managed VMs team and Cloud DNS. Prior to Cloud, he was a lead engineer in Google’s web search infrastructure, building backends that powered social and personal search. Before Google, he was a professor at Union College in Schenectady, NY. He received a PhD in Computer Science from the University of Massachusetts Amherst, and a BA in Computer Science and Studio Art from Williams College.

Diego Ongaro
Creator of Raft

Diego recently completed his doctorate with John Ousterhout at Stanford. During his doctorate, he worked on RAMCloud (a 5-10 microsecond RTT key-value store), Raft, and LogCabin (a coordination service built with Raft). He’s lately been continuing development on LogCabin as an independent contractor.

Gabriel Monroy
CTO of OpDemand and creator of Deis

Gabriel Monroy is CTO at OpDemand and the creator of Deis, the leading CoreOS-based PaaS. As an early contributor to Docker and CoreOS, Gabriel has deep experience putting containers into production and frequently advises organizations on PaaS, container automation and distributed systems. Gabriel spoke recently at QConSF on cluster scheduling and deploying containers at scale.

Spencer Kimball

Spencer is CEO of Cockroach Labs. After helping to re-architect and re-implement Square's items catalog service, Spencer was convinced that the industry needed more capable database software. He began work on the design and implementation of Cockroach as an open source project and moved to work on it full time at Square in mid-2014. Spencer managed the acquisition of Viewfinder by Square as its CEO; before that, he was Viewfinder's co-founder and co-CTO. Previously, he worked at Google on systems and web application infrastructure, most recently helping to build Colossus, Google’s exascale distributed file system, and on Java infrastructure, including the open-sourced Google Servlet Engine.

Loris Degioanni
CEO of Sysdig

Loris is the creator and CEO of Sysdig, a popular open source troubleshooting tool for Linux environments. He is a pioneer in the field of network analysis through his work on WinPcap and Wireshark: open source tools with millions of users worldwide. Loris was previously a senior director of technology at Riverbed, and co-founder/CTO at CACE Technologies, the company behind Wireshark. Loris holds a PhD in computer engineering from Politecnico di Torino, Italy.

Excited? Stay tuned for more announcements and join us at CoreOS Fest 2015.

Buy your early bird ticket by March 31st:

Submit a speaking abstract by March 31st: CFP Portal

Become a sponsor, email us for more details.

March 20, 2015

What makes a cluster a cluster?

“What makes a cluster a cluster?” - Ask that question of 10 different engineers and you’ll get 10 different answers. Some look at it from a hardware perspective, some see it as a particular set of cloud technologies, and some say it’s the protocols exchanging information on the network.

With this ever-growing field of distributed systems technologies, it is helpful to compare the goals, roles and differences of some of these new projects based on their functionality. In this post we propose a conceptual description of the cluster at large, while showing some examples of emerging distributed systems technologies.

Layers of abstraction

The tech community has long agreed on what a network looks like. We’ve largely come to agree, in principle, on the OSI (Open Systems Interconnection) model (and in practice, on its close cousin, the TCP/IP model).

A key aspect of this model is the separation of concerns, with well-defined responsibilities and dependence between components: every layer depends on the layer below it and provides useful network functionality (connection, retry, packetization) to the layer above it. At the top, finally, are web sessions and applications of all sorts running and abstracting communication.

So, as an exercise to try to answer “What makes a cluster a cluster?” let’s apply the same sort of thinking to layers of abstraction in terms of execution of code on a group of machines, instead of communication between these machines.

Here’s a snapshot of the OSI model, applied to containers and clustering:

OSI Applied to Clustering

Let’s take a look from the bottom up.

Level 1, Hardware

The hardware layer is where it all begins. In a modern environment, this may mean physical (bare metal) or virtualized hardware – abstraction knows no bounds – but for our purposes, we define hardware as the CPU, RAM, disk and network equipment that is rented or bought in discrete units.

Examples: bare metal, virtual machines, cloud

Level 2, OS/Machine ABI

The OS layer is where we define how software executes on the hardware: the OS gives us the Application Binary Interface (ABI) by which we agree on a common language that our userland applications speak to the OS (system calls, device drivers, and so on). We also set up a network stack so that these machines can communicate amongst each other. This layer therefore provides our lowest level complete execution environment for applications.

Now, traditionally, we stop here, and run our final application on top of this as a third pseudo-layer of the OS and various user-space packages. We provision individual machines with slightly different software stacks (a database server, an app server) and there’s our server rack.

Over the lifetime of servers and software, however, the permutations and histories of individual machine configurations start to become unwieldy. As an industry, we are learning that managing this complexity becomes costly or infeasible over time, even at moderate scale (e.g. 3+ machines).

This is often where people start to talk about containers, as containers treat the entire OS userland as one hermetic application package that can be managed as an independent unit. Because of this abstraction, we can conceptually shift containers up the stack, as long as they’re above layer 2. We’ll revisit containers in layer 6.

Examples: kernel + {systemd, cgroups/namespaces, jails, zones}

Level 3, Cluster Consensus

To begin to mitigate the complexity of managing individual servers, we need to start thinking about machines in some greater, collective sense: this is our first notion of a cluster. We want to write software that scales across these individual servers and shares work effortlessly.

However, as we add more servers to the picture, we now introduce many more points of failure: networks partition, machines crash and disks fail. How can we build systems in the face of greater uncertainty? What we’d like is some way of creating a uniform set of data and data primitives, as needed by distributed systems. Much like in multiprocessor programming, we need the equivalent of locks, message passing, shared memory and atomicity across this group of machines.

This is an interesting and vibrant field of algorithmic research: a first stop for the curious reader should be the works of Leslie Lamport, particularly his earlier writing on ordering and reliability of distributed systems. His later work describes Paxos, the preeminent consensus protocol; the other major protocol, as provided by many projects in this category, is Raft.

Why is this called consensus? The machines need to ‘agree’ on the same history and order of events in order to make the guarantees we’d like for the primitives described. Locks cannot be taken twice, for example, even if some subset of messages disappears or arrives out of order, or member machines crash for unknown reasons.

These algorithms build data structures to form a coherent, consistent, and fault-tolerant whole.

Examples: etcd, ZooKeeper, consul
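As a rough sketch of the primitives this layer offers, here is how a simple leader key might be handled with etcd's command-line client (the key and value are invented, and etcdctl syntax varies between etcd releases):

# Atomically create the key only if it does not already exist - a basic building block for locks
$ etcdctl mk /cluster/leader machine-1

# Any machine in the cluster reads the same agreed-upon value
$ etcdctl get /cluster/leader
machine-1

# React when the value changes
$ etcdctl watch /cluster/leader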

Level 4, Cluster Resources

With this perspective of a unified cluster, we can now talk about cluster resources. Having abstracted the primitives of individual machines, we use this higher level view to create and interact with the complete set of resources that we have at our disposal. Thus we can consider in aggregate the CPUs, RAM, disk and networking as available to any process in the cluster, as provided by the physical layers underneath.

Viewing the cluster as one large machine, all devices (CPU, RAM, disk, networking) become abstract. Containers already benefit from this: they depend on these resources being abstracted on their behalf (network bridges, for example) so that they can use those abstractions higher in the stack while running on any of the underlying hardware.

In some sense, this layer is the equivalent of the hardware layer of the now-primordial notion of the cluster. It may not be as celebrated as the layers above it, but this layer is where some important innovation takes place. Showing a cool auto-scaling webapp demo is nice, but it requires things like carving up the cluster's IP space and deciding where a block device is attached to a host.

Examples: flannel, remote block storage, weave

Level 5, Cluster Orchestration and Scheduling

Cluster orchestration, then, starts to look a lot like an OS kernel atop these cluster-level resources and the tools given by consistency – symmetry with the layers below again. It’s the purview of the orchestration platform to divide and share cluster resources, schedule applications to run, manage permissions, set up interfaces into and out of the cluster, and at the end of the day, find an ABI-compatible environment for the userland. With increased scale comes new challenges: from finding the right machines to providing the best experience to users of the cluster.

Any software that will run on the cluster must ultimately execute on a physical CPU on a particular server. How the application code gets there and what abstractions it sees is controlled by the orchestration layer. This is similar to how WiFi simulates a copper wire to existing network stacks, with a controllable abstraction through access points, signal strength, meshes, encryption and more.

Examples: fleet, Mesos, Kubernetes
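As a small, hedged sketch of what interacting with this layer looks like, using fleet (the unit name is invented):

# Submit a unit and let the scheduler pick a machine for it
$ fleetctl start myapp.service

# See where in the cluster the unit ended up
$ fleetctl list-units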

Level 6, Containers

This brings us back to containers, in which, as described earlier, the entire userland is bundled together and treated as a single application unit.

If you’ve followed the whole stack up to this point, you’ll see why containers sit at level 6, instead of at level 2 or 3. It’s because the layers of abstraction below this point all depend on each other to build up to the point where a single-serving userland can safely abstract whether it’s running as one process on a local machine or as something scheduled on the cluster as a whole.

Containers are actually simple that way; they depend on everything else to provide the appropriate execution environment. They carry userland data and expect specific OS details to be presented to them.

Examples: Rocket, Docker, systemd-nspawn

Level 7, Application

Containers are currently getting a lot of attention in the industry because they can separate the OS and software dependencies from the hardware. By abstracting these details, we can create consistent execution environments across a fleet of machines and let the traditional POSIX userland continue to work, fairly seamlessly, no matter where you take it. If the intention is to share containers, then choice is important, as is agreeing upon a sharable standard. Containers are exciting; they start us down the road of a lot of open source work in the realm of true distributed systems, backwards-compatible with the code we already write - our Application.

Closing Thoughts

For any of the layers of the cluster, there are (and will continue to be) multiple implementations. Some will combine layers, some will break them into sub-pieces – but this was true of networking in the past as well (do you remember IPX? Or AppleTalk?).

As we continue to work deeply on the internals of every layer, we also sometimes want to take a step back to look at the overall picture and consider the greater audience of people who are interested and starting to work on clusters of their own. We want to introduce this concept as a guideline, with a symmetric way of thinking about a cluster and its components. We’d love your thoughts on what defines a cluster as more than a mass of hardware.

March 13, 2015

Announcing rkt and App Container 0.4.1

Today we are announcing rkt v0.4.1. rkt is a new app container runtime and implementation of the App Container (appc) spec. This milestone release includes new features like private networking, an enhanced container lifecycle, and unprivileged image fetching, all of which get us closer to our goals of a production-ready container runtime that is composable, secure, and fast.

Private Networking

This release includes our first iteration of the rkt networking subsystem. As an example, let's run etcd in a private network:

# Run an etcd container in a private network
$ rkt run --private-net

By using the --private-net flag, the etcd container will run with its own network stack decoupled from the host. This includes a private lo loopback device and an eth0 device with an IP in the address range. By default, rkt creates a veth pair, with one end becoming eth0 in the container and the other placed on the host. rkt will also set up an IP masquerade rule (NAT) to allow the container to speak to the outside world.

This can be demonstrated by being able to reach etcd on its version endpoint from the host:

$ curl

The networking configuration in rkt is designed to be highly pluggable to facilitate a variety of networking topologies and infrastructures. In this release, we have included plugins for veth, bridge, and macvlan, and more are under active development. See the rkt network docs for details.

If you are interested in building new network plugins, please take a look at the current specification and get involved by reaching out on GitHub or the mailing list. We would also like to extend a thank you to everyone who has spent time giving valuable feedback on the spec so far.

Unprivileged Fetches

It is good practice to download files over the Internet only as unprivileged users. With this release of rkt, it is possible to set up a rkt Unix group, and give users in that group the ability to download and verify container images. For example, let's give the core user permission to use rkt to retrieve images and verify their signature:

$ sudo groupadd rkt
$ sudo usermod -a -G rkt core
$ sudo rkt install
$ rkt fetch
rkt: searching for app image
rkt: fetching image from
Downloading ACI: [==========                                   ] 897 KB/3.76 MB
Downloading signature from
rkt: signature verified:
  CoreOS ACI Builder <>

The new rkt install subcommand is a simple helper to quickly set up all of the rkt directory permissions. These steps could easily be scripted outside of rkt for a more complex setup or a custom group name; for example, distributions that package rkt in their native formats would configure directory permissions at the time the package is installed.

Note that the image we’ve fetched will still need to be run with sudo, as Linux doesn't yet make it possible to do many of the operations necessary to start a container without root privileges. But at this stage, you can trust that the image comes from an author you have already trusted via rkt trust.
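Establishing that trust in the first place looks roughly like the following; the image prefix is only an example, and flag details may differ between rkt releases:

# Trust the signing key used for images under a given name prefix (prefix illustrative)
$ sudo rkt trust --prefix example.com/myapp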

Other Features

rkt prepare is a new command that can be used to set up a container without immediately running it. This gives users the ability to allocate a container ID and do filesystem setup before launching any processes. In this way, a container can be prepared ahead of time, so that when rkt run-prepared is subsequently invoked, the process startup happens immediately with few additional steps. Being able to pre-allocate a unique container ID also facilitates better integration with higher-level orchestration systems.
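A minimal sketch of that flow, with an invented image file name and an illustrative container ID:

# Allocate a container ID and set up the filesystem without starting any processes
$ sudo rkt prepare myapp-1.0.0-linux-amd64.aci
c9a3f210-7d2a-4c9e-8f5b-0d6c2b6e9f00

# Later, start the already-prepared container with minimal additional work
$ sudo rkt run-prepared c9a3f210-7d2a-4c9e-8f5b-0d6c2b6e9f00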

rkt run can now append additional command line flags and environment variables for all apps, as well as optionally have containers inherit the environment from the parent process. For full details see the command line documentation.

The image store now uses a ql database to track metadata about images in the store. This is used to keep track of URLs, labels, and other metadata of images stored inside rkt's local store. Note that if you are upgrading from a previous rkt release on a system, you may need to remove /var/lib/rkt. We understand people are already beginning to rely on rkt and over the next few releases will focus heavily on introducing stable APIs. But until we are closer to a 1.0 release, expect that there will be more regular changes.

For more details about this 0.4.1 release and pre-compiled standalone rkt Linux binaries see the release page.

Updates to App Container spec

Finally, this change updates rkt to the latest version of the appc spec, v0.4.1. Recent changes to the spec include reworked isolators, new OS-specific requirements, and greater explicitness around image signing and encryption. You can refer to a list of some major changes and additions here.

Join us on the mission to create a secure, composable and standards-based container runtime, and get involved in hacking on rkt or App Container here:

rkt: Help Wanted, Mailing list

App Container: Help Wanted, Mailing list

March 12, 2015

rkt Now Available in CoreOS Alpha Channel

Our CoreOS Alpha channel is designed to strike a balance between offering early access to new versions of software and serving as the release candidate for the Beta and Stable channels. Due to its release-candidate nature, we must be conservative in upgrading critical system components (e.g. systemd and etcd), but in order to get new technologies (like fleet and flannel) into the hands of users for testing we must occasionally include pre-production versions of these components in Alpha.

Today, we are adding rkt, a container runtime built on top of the App Container spec, to make it easier for users to try it and give us feedback.

rkt will join systemd-nspawn and Docker as container runtimes that are available to CoreOS users. Keep in mind that rkt is still pre-1.0 and that you should not rely on flags or the data in /var/lib/rkt to work between versions. Specifically, next week v0.4.1 will land in Alpha, which is incompatible with images and containers created by previous versions of rkt. Besides the addition of /usr/bin/rkt to the image, nothing major has changed and no additional daemons will run by default.

Release Cadence

We have adopted a regular weekly schedule for Alpha releases, rolling out a new version every Thursday. Every other week we release a Beta, taking the best of the previous two Alpha versions and promoting it bit-for-bit. Similarly, once every four weeks we promote the best of the previous two Beta releases to Stable.

Give it a spin

If you want to spin up a CoreOS Alpha machine and get started, check out the documentation for v0.3.2. We look forward to having you involved in rkt development via the rkt-dev discussion mailing list, GitHub issues, or contributing directly to the project. We have made great progress so far, but there is still much to build!

Confessions of a Recovering Proprietary Programmer, Part XV

So the Linux kernel now has a Documentation/CodeOfConflict file. As one of the people who provided an Acked-by for this file, I thought I should set down what went through my mind while reading it. Taking it one piece at a time:

The Linux kernel development effort is a very personal process compared to “traditional” ways of developing software. Your code and ideas behind it will be carefully reviewed, often resulting in critique and criticism. The review will almost always require improvements to the code before it can be included in the kernel. Know that this happens because everyone involved wants to see the best possible solution for the overall success of Linux. This development process has been proven to create the most robust operating system kernel ever, and we do not want to do anything to cause the quality of submission and eventual result to ever decrease.

In a perfect world, this would go without saying, give or take the “most robust” chest-beating. But I am probably not the only person to have noticed that the world is not always perfect. Sadly, it is probably necessary to remind some people that “job one” for the Linux kernel community is the health and well-being of the Linux kernel itself, and not their own pet project, whatever that might be.

On the other hand, I was also heartened by what does not appear in the above paragraph. There is no assertion that the Linux kernel community's processes are perfect, which is all to the good, because delusions of perfection all too often prevent progress in mature projects. In fact, in this imperfect world, there is nothing so good that it cannot be made better. On the other hand, there also is nothing so bad that it cannot be made worse, so random wholesale changes should be tested somewhere before being applied globally to a project as important as the Linux kernel. I was therefore quite happy to read the last part of this paragraph: “we do not want to do anything to cause the quality of submission and eventual result to ever decrease.”

If however, anyone feels personally abused, threatened, or otherwise uncomfortable due to this process, that is not acceptable.

That sentence is of course critically important, but must be interpreted carefully. For example, it is all too possible that someone might feel abused, threatened, and uncomfortable by the mere fact of a patch being rejected, even if that rejection was both civil and absolutely necessary for the continued robust operation of the Linux kernel. Or someone might claim to feel that way, if they felt that doing so would get their patch accepted. (If this sounds impossible to you, be thankful, but also please understand that the range of human behavior is extremely wide.) In addition, I certainly feel uncomfortable when someone points out a stupid mistake in one of my patches, but that discomfort is my problem, and furthermore encourages me to improve, which is a good thing. For but one example, this discomfort is exactly what motivated me to write the rcutorture test suite. Therefore, although I hope that we all know what is intended by the words “abused”, “threatened”, and “uncomfortable” in that sentence, the fact is that it will never be possible to fully codify the difference between constructive and destructive behavior.

Therefore, the resolution process is quite important:

If so, please contact the Linux Foundation's Technical Advisory Board at <>, or the individual members, and they will work to resolve the issue to the best of their ability. For more information on who is on the Technical Advisory Board and what their role is, please see:

There can be no perfect resolution process, but this one seems to be squarely in the “good enough” category. The timeframes are long enough that people will not be rewarded by complaining to the LF TAB instead of fixing their patches. The composition of the LF TAB, although not perfect, is diverse, consisting of both men and women from multiple countries. The LF TAB appears to be able to manage the inevitable differences of opinion, based on the fact that not all members provided their Acked-by for this Code of Conflict. And finally, the LF TAB is an elected body that has oversight via the LF, so there are feedback mechanisms. Again, this is not perfect, but it is good enough that I am willing to overlook my concerns about the first sentence in the paragraph.

On to the final paragraph:

As a reviewer of code, please strive to keep things civil and focused on the technical issues involved. We are all humans, and frustrations can be high on both sides of the process. Try to keep in mind the immortal words of Bill and Ted, “Be excellent to each other.”

And once again, in a perfect world it would not be necessary to say this. Sadly, we are human beings rather than angels, and so it does appear to be necessary. Then again, if we were all angels, this would be a very boring world.

Or at least that is what I keep telling myself!

March 11, 2015

The First CoreOS Fest

CoreOS Fest 2015

Get ready, CoreOS Fest, our celebration of everything distributed, is right around the corner! Our first CoreOS Fest is happening May 4 and 5, 2015 in San Francisco. You’ll learn more about application containers, container orchestration, clustering, devops security, new Linux, Go and more.

Join us for this two-day event as we talk about the newest in distributed systems technologies and together talk about securing the Internet. Be part of discussions shaping modern infrastructure stacks, hear from peers on how they are using these technologies today and get inspired to learn new ways to speed up your application development process.

Take a journey with us (in space and time) and help contribute to the next generation of infrastructure. The early bird tickets are available until March 31st and are only $199, so snatch one up now before they are gone. After March 31st, tickets will be available for $349. See you in May.

Submit an Abstract

Grab An Early Bird Ticket

If you are interested in sponsoring the event, reach out to and we would be happy to send you the prospectus.

March 10, 2015

Py3progress updated

Another year down!

I've updated the py3progress site with the whole of 2014, and what we have so far in 2015. As in previous years, I'll post a review of the past year later on.

March 09, 2015

Verification Challenge 4: Tiny RCU

The first and second verification challenges were directed to people working on verification tools, and the third challenge was directed at developers. Perhaps you are thinking that it is high time that I stop picking on others and instead direct a challenge at myself. If so, this is the challenge you were looking for!

The challenge is to take the v3.19 Linux kernel code implementing Tiny RCU, unmodified, and use some formal-verification tool to prove that its grace periods are correctly implemented.

This requires a tool that can handle multiple threads. Yes, Tiny RCU runs only on a single CPU, but the proof will require at least two threads. The basic idea is to have one thread update a variable, wait for a grace period, then update a second variable, while another thread accesses both variables within an RCU read-side critical section, and a third parent thread verifies that this critical section did not span a grace period, like this:

int x;
int y;
int r1;
int r2;

void rcu_reader(void)
{
  rcu_read_lock();
  r1 = x;
  r2 = y;
  rcu_read_unlock();
}

void *thread_update(void *arg)
{
  x = 1;
  synchronize_rcu();
  y = 1;
}

. . .

assert(r2 == 0 || r1 == 1);

Of course, rcu_reader()'s RCU read-side critical section is not allowed to span thread_update()'s grace period, which is provided by synchronize_rcu(). Therefore, rcu_reader() must execute entirely before the end of the grace period (in which case r2 must be zero, keeping in mind C's default initialization to zero), or it must execute entirely after the beginning of the grace period (in which case r1 must be one).

There are a few technical problems to solve:

  1. The Tiny RCU code #includes numerous “interesting” files. I supplied empty files as needed and used “-I .” to focus the C preprocessor's attention on the current directory.

  2. Tiny RCU uses a number of equally interesting Linux-kernel primitives. I stubbed most of these out in fake.h, but copied a number of definitions from the Linux kernel, including IS_ENABLED, barrier(), and bool.

  3. Tiny RCU runs on a single CPU, so the two threads shown above must act as if this was the case. I used pthread_mutex_lock() to provide the needed mutual exclusion, keeping in mind that Tiny RCU is available only with CONFIG_PREEMPT=n. The thread that holds the lock is running on the sole CPU.

  4. The synchronize_rcu() function can block. I modeled this by having it drop the lock and then re-acquire it.

  5. The dyntick-idle subsystem assumes that the boot CPU is born non-idle, but in this case the system starts out idle. After a surprisingly long period of confusion, I handled this by having main() invoke rcu_idle_enter() before spawning the two threads. The confusion eventually proved beneficial, but more on that later.

The first step is to get the code to build and run normally. You can omit this step if you want, but given that compilers usually generate better diagnostics than do the formal-verification tools, it is best to make full use of the compilers.

I first tried goto-cc, goto-instrument, and satabs [Slide 44 of PDF] and impara [Slide 52 of PDF], but both tools objected strenuously to my code. My copies of these two tools are a bit dated, so it is possible that these problems have since been fixed. However, I decided to download version 5 of cbmc, which is said to have gained multithreading support.

After converting my code to a logic expression with no fewer than 109,811 variables and 457,344 clauses, cbmc -I . -DRUN fake.c took a bit more than ten seconds to announce VERIFICATION SUCCESSFUL. But should I trust it? After all, I might have a bug in my scaffolding or there might be a bug in cbmc.

The usual way to check for this is to inject a bug and see if cbmc catches it. I chose to break up the RCU read-side critical section as follows:

void rcu_reader(void)
{
  rcu_read_lock();
  r1 = x;
  rcu_read_unlock();
  cond_resched();
  rcu_read_lock();
  r2 = y;
  rcu_read_unlock();
}

Why not remove thread_update()'s call to synchronize_rcu()? Take a look at Tiny RCU's implementation of synchronize_rcu() to see why not!

With this change enabled via #ifdef statements, “cbmc -I . -DRUN -DFORCE_FAILURE fake.c” took almost 20 seconds to find a counter-example in a logic expression with 185,627 variables and 815,691 clauses. Needless to say, I am glad that I didn't have to manipulate this logic expression by hand!

Because cbmc catches an injected bug and verifies the original code, we have some reason to hope that the VERIFICATION SUCCESSFUL was in fact legitimate. As far as I know, this is the first mechanical proof of the grace-period property of a Linux-kernel RCU implementation, though admittedly of a rather trivial implementation. On the other hand, a mechanical proof of some properties of the dyntick-idle counters came along for the ride, courtesy of the WARN_ON_ONCE() statements in the Linux-kernel source code. (Previously, researchers at Oxford mechanically validated the relationship between rcu_dereference() and rcu_assign_pointer(), taking the whole of Tree RCU as input, and researchers at MPI-SWS formally validated userspace RCU's grace-period guarantee—manually.)

As noted earlier, I had confused myself into thinking that cbmc did not handle pthread_mutex_lock(). I verified that cbmc handles the gcc atomic builtins, but it turns out to be impractical to build a lock for cbmc's use from atomics. The problem stems from the “b” for “bounded” in “cbmc”, which means cbmc cannot analyze the unbounded spin loops used in locking primitives.

However, cbmc does do the equivalent of a full state-space search, which means it will automatically model all possible combinations of lock-acquisition delays even in the absence of a spin loop. This suggests something like the following:

if (__sync_fetch_and_add(&cpu_lock, 1))
  exit();

The idea is to exclude from consideration any executions where the lock cannot be immediately acquired, again relying on the fact that cbmc automatically models all possible combinations of delays that the spin loop might have otherwise produced, but without the need for an actual spin loop. This actually works, but my mis-modeling of dynticks fooled me into thinking that it did not. I therefore made lock-acquisition failure set a global variable and added this global variable to all assertions. When this failed, I had sufficient motivation to think, which caused me to find my dynticks mistake. Fixing this mistake fixed all three versions (locking, exit(), and flag).

The exit() and flag approaches result in exactly the same number of variables and clauses, which turns out to be quite a bit fewer than the locking approach:

                                exit()/flag approach                 locking approach
Verification                    69,050 variables, 287,548 clauses    109,811 variables, 457,344 clauses
Verification, forced failure    113,947 variables, 501,366 clauses   185,627 variables, 815,691 clauses

So locking increases the size of the logic expressions by quite a bit, but interestingly enough does not have much effect on verification time. Nevertheless, these three approaches show a few of the tricks that can be used to accomplish real work using formal verification.

The GPL-licensed source for the Tiny RCU validation may be found here. C-preprocessor macros select the various options, with -DRUN being necessary for both real runs and cbmc verification (as opposed to goto-cc or impara verification), -DCBMC forcing the atomic-and-flag substitute for locking, and -DFORCE_FAILURE forcing the failure case. For example, to run the failure case using the atomic-and-flag approach, use:

cbmc -I . -DRUN -DCBMC -DFORCE_FAILURE fake.c
Possible next steps include verifying dynticks and interrupts, dynticks and NMIs, and of course use of call_rcu() in place of synchronize_rcu(). If you try these out, please let me know how it goes!

CoreOS on VMware vSphere and VMware vCloud Air

At CoreOS, we want to make the world successful with containers on all computing platforms. Today, we are taking one step closer to that goal by announcing, with VMware, that CoreOS is fully supported and integrated with both VMware vSphere 5.5 and VMware vCloud Air. Enterprises that have been evaluating containers but needed a fully supported environment to begin can now get started.

We’ve worked closely with VMware in enabling CoreOS to run on vSphere 5.5 (see the technical preview of CoreOS on vSphere 5.5). This collaboration extends the security, consistency, and reliability advantages of CoreOS to users of vSphere. Developers can focus on their applications, and operations teams get the control they need. We encourage you to read more from VMware here:

CoreOS Now Supported on VMware vSphere 5.5 and VMware vCloud Air.

As a sysadmin you’ve gotta be thinking, what does this mean for me?

Many people have been running CoreOS on VMware for a while now, but something was missing: namely, performance and full integration with VMware management APIs. Today that all changes. CoreOS is now shipping open-vm-tools, the open source implementation of VMware Tools, which enables better performance and management of CoreOS VMs running in all VMware environments.

Let's take a quick moment to explore some of the things that are now possible.

Taking CoreOS for a spin with VMware Fusion

The following tutorial will walk you through downloading an official CoreOS VMware image and configuring it using a cloud config drive. Once configured, a CoreOS instance will be launched and managed using the vmrun command line tool that ships with VMware Fusion.

To make the following commands easier to run set the following vmrun alias in your shell:

alias vmrun='/Applications/VMware\'

Download a CoreOS VMware Image

First things first, download a CoreOS VMware image and save it to your local machine:

$ mkdir coreos-vmware
$ cd coreos-vmware
$ wget
$ wget

Decompress the VMware disk image:

$ bzip2 -d coreos_production_vmware_image.vmdk.bz2

Configuring a CoreOS VM with a config-drive

By default CoreOS VMware images do not have any users configured, which means you won’t be able to log in to your VM after it boots. Also, many of the vmrun guest OS commands require a valid CoreOS username and password.

A config-drive is the best way to configure a CoreOS instance running on VMware. Before you can create a config-drive, you’ll need some user data. For this tutorial you will use a CoreOS cloud-config file as user data to configure users and set the hostname.

Generate the password hash for the core and root users

Before creating the cloud-config file, generate a password hash for the core and root users:

$ openssl passwd -1
Verifying - Password:

Enter vmware at both password prompts.

Create a cloud config file

Now we are ready to create a cloud-config file:

Edit cloud-config.yaml:

#cloud-config

hostname: vmware-guest
users:
  - name: core
    passwd: $1$LEfVXsiG$lhcyOrkJq02jWnEhF93IR/
    groups:
      - sudo
      - docker
  - name: root
    passwd: $1$LEfVXsiG$lhcyOrkJq02jWnEhF93IR/

Create a config-drive

With your cloud-config file in place you can use it to create a config drive. The easiest way to create a config-drive is to generate an ISO using a cloud-config file and attach it to a VM.

$ mkdir -p /tmp/new-drive/openstack/latest
$ cp cloud-config.yaml /tmp/new-drive/openstack/latest/user_data
$ hdiutil makehybrid -iso -joliet -joliet-volume-name "config-2" -o ~/cloudconfig.iso /tmp/new-drive
$ rm -r /tmp/new-drive

At this point you should have a config-drive named cloudconfig.iso in your home directory.

Attaching a config-drive to a VM

Before booting the CoreOS VM the config-drive must be attached to the VM. Do this by appending the following lines to the coreos_production_vmware.vmx config file:

ide0:0.present = "TRUE"
ide0:0.autodetect = "TRUE"
ide0:0.deviceType = "cdrom-image"
ide0:0.fileName = "/Users/kelseyhightower/cloudconfig.iso"

At this point you are ready to launch the CoreOS VM:

vmrun start coreos_production_vmware.vmx

CoreOS on VMware

Running commands

With the CoreOS VM up and running, use the vmrun command line tool to interact with it. Let's start by checking the status of vmware-tools in the VM:

$ vmrun checkToolsState coreos_production_vmware.vmx

Grab the VM’s IP address with the getGuestIPAddress command:

$ vmrun getGuestIPAddress coreos_production_vmware.vmx

Full VMware integration also means you can now run guest OS commands. For example you can list the running processes using the listProcessesInGuest command:

$ vmrun -gu core -gp vmware listProcessesInGuest coreos_production_vmware.vmx
Process list: 63
pid=1, owner=root, cmd=/usr/lib/systemd/systemd --switched-root --system --deserialize 21
pid=2, owner=root, cmd=kthreadd
pid=3, owner=root, cmd=ksoftirqd/0
pid=4, owner=root, cmd=kworker/0:0
pid=5, owner=root, cmd=kworker/0:0H
pid=6, owner=root, cmd=kworker/u2:0

Finally you can now run arbitrary commands and scripts using VMware management tools. For example, use the runProgramInGuest command to initiate a graceful shutdown:

$ vmrun -gu root -gp vmware runProgramInGuest coreos_production_vmware.vmx /usr/sbin/shutdown now
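You can also run ad hoc scripts inside the guest; here is an illustrative use of vmrun's runScriptInGuest operation (the script itself is just an example):

$ vmrun -gu core -gp vmware runScriptInGuest coreos_production_vmware.vmx /bin/bash "systemctl status docker"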

CoreOS on VMware

We have only scratched the surface regarding the number of things you can do with the new VMware powered CoreOS images. Check out the “Using vmrun to Control Virtual Machines” e-book for more details.

CoreOS and VMware going forward

We look forward to continuing on the journey to secure the backend of the Internet by working on all types of platforms in the cloud or behind the firewall. We are continuing to work with VMware so that CoreOS is also supported on the recently announced vSphere 6. If you have any questions in the meantime, you can find us on IRC as you get started. Feedback can also be provided at the VMware / CoreOS community forum.

March Update

It’s been a busy start to the year with lots going on in Sahana. There have been some great voluntary contributions over the past months. Tom Baker has been making great progress continuing his work developing a Sahana [Read the Rest...]

March 08, 2015

Technocracy: a short look at the impact of technology on modern political and power structures

Below is an essay I wrote for some study that I thought might be fun to share. If you like this, please see the other blog posts tagged as Gov 2.0. Please note, this is a personal essay and not representative of anyone else :)

In recent centuries we have seen a dramatic change in the world brought about by the rise of and proliferation of modern democracies. This shift in governance structures gives the common individual a specific role in the power structure, and differs sharply from more traditional top down power structures. This change has instilled in many of the world’s population some common assumptions about the roles, responsibilities and rights of citizens and their governing bodies. Though there will always exist a natural tension between those in power and those governed, modern governments are generally expected to be a benevolent and accountable mechanism that balances this tension for the good of the society as a whole.

In recent decades the Internet has rapidly further evolved the expectations and individual capacity of people around the globe through, for the first time in history, the mass distribution of the traditional bastions of power. With a third of the world online and countries starting to enshrine access to the Internet as a human right, individuals have more power than ever before to influence and shape their lives and the lives of people around them. It is easier than ever for people to congregate, albeit virtually, according to common interests and goals, regardless of their location, beliefs, language, culture or other age-old barriers to collaboration. This is having a direct and dramatic impact on governments and traditional power structures everywhere, and is both extending and challenging the principles and foundations of democracy.

This short paper outlines how the Internet has empowered individuals in an unprecedented and prolific way, and how this has changed and continues to change the balance of power in societies around the world, including how governments and democracies work.

Democracy and equality

The concept of an individual having any implicit rights or equality isn’t new, let alone the idea that an individual in a society should have some say over the ruling of the society. Indeed the idea of democracy itself has been around since the ancient Greeks in 500 BCE. The basis for modern democracies lies with the Parliament of England in the 11th century at a time when the laws of the Crown largely relied upon the support of the clergy and nobility, and the Great Council was formed for consultation and to gain consent from power brokers. In subsequent centuries, great concerns about leadership and taxes effectively led to a strongly increased role in administrative power and oversight by the parliament rather than the Crown.

The practical basis for modern government structures with elected officials had emerged by the 17th century. This idea was already established in England, but also took root in the United States. This was closely followed by multiple suffrage movements in the 19th and 20th centuries, which expanded the right to participate in modern democracies from (typically) adult white property owners to almost all adults in those societies.

It is quite astounding to consider the dramatic change from very hierarchical, largely unaccountable and highly centralised power systems to democratic ones in which those in power are expected to be held to account. This shift from top-down power to distributed, representative and accountable power is an important step in understanding modern expectations.

Democracy itself is sustainable only when the key principle of equality is deeply ingrained in the population at large. This principle has been largely infused into Western culture and democracies, independent of religion, including in largely secular and multicultural democracies such as Australia. This is important because an assumption of equality underpins stability in a system that puts into the hands of its citizens the ability to make a decision. If one component of the society feels another doesn’t have an equal right to a vote, then outcomes other than their own are not accepted as legitimate. This has been an ongoing challenge in some parts of the world more than others.

In many ways there is a huge gap between the fearful sentiments of Thomas Hobbes, who preferred a complete and powerful authority to keep the supposed ‘brutish nature’ of mankind at bay, and the aspirations of John Locke who felt that even governments should be held to account and the role of the government was to secure the natural rights of the individual to life, liberty and property. Yet both of these men and indeed, many political theorists over many years, have started from a premise that all men are equal – either equally capable of taking from and harming others, or equal with regards to their individual rights.

Arguably, the Western notion of individual rights is rooted in religion. The Christian idea that all men are created equal under a deity presents an interesting contrast to traditional power structures that assume one person, family or group has more rights than the rest, although ironically various churches have not treated all people equally either. Christianity has deeply influenced many political thinkers and the forming of modern democracies, many of which look very similar to the mixed regime system described by Saint Thomas Aquinas in his Summa Theologiae essays:

Some, indeed, say that the best constitution is a combination of all existing forms, and they praise the Lacedemonian because it is made up of oligarchy, monarchy, and democracy, the king forming the monarchy, and the council of elders the oligarchy, while the democratic element is represented by the Ephors: for the Ephors are selected from the people.

The assumption of equality has been enshrined in key influential documents including the United States Declaration of Independence, 1776:

We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness.

More recently in the 20th Century, the Universal Declaration of Human Rights goes even further to define and enshrine equality and rights, marking them as important for the entire society:

Whereas recognition of the inherent dignity and of the equal and inalienable rights of all members of the human family is the foundation of freedom, justice and peace in the world… – 1st sentence of the Preamble to the Universal Declaration of Human Rights

All human beings are born free and equal in dignity and rights. – Article 1 of the United Nations Universal Declaration of Human Rights (UDHR)

The evolution of the concepts of equality and “rights” is important to understand as they provide the basis for how the Internet is having such a disruptive impact on traditional power structures, whilst also being a natural extension of an evolution in human thinking that has been hundreds of years in the making.

Great expectations

Although only a third of the world is online, in many countries this means the vast bulk of the population. In Australia over 88% of households are online as of 2012. Constant online access starts to drive a series of new expectations and behaviours in a community, especially one where equality has already been so deeply ingrained as a basic principle.

Over time a series of Internet-based instincts and perspectives have become mainstream, arguably driven by the very nature of the technology and the tools that we use online. For example, the Internet was developed to “route around damage”, which means the technology can withstand technical interruption by another hardware or software means. Where damage is interpreted in a social sense, such as censorship or locking away access to knowledge, individuals instinctively seek and develop a workaround, and you see something quite profound. A society has emerged that doesn’t blindly accept limitations put upon it. This is quite a challenge for traditional power structures.

The Internet has become both an extension and an enabler of equality and power by massively distributing both to ordinary people around the world. How have power and equality been distributed? When you consider what constitutes power, five elements come to mind: publishing, communications, monitoring, enforcement and property.

Publishing – in times gone past the ideas that spread beyond a small geographical area either traveled word of mouth via trade routes, or made it into a book. Only the wealthy could afford to print and distribute the written word, so publishing and dissemination of information was a power limited to a small number of people. Today the spreading of ideas is extremely easy, cheap and can be done anonymously. Anyone can start a blog, use social media, and the proliferation of information creation and dissemination is unprecedented. How does this change society? Firstly there is an assumption that an individual can tell their story to a global audience, which means an official story is easily challenged not only by the intended audience, but by the people about whom the story is written. Individuals online expect both to have their say, and to find multiple perspectives that they can weigh up, and determine for themselves what is most credible. This presents significant challenges to traditional powers such as governments in establishing an authoritative voice unless they can establish trust with the citizens they serve.

Communications – individuals have always had some method to communicate with individuals in other communities and countries, but until recent decades these methods have been quite expensive, slow and oftentimes controlled. This has meant that historically, people have tended to form social and professional relationships with those close by, largely out of convenience. The Internet has made it easy to communicate, collaborate and coordinate with individuals and groups all around the world, in real time. This has made massive and global civil responses and movements possible, which has challenged traditional and geographically defined powers substantially. It has also presented a significant challenge for governments trying to predict and control information flow and relationships within the society, and a challenge in supporting the best interests of citizens, given that what is good for a geographically defined nation state doesn’t always align with what is good for an online and trans-nationally focused citizen.

Monitoring – traditional power structures have always had ways to monitor the masses. Monitoring helps maintain the rule of law by assisting in the enforcement of laws, and is often upheld through self-reporting, as those affected by broken laws will report issues to hold offenders to account. In just the last 50 years, modern technologies like CCTV have made monitoring of the people a trivial task, with video cameras recording what is happening 24 hours a day. Foucault spoke of the panopticon gaol design as a metaphor for a modern surveillance state, where everyone is constantly watched on camera. The panopticon was a gaol design wherein detainees could not tell whether they were being observed by gaolers, enabling, in principle, fewer gaolers to control a large number of prisoners. Just as prisoners would theoretically behave better under observation, Foucault was concerned that omnipresent surveillance would lead individuals to become more conservative and limited in themselves if they knew they could be watched at any time. The Internet has turned this model on its head. Although governments can more easily monitor citizens than ever before, individuals can also monitor each other and, indeed, monitor governments for misbehaviour. This has led to individuals, governments, companies and other entities all being held to account publicly, sometimes violently or unfairly so.

Enforcement – enforcement of laws is a key role of a power structure, ensuring the rules of a society are maintained for the benefit of stability and prosperity. Enforcement can take many forms, including physical (gaol, punishment) or psychological (pressure, public humiliation). Power structures have many ways of enforcing the rules of a society on individuals, but the Internet gives individuals substantial enforcement tools of their own. Power used to belong to whoever had the biggest sword, or gun, or police force. Now that major powers, and indeed economies, rely so heavily upon the Internet, there is power in the ability to disrupt communications. By taking down a government or corporate website or online service, an individual or small group of individuals can have an impact on the power structures in their society far greater than in the past, and can do so anonymously. This becomes quite profound as citizen groups can emerge with their own philosophical premise and the tools to monitor and enforce their perspective.

Property – property has always been a strong basis of law and order and still plays an important part in democracy, though perspectives towards property are arguably starting to shift. Copyright was invented to protect the “intellectual property” of a person against copying at a time when copying was quite a physical business, and when the mode of distributing information was very expensive. Now, digital information is so easy to copy that it has created a change in expectations and a struggle for traditional models of intellectual property. New models of copyright have emerged that explicitly support copying (copyleft), and some have been successful, such as the Open Source software industry or remix music culture. 3D printing will change the game again, as in the near future we will see the massive distribution of the ability to copy physical goods, not just virtual ones. This is already creating havoc for those who seek to protect traditional approaches to property, but it also presents an extraordinary opportunity for mankind to have greater distribution of physical wealth, not just virtual wealth, particularly if you consider the current use of 3D printing to create transplant organs, or the potential of 3D printing combined with some form of nanotechnology that could reassemble matter into food or other essential living items. That is starting to step into science fiction, but we should consider the broader potential of these new technologies before we decide to arbitrarily limit them based on traditional views of copyright, as we are already starting to see.

By massively distributing publishing, communications, monitoring and enforcement, and with the coming potential massive distribution of property, technology and the Internet have created an ad hoc, self-determined and grassroots power base that challenges traditional power structures and governments.

With great power…

Individuals online find themselves more empowered and self-determined than ever before, regardless of the socio-political nature of their circumstances. They can share and seek information directly from other individuals, bypassing traditional gatekeepers of knowledge. They can coordinate with like-minded citizens both nationally and internationally and establish communities of interest that transcend geo-politics. They can monitor elected officials, bureaucrats, companies and other individuals, and even hold them all to account.

To leverage these opportunities fully requires a reasonable amount of technical literacy. As such, many technologists are on the front line, playing a special role in supporting, challenging and sometimes overthrowing modern power structures. As technical literacy permeates mainstream culture, more individuals are able to leverage these disrupters, but technologist activists are often the most effective at disrupting power through the use of technology and the Internet.

Of course, whilst the Internet is a threat to traditional centralised power structures, it also presents an unprecedented opportunity to leverage the skills, knowledge and efforts of an entire society in the running of government, for the benefit of all. Citizen engagement in democracy and government beyond the ballot box presents the ability to co-develop, or co-design the future of the society, including the services and rules that support stability and prosperity. Arguably, citizen buy-in and support is now an important part of the stability of a society and success of a policy.

Disrupting the status quo

The combination of individuals’ improved capacity for self-determination and the increasingly pervasive assumptions of equality and rights has led to many examples of traditional power structures being held to account, challenged and, in some cases, overthrown.

Governments are able to be held more strongly to account than ever before. The Open Australia Foundation is a small group of technologists in Australia who create tools to improve transparency and citizen engagement in the Australian democracy. They created Open Australia, a site that made the public parliamentary record more accessible to individuals by making it searchable, subscribable and easy to browse and comment on. They also have projects such as Planning Alerts, which notifies citizens of planned development in their area, Election Leaflets, where citizens upload political pamphlets for public record and accountability, and Right to Know, a site to assist the general public in pursuing information and public records from the government under Freedom of Information. These are all projects that monitor, engage and inform citizens about government.

Wikileaks is a website and organisation that provides a way for individuals to anonymously leak sensitive information, often classified government information. Key examples include video and documents from the Iraq and Afghanistan wars, material about the Guantanamo Bay detention camp, United States diplomatic cables and millions of emails from Syrian political and corporate figures. Some of the information revealed by Wikileaks has had quite dramatic consequences, with the media and citizens around the world responding to the information. Arguably, many of the Arab Spring uprisings throughout the Middle East from December 2010 were provoked by the release of the US diplomatic cables by Wikileaks, as it demonstrated very clearly the level of corruption in many countries. The Internet also played a vital part in many of these uprisings, some of which saw governments deposed, as social media tools such as Twitter and Facebook provided the mechanism for massive coordination of protests, but importantly also provided a way to get citizen coverage of the protests and police/army brutality, creating a global audience, commentary and pressure on the governments and support for the protesters.

Citizen journalism is an interesting challenge to governments because the route to communicate with the general public has traditionally been through the media. For many years the media presented a reasonably predictable mechanism for governments to communicate an official statement and shape public narrative. But the Internet has enabled any individual to publish online to a global audience, and this has resulted in a much more robust exchange of ideas and a less clear cut public narrative about any particular issue, sometimes directly challenging official statements. A particularly interesting case of this was the Salam Pax blog during the 2003 Iraq invasion by the United States. Official news from the US would largely talk about the success of the campaign to overthrow Saddam Hussein. The Salam Pax blog provided the view of a 29-year-old educated Iraqi architect living in Baghdad and experiencing the invasion as a citizen, which contrasted quite significantly at times with official US Government reports. This type of contrast will continue to be a challenge for governments.

On the flip side, the Internet has also provided new ways for governments themselves to support and engage citizens. There has been the growth of a global open government movement, where governments themselves try to improve transparency, public engagement and service delivery using the Internet. Open data is a good example of this, with governments going above and beyond traditional freedom of information obligations to proactively release raw data online for public scrutiny. Digital services allow citizens to interact with their government online rather than enduring the inconvenience of having to physically attend a shopfront. Many governments around the world are making public commitments to improving transparency, engagement and services for their citizens. We now also see more politicians and bureaucrats engaging directly with citizens online through the use of social media, blogs and sophisticated public consultation tools. Governments have become, in short, more engaged, more responsive and more accountable to more people than ever before.


Only in recent centuries have power structures emerged with a specific role for common individual citizens. The relationship between individuals and power structures has long been about the balance between what the power could enforce and what the population would accept. With the emergence of power structures that support and enshrine the principles of equality and human rights, individuals around the world have come to expect the capacity to determine their own future. The growth and proliferation of democracy has been a key shift in how individuals relate to power and governance structures.

New technologies and the Internet have gone on to massively distribute the traditionally centralised powers of publishing, communications, monitoring and enforcement (with property on the way). This distribution of power by means of technology has seen democracy evolve into something of a technocracy, a system which has effectively tipped the balance of power from institutions to individuals.



March 05, 2015

Managing CoreOS Logs with Logentries

Today Logentries announced a CoreOS integration, so CoreOS users can get a deeper understanding of their CoreOS environments. The new integration enables CoreOS users to easily send logs from the journal, the logging component of systemd used by CoreOS, directly into Logentries for real-time monitoring, alerting, and data visualization. This is the first CoreOS log management integration.

To learn more about centralizing logs from CoreOS clusters, read the post by Trevor Parsons, co-founder and chief scientist at Logentries. Or get started by following the documentation here.
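
If you want a feel for what shipping journal entries to a remote collector involves before diving into the official documentation, here is a minimal Go sketch that follows the journal with journalctl and forwards each line over TCP. It is only an illustration, not the Logentries agent or the official integration: the endpoint address and token below are hypothetical placeholders, so substitute the real values from the Logentries documentation.

```go
// Hypothetical sketch: follow the systemd journal and forward each line to a
// remote log collector over TCP. Endpoint and token are placeholders only.
package main

import (
	"bufio"
	"fmt"
	"log"
	"net"
	"os/exec"
)

const (
	endpoint = "logs.example.com:10000" // placeholder, not a real Logentries address
	token    = "YOUR-LOG-TOKEN"         // placeholder token identifying the target log
)

func main() {
	// Open a TCP connection to the remote collector.
	conn, err := net.Dial("tcp", endpoint)
	if err != nil {
		log.Fatalf("dial %s: %v", endpoint, err)
	}
	defer conn.Close()

	// Follow the journal, one entry per line.
	cmd := exec.Command("journalctl", "-f", "-o", "short")
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		log.Fatalf("stdout pipe: %v", err)
	}
	if err := cmd.Start(); err != nil {
		log.Fatalf("start journalctl: %v", err)
	}

	// Prefix each journal line with the token and ship it upstream.
	scanner := bufio.NewScanner(stdout)
	for scanner.Scan() {
		if _, err := fmt.Fprintf(conn, "%s %s\n", token, scanner.Text()); err != nil {
			log.Fatalf("write to collector: %v", err)
		}
	}
	if err := scanner.Err(); err != nil {
		log.Fatalf("read journal: %v", err)
	}
}
```

In practice you would run something like this as a systemd unit on each host, or simply use the documented Logentries setup, which handles reconnection and configuration for you.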