Planet linux.conf.au
Celebrating the wonderful linux.conf.au 2015 conference...

August 25, 2015

Containers on the Autobahn: Q&A with Giant Swarm

Timo Derstappen (@teemow), co-founder of Giant Swarm, has joined us for various events in the past, but you may recall seeing his talk at CoreOS Fest this year (embedded below). We sat down with him to see what Giant Swarm is up to and how Giant Swarm uses CoreOS for its microservice infrastructure.


Q1. Explain what Giant Swarm delivers and what inspired you to co-found the company.

A: Giant Swarm is a microservice infrastructure. Given that the term microservices is used by a lot of people lately, I’m going to explain this a little bit more.

At my last company we grew pretty quickly, and after a rush of feature implementations we stood there with a monolithic app that now had to be scaled. We looked closely at the different requirements within the stack and decided that we would prefer to choose the right tool for each job. This was the complete opposite of what we did before, when we had one tech stack and tried to solve everything with it. By isolating problems in small services we were able to scale in many dimensions: teams weren’t blocking each other, services could be scaled independently, and we iterated faster. But it was also very expensive in terms of automating the infrastructure to run the zoo of technologies we were suddenly using – 20-30% of the developers were always blocked by infrastructure automation. After leaving the company we took some time off, and I worked on a next-generation platform that I wanted to use for our next idea. Wherever I demoed it, people either wanted to have it too or wanted to work at our company.

So the infrastructure itself became the next idea. We now run that infrastructure for many developers, our first customers are going into production with their own dedicated clusters, and we also offer on-prem installations.

Q2. How does Giant Swarm fit into the world of distributed systems and containers?

A: Giant Swarm builds a layer on top of containers and enables developers to declare and manage their microservices without thinking about servers. We map their software architecture onto a container infrastructure distributed across many servers. Our product is clearly aimed at developers. We enable them to actually live up to “You build it, you run it” without the hassle of having to learn how to set up a production-ready container infrastructure, with networking and storage solutions that fit such a highly dynamic environment.

Q3. Your talk at CoreOS Fest on Containers on the Autobahn discussed what fast means in the world of containers and distributed systems. Explain what is most important to do when looking to develop and deploy application containers in an efficient and fast way.

A: In my talk I not only showed how our users can run their services on our platform, but also how we ourselves are dogfooding by running our own services in containers with the same building blocks we provide to our users. There is a saying that a good manager is a barrier removal professional, and we think the same way about infrastructure. Good infrastructure should allow developers to run at full speed, without being encumbered by roadblocks. For instance, you want to create a new test environment for your service landscape in a couple of seconds instead of waiting for the other team that is currently blocking the test environment.

In general there are many facets of fast that we are addressing at Giant Swarm: low latency, short MTTR, high throughput. Which leads to another part of the talk: although Giant Swarm appears to be a PaaS-like solution for starting your application in containers with zero configuration, we provide you with a container infrastructure that is unopinionated. On your own dedicated cluster you can choose whether you’d like to run on AWS or bare metal, and which networking/storage fits you best. You can run Kubernetes on it, run your own service discovery, continuous integration, monitoring, etc. There is also the benefit that users can share their infrastructure stacks and try out new ones really quickly.

Q4. How does Giant Swarm make use of CoreOS projects, such as CoreOS Linux and rkt, on your microservice infrastructure? Any tips and tricks you’d recommend for others, or areas where readers can get more information?

A: The whole Giant Swarm infrastructure is based on CoreOS. A small, stripped-down, modern Linux with atomic updates was exactly what I was looking for in a production environment. Even better is that CoreOS follows the Unix philosophy and builds small but capable tools. This enables platform builders like us to provide customers with a flexible solution by combining these tools with other building blocks that cater to the customers’ needs. We also make extensive use of systemd for our container scheduling and management. Something that might be a bit unique to what we do is that we build container “chains” around each application container to keep the configuration out of the actual application container. The concept is similar to pods, but our chains start up in order and you can create blocking dependencies. This uses a distributed lock to wait for a dependency on startup, which allows us to start and stop complex architectures very gracefully.
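
Giant Swarm hasn’t published that chain mechanism here, so purely as a sketch of the general pattern (the unit, image and key names below are invented, and etcd stands in for whichever lock store is actually used), a blocking startup dependency under systemd could look something like this:

[Unit]
Description=Example app container that waits for its dependency chain
After=docker.service
Requires=docker.service

[Service]
# Block until the dependency has announced itself (hypothetical etcd key)
ExecStartPre=/bin/sh -c 'until etcdctl get /chains/example/db-ready >/dev/null 2>&1; do sleep 1; done'
ExecStart=/usr/bin/docker run --rm --name example-app example/app
ExecStop=/usr/bin/docker stop example-app

Ordering things this way also means stopping a chain can simply be the reverse of starting it, which matches the graceful start/stop behaviour described above.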

Currently, we are using Docker containers to not break dev/prod parity for our users, but we very much favor the concept of a container runtime like rkt, based on the same reasons we like the Unix-like approach of CoreOS.

Q5. Designing an application as a series of micro-services has moved from being an emerging technology to an accepted design pattern. Do you have any suggestions on what enterprise organizations can do to speed the adoption of this new pattern?

A: Moving to microservices is not easy, and it is even harder for a big enterprise with lots of legacy software and more traditional, late-adopting technical staff. There are two categories of hurdles to overcome with microservice architectures.

First are the hurdles that come with any new architectural or software engineering pattern. That includes questions around what microservices actually are, how they should be cut, whether they should contain data or not, and all kinds of new ways of thinking that developers might not be used to yet. However, there are more and more articles and even books, as well as good consultants, that help companies understand and move to the microservices way. In the end every enterprise has to design its migration according to its individual needs - for some, starting with a monolith and breaking away smaller services might be a good choice; for others, a complete rebuild of (parts of) their systems in microservices style. We have seen both with customers going into production.

The second category of hurdles revolves around the (operations) overhead that comes with deploying and managing microservices. Here’s where container technologies, CoreOS, and Giant Swarm come into play, as we are all actively working on solutions to make the development and operations side of microservices a simple and hassle-free experience. Using tools that make the first steps towards microservices easier for developers as well as operations teams makes it easier to bring the enterprise to this new pattern. These tools should get out of the way of the users and enable them to focus on the actual implementation details of their microservices instead of having to worry about how to run them in different environments.

Thanks to Timo for chatting with us!


Watch the talk he gave at CoreOS Fest this year.

August 24, 2015

Docker on Windows Server Preview TP3 with wifi

Doesn’t work. Especially if, like me, you have a USB 3 ethernet adaptor in a docking station, an on-board ethernet, use wifi on many different access points, and use your mobile phone for network connectivity.

The Docker daemon is started by running `net start docker`, which runs `C:\ProgramData\docker\runDockerDaemon.cmd`.

In that script, you’ll see the “virtual switch” (`docker daemon -D -b "Virtual Switch"`) is used for networking – and that (at least in my case) appears to be bound to the ethernet I had when I installed.

Same pain point as trying to use Hyper-V VMs for roaming development.

Uninstalling Hyper-V leaves us in an interesting place:

Sending build context to Docker daemon 2.048 kB
Step 0 : FROM windowsservercore
 ---> 0d53944cb84d
Step 1 : RUN @powershell -NoProfile -ExecutionPolicy Bypass -Command "iex ((new-object net.webclient).DownloadString('https://chocolatey.org/install.ps1'))"
 ---> Running in ad8fb58ba732
HCSShim::CreateComputeSystem - Win32 API call returned error r1=3224830464 err=A virtual switch with the given name was not found. id=ad8fb58ba732880aaace7b4e3288212aa9493083848cf0324de310520b523d21 configuration={"SystemType":"Container","Name":"ad8fb58ba732880aaace7b4e3288212aa9493083848cf0324de310520b523d21","Owner":"docker","IsDummy":false,"VolumePath":"\\\\?\\Volume{63828c05-49f4-11e5-89c2-005056c00008}","Devices":[{"DeviceType":"Network","Connection":{"NetworkName":"Virtual Switch","EnableNat":false,"Nat":{"Name":"ContainerNAT","PortBindings":null}},"Settings":null}],"IgnoreFlushesDuringBoot":true,"LayerFolderPath":"C:\\ProgramData\\docker\\windowsfilter\\ad8fb58ba732880aaace7b4e3288212aa9493083848cf0324de310520b523d21","Layers":[{"ID":"f0d4aaa3-c43d-59c1-8ad0-44e6b3381efc","Path":"C:\\ProgramData\\Microsoft\\Windows\\Images\\CN=Microsoft_WindowsServerCore_10.0.10514.0"}]}

Looks like the virtual switch made for containers was removed at some point (might have been when I installed Hyper-V, I’m not sure).

Running `Get-VMSwitch` returns nothing.

So I installed VMWare Workstation and made a Boot2Docker VM with both NAT and private networking – both vmware based virtual networks continue to work when moving between wifi and ethernet.

So let’s see if we can make one in PowerShell, using the VMware NAT adaptor (see http://blogs.technet.com/b/heyscriptingguy/archive/2013/10/09/use-powershell-to-create-virtual-switches.aspx)

PS C:\Users\sven\src\WindowsDocker> Get-NetAdapter

Name                      InterfaceDescription                    ifIndex Status       MacAddress             LinkSpeed
----                      --------------------                    ------- ------       ----------             ---------
VMware Network Adapte...8 VMware Virtual Ethernet Adapter for ...      28 Up           00-50-56-C0-00-08       100 Mbps
VMware Network Adapte...1 VMware Virtual Ethernet Adapter for ...      27 Up           00-50-56-C0-00-01       100 Mbps
Wi-Fi                     Intel(R) Dual Band Wireless-AC 7260           4 Disabled     5C-51-4F-BA-12-6F          0 bps
Ethernet                  Intel(R) Ethernet Connection I218-LM          3 Up           28-D2-44-4D-B6-64         1 Gbps


VMware helpfully provides a Virtual Network Editor, so I can see that `Get-NetAdapter -Name "VMware Network Adapter VMnet8"` is the NAT one. I'm not sure if creating a Hyper-V External vswitch will make exclusive use of the adaptor, but if so, we can always create another :)

PS C:\Users\sven\src\WindowsDocker> New-VMSwitch  -Name "VMwareNat" -NetAdapterName "VMware Network Adapter VMnet8" -AllowManagementOS $true -Notes "Use VMnet8 to create a roamable Docker daemon network"

Name      SwitchType NetAdapterInterfaceDescription
----      ---------- ------------------------------
VMwareNat External   VMware Virtual Ethernet Adapter for VMnet8

Now to edit runDockerDaemon.cmd and restart the Docker daemon.
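
For reference, the edit just swaps the switch name in that script for the one created above, and then the service gets bounced – roughly:

docker daemon -D -b "VMwareNat"

net stop docker
net start docker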

FAIL. The Docker containers still have no network. At this point I'm not sure if I've totally broken my Windows Docker networking; hopefully some more playing later will turn up something.

Playing some more, there seems to be a new switch type, NAT – see https://raw.githubusercontent.com/Microsoft/Virtualization-Documentation/master/windows-server-container-tools/Install-ContainerHost/Install-ContainerHost.ps1

So re-running the command they use when installing gets us something new to try:

PS C:\Users\sven\src\WindowsDocker> new-vmswitch -Name nat -SwitchType NAT -NatSubnetAddress "172.16.0.0/12"

Name SwitchType NetAdapterInterfaceDescription
---- ---------- ------------------------------
nat  NAT


PS C:\Users\sven\src\WindowsDocker> Get-VMSwitch

Name      SwitchType NetAdapterInterfaceDescription
----      ---------- ------------------------------
VMwareNat External   VMware Virtual Ethernet Adapter for VMnet8
nat       NAT

It works when the ethernet is plugged in, but not on wifi.

yup - bleeding edge dev :)


August 21, 2015

What it’s like to Intern with CoreOS

We’ve been very fortunate to have three incredible interns join us for the summer months – Sara and Ahmad at our San Francisco headquarters, and Quentin in our New York City office. Over the last 10 weeks, they’ve not only become integral contributors to our ever-evolving open source projects, but they’ve also become a part of the CoreOS family.

The Intern Program

Interns with CoreOS have the opportunity to work in a fast-paced environment that is shaping the future of infrastructure based on containers and distributed systems. Every intern works closely with a senior-level employee who serves as their mentor and project team lead. With their guidance, our interns immediately begin contributing in ways that are not only meaningful to their overall career goals, but that are actively used by the CoreOS community – whether that be through open source or our proprietary products. This unique opportunity allows our interns to receive feedback from their mentors and the greater open source ecosystem. At CoreOS, our interns are regarded as full employees and participate in all company activities, from small team meetings, to all-hands meetings, to off-site adventures.

The 2015 Interns

This year’s interns came with diverse backgrounds and worked on different projects at CoreOS.

  • Ahmad (@Mohd_Ahmad17) is currently pursuing a doctorate in computer science at University of Illinois Urbana-Champaign (UIUC) working on system challenges with a focus on networking. While at CoreOS he worked on flannel, a virtual network for containers. You might recall seeing his blog post that introduced flannel 0.5.0 with AWS and GCE.
  • Quentin studied at Ecole Polytechnique de Tours in France. He is currently working on an independent security project with the Quay.io team in NYC.
  • Sara (@qpezgi) is completing her bachelor’s degree in electrical and computer engineering at University of Illinois Urbana-Champaign (UIUC). She’s currently working on our OS team where she focuses on the loop device management utility.

Intern Week

We took the opportunity to honor our interns and thank them for all their hard work with the first-ever CoreOS Intern Week!

Quentin traveled to the CoreOS headquarters in SF on Monday morning and festivities were underway almost immediately. We kicked off intern week with a team lunch at one of our favorite local Thai food spots. Food, as is customary in SF, played a big role in the week’s events. Later that week we also went to a BBQ joint, which Quentin has since dubbed “the best meal he’s had in America.”

CoreOS 2015 Interns Lunch

Eating wasn’t all we did during Intern Week. Tuesday’s trip to the Peninsula and South Bay included a drive through the Google and Apple campuses, followed by an exclusive tour of a state-of-the-art data center. While we shopped for potential cabinet space, Sara, Ahmad and Quentin got to walk among enormous data halls, learn about cutting-edge data center design, and better understand where the world’s data “lives.”

After returning to SF, we decompressed in true CoreOS fashion – outdoor ping pong!

The culminating celebration of Intern Week was spent at the Academy of Sciences on Thursday, for NightLife. After a VIP cocktail hour and tour, we visited exhibits with live animals and attended a show at the planetarium. As a majority of the San Francisco team attended, it was an incredible showing of thanks to the interns for their time at CoreOS!

CoreOS 2015 Interns at Nightlife

Nightlife at the California Academy of Sciences

Could you be a future intern?

Every summer, thousands of students dedicate their time to internships. Many of them have the opportunity to work with big tech companies, like Apple, Google and Amazon. But a few lucky individuals take the path less traveled, and spend their time with a growing company like CoreOS. Our interns are an integral part of our company. They see their impact directly in the work they produce and in the projects to which they contribute. They are supported by their project team leads on a daily basis and form meaningful relationships with us all – including our executive team.

“My favorite thing about interning at CoreOS is the sheer vastness of topics I get to work on. I'm not confined or restricted at all when it comes to how I can contribute, and I’ve found I can help in a lot of ways. For instance, I reproduce user-reported bugs in CoreOS, and I also get to show open source community members how to use CoreOS products and understand all the use cases of the software I’m developing. I get to do literally everything.” - Sara

Are you looking to take the path less traveled? Are you passionate about open source and seeing your work make an impact? Then, reach out to us! Send your resume and cover letter to intern@coreos.com.

Docker on Windows Server 2016 tech preview 3

First thing is to install Windows Server 2016 – I started in a VM, but I’m rapidly thinking I might try it on my notebook – Windows 10 is getting old already :)

Then go to https://msdn.microsoft.com/virtualization/windowscontainers/quick_start/inplace_setup . Note that the PowerShell script will download another 3GB.

Windows-system32-docker

And now – you can run `docker info` from either cmd.exe, or powershell.

There’s only a limited set of images you can download from Microsoft – `docker search` seems to always reply with the same set:

PS C:\Users\Administrator> docker search anything
NAME DESCRIPTION STARS OFFICIAL AUTOMATED
microsoft/iis Internet Information Services (IIS) instal... 1 [OK] [OK]
microsoft/dnx-clr .NET Execution Environment (DNX) installed... 1 [OK] [OK]
microsoft/ruby Ruby installed in a Windows Server Contain... 1 [OK]
microsoft/rubyonrails Ruby on Rails installed in a Windows Serve... 1 [OK]
microsoft/python Python installed in a Windows Server Conta... 1 [OK]
microsoft/go Go Programming Language installed in a Win... 1 [OK]
microsoft/mongodb MongoDB installed in a Windows Server Cont... 1 [OK]
microsoft/redis Redis installed in a Windows Server Contai... 1 [OK]
microsoft/sqlite SQLite installed in a Windows Server Conta... 1 [OK]

I downloaded two, and this shows they’re re-using the `windowsservercore` image as their common base image:

PS C:\Users\Administrator> docker images -a
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
microsoft/go latest 33cac80f92ea 2 days ago 10.09 GB
  8daec63ffb52 2 days ago 9.75 GB
  fbab9eccc1e7 2 days ago 9.697 GB
microsoft/dnx-clr latest 156a0b59c5a8 2 days ago 9.712 GB
  28473be483a9 2 days ago 9.707 GB
  56b7e372f76a 2 days ago 9.697 GB
windowsservercore 10.0.10514.0 0d53944cb84d 6 days ago 9.697 GB
windowsservercore latest 0d53944cb84d 6 days ago 9.697 GB

PS C:\Users\Administrator> docker history microsoft/dnx-clr
IMAGE CREATED CREATED BY SIZE COMMENT
156a0b59c5a8 2 days ago cmd /S /C setx PATH "%PATH%;C:\dnx-clr-win-x6 5.558 MB
28473be483a9 2 days ago cmd /S /C REM (nop) ADD dir:729777dc7e07ff03f 9.962 MB
56b7e372f76a 2 days ago cmd /S /C REM (nop) LABEL Description=.NET Ex 41.41 kB
0d53944cb84d 6 days ago 9.697 GB
PS C:\Users\Administrator> docker history microsoft/go
IMAGE CREATED CREATED BY SIZE COMMENT
33cac80f92ea 2 days ago cmd /S /C C:\build\install.cmd 335 MB
8daec63ffb52 2 days ago cmd /S /C REM (nop) ADD dir:898a4194b45d1cc66 53.7 MB
fbab9eccc1e7 2 days ago cmd /S /C REM (nop) LABEL Description=GO Prog 41.41 kB
0d53944cb84d 6 days ago 9.697 GB

And so the fun begins.

PS C:\Users\Administrator> docker run --rm -it windowsservercore cmd

gives you a containerized shell.

Let’s try to build an image that has the chocolatey installer:

FROM windowsservercore

RUN @powershell -NoProfile -ExecutionPolicy Bypass -Command "iex ((new-object net.webclient).DownloadString('https://chocolatey.org/install.ps1'))"

CMD powershell
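
Building and tagging it (the tag name chocolatey is my choice, to match the FROM line in the next Dockerfile) is the usual:

docker build -t chocolatey .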

and then use that image to install…. vim

FROM chocolatey

RUN choco install -y vim
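
Again with a build step in between (the vim tag is assumed, matching the run command below):

docker build -t vim .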

It works!

 docker run --rm -it vim cmd

and then run

C:\Program Files (x86)\vim\vim74\vim.exe

It’s not currently usable – I suspect because the ANSI terminal driver is really, really new code – but BOOM!

I haven’t worked out how to get the Dockerfile `CMD` or `ENTRYPOINT` to work with paths that have spaces – it doesn’t seem to support the array form yet…
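
For comparison, the array (exec) form that handles spaces on Linux Docker would look like the line below – whether the TP3 builder accepts it is exactly what I haven’t got working yet:

ENTRYPOINT ["C:\\Program Files (x86)\\vim\\vim74\\vim.exe"]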

I’m going to keep playing, and put the Dockerfiles into https://github.com/SvenDowideit/WindowsDocker

Don’t forget to read the documentation at https://msdn.microsoft.com/en-us/virtualization/windowscontainers/containers_welcome


August 19, 2015

The Purpose of a Code of Conduct

On a private mailing list there have been some recent discussions about a Code of Conduct which demonstrate some great misunderstandings. The misunderstandings don’t seem particular to that list so it’s worthy of a blog post. Also people tend to think more about what they do when their actions will be exposed to a wider audience so hopefully people who read this post will think before they respond.

Jokes

The first discussion concerned the issue of making “jokes”. When dealing with the treatment of other people (particularly minority groups) the issue of “jokes” is a common one. It’s fairly common for people in positions of power to make “jokes” about people with less power and then complain if someone disapproves. The more extreme examples of this concern hate words which are strongly associated with violence, one of the most common is a word used to describe gay men which has often been associated with significant violence and murder. Men who are straight and who conform to the stereotypes of straight men don’t have much to fear from that word while men who aren’t straight will associate it with a death threat and tend not to find any amusement in it.

Most minority groups have words that are known to be associated with hate crimes. When such words are used they usually send a signal that the minority groups in question aren’t welcome. The exception is when the words are used by other members of the group in question. For example if I was walking past a biker bar and heard someone call out “geek” or “nerd” I would be a little nervous (even though geeks/nerds have faced much less violence than most minority groups). But at a Linux conference my reaction would be very different. As a general rule you shouldn’t use any word that has a history of being used to attack any minority group other than one that you are a member of, so black rappers get to use a word that was historically used by white slave-owners but because I’m white I don’t get to sing along to their music. As an aside we had a discussion about such rap lyrics on the Linux Users of Victoria mailing list some time ago, hopefully most people think I’m stating the obvious here but some people need a clear explanation.

One thing that people should consider about “jokes” is the issue of punching-down vs punching-up [1] (there are many posts about this topic, I linked to the first Google hit which seems quite good). The basic concept is that making jokes about more powerful people or organisations is brave while making “jokes” about less powerful people is cowardly and serves to continue the exclusion of marginalised people. When I raised this issue in the mailing list discussion a group of men immediately complained that they might be bullied by lots of less powerful people making jokes about them. One problem here is that powerful people tend to be very thin skinned due to the fact that people are usually nice to them. While the imaginary scenario of less powerful people making jokes about rich white men might be unpleasant if it happened in person, it wouldn’t compare to the experience of less powerful people who are the target of repeated “jokes” in addition to all manner of other bad treatment. Another problem is that the impact of a joke depends on the power of the person who makes it, EG if your boss makes a “joke” about you then you have to work on your CV, while if a colleague or subordinate makes a joke then you can often ignore it.

Who does a Code of Conduct Protect

One member of the mailing list wrote a long and very earnest message about his belief that the CoC was designed to protect him from off-topic discussions. He analysed the results of a CoC on that basis and determined that it had failed due to the number of off-topic messages on the mailing lists he subscribes to. Being so self-centered is strongly correlated with being in a position of power; he seems to sincerely believe that everything should be about him, that he is entitled to all manner of protection, and that any rule which doesn’t protect him is worthless.

I believe that the purpose of all laws and regulations should be to protect those who are less powerful, the more powerful people can usually protect themselves. The benefit that powerful people receive from being part of a system that is based on rules is that organisations (clubs, societies, companies, governments, etc) can become larger and achieve greater things if people can trust in the system. When minority groups are discouraged from contributing and when people need to be concerned about protecting themselves from attack the scope of an organisation is reduced. When there is a certain minimum standard of treatment that people can expect then they will be more willing to contribute and more able to concentrate on their contributions when they don’t expect to be attacked.

The Public Interest

When an organisation declares itself to be acting in the public interest (EG by including “Public Interest” in the name of the organisation) I think that we should expect even better treatment of minority groups. One might argue that a corporation should protect members of minority groups for the sole purpose of making more money (it has been proven that more diverse groups produce better quality work). But an organisation that’s in the “Public Interest” should be expected to go way beyond that and protect members of minority groups as a matter of principle.

When an organisation is declared to be operating in the “Public Interest” I believe that anyone who’s so unable to control their bigotry that they can’t refrain from being bigoted on the mailing lists should not be a member.

August 18, 2015

Using Virtual Machines to Improve Container Security with rkt v0.8.0

Today we are releasing rkt v0.8.0. rkt is an application container runtime built to be efficient, secure and composable for production environments.

This release includes new security features, including initial support for user namespaces and enhanced container isolation using hardware virtualization. We have also introduced a number of improvements such as host journal integration, container socket activation, improved image caching, and speed enhancements.

Intel Contributes rkt stage1 with Virtualization

Intel and rkt

The modular design of rkt enables different execution engines and containerization systems to be built and plugged in. This is achieved using a staged architecture, where the second stage ("stage1") is responsible for creating and launching the container. When we launched rkt, it featured a single, default stage1 which leverages Linux cgroups and namespaces (a combination commonly referred to as "Linux containers").

With the help of engineers at Intel, we have added a new rkt stage1 runtime that utilizes virtualization technology. This means an application running under rkt using this new stage1 can be isolated from the host kernel using the same hardware features that are used in hypervisors like Linux KVM.

In May, Intel announced a proof-of-concept of this feature built on top of rkt, as part of their Intel® Clear Containers effort to utilize hardware-embedded virtualization technology features to better secure container runtimes and isolate applications. We were excited to see this work taking place and being prototyped on top of rkt as it validated some of the early design choices we made, such as the concepts of runtime stages and pods. Here is what Arjan van de Ven from Intel's Open Source Technology Center had to say:

"Thanks to rkt's stage-based architecture, the Intel®Clear Containers team was able to rapidly integrate our work to bring the enhanced security of Intel® Virtualization Technology (Intel® VT-x) to the container ecosystem. We are excited to continue working with the rkt community to realize our vision of how we can enhance container security with hardware-embedded technology, while delivering the deployment benefits of containerized apps.”

Since the prototype announcement in May we have worked closely with the team from Intel to ensure that features such as one IP-per-pod networking and volumes work in a similar way when using virtualization. Today's release of rkt sees this functionality fully integrated to make the lkvm backend a first-class stage1 experience. So, let's try it out!

In this example, we will first run a pod using the default cgroups/namespace-based stage1. Let's launch the container with systemd-run, which will construct a unit file on the fly and start it. Checking the status of this unit will show us what’s going on under the hood.

$ sudo systemd-run --uid=0 \
   ./rkt run \
   --private-net --port=client:2379 \
   --volume data-dir,kind=host,source=/tmp/etcd \
   coreos.com/etcd,version=v2.2.0-alpha.0 \ 
   -- --advertise-client-urls="http://127.0.0.1:2379" \  
   --listen-client-urls="http://0.0.0.0:2379"
Running as unit run-1377.service.

$ systemctl status run-1377.service
● run-1377.service 
   CGroup: /system.slice/run-1377.service
           ├─1378 stage1/rootfs/usr/bin/systemd-nspawn
           ├─1425 /usr/lib/systemd/systemd 
           └─system.slice
             ├─etcd.service
             │ └─1430 /etcd
             └─systemd-journald.service
               └─1426 /usr/lib/systemd/systemd-journald

Notice that we can see the complete process hierarchy inside the pod, including a systemd instance and the etcd process.

Next, let's launch the same container under the new KVM-based stage1 by adding the --stage1-image flag:

$ sudo systemd-run -t --uid=0 \
  ./rkt run --stage1-image=sha512-c5b3b60ed4493fd77222afcb860543b9 \
  --private-net --port=client:2379 \
  --volume data-dir,kind=host,source=/tmp/etcd2 \
  coreos.com/etcd,version=v2.2.0-alpha.0 \
  -- --advertise-client-urls="http://127.0.0.1:2379" \
  --listen-client-urls="http://0.0.0.0:2379"
...

$ systemctl status run-1505.service
● run-1505.service
   CGroup: /system.slice/run-1505.service
           └─1506 ./stage1/rootfs/lkvm

Notice that the process hierarchy now ends at lkvm. This is because the entire pod is being executed inside a KVM process, including the systemd process and the etcd process: to the host system, it simply looks like a single virtual machine process. By adding a single flag to our container invocation, we have taken advantage of the same KVM technology that public clouds use to isolate tenants in order to isolate our application container from the host, adding another layer of security to the host.

Thank you to Piotr Skamruk, Paweł Pałucki, Dimitri John Ledkov, Arjan van de Ven from Intel for their support and contributions. For more details on this feature see the lkvm stage1 guide.

Seamless Integration With Host Level-Logging

On systemd hosts, the journal is the default log aggregation system. With the v0.8.0 release, rkt now automatically integrates with the host journal, if detected, to provide a systemd native log management experience. To explore the logs of a rkt pod, all you need to do is add a machine specifier like -M rkt-$UUID to a journalctl command on the host.

As a simple example, let's explore the logs of the etcd container we launched earlier. First we use machinectl to list the pods that rkt has registered with systemd:

$ machinectl list
MACHINE                                  CLASS     SERVICE
rkt-bccc16ea-3e63-4a1f-80aa-4358777ce473 container nspawn
rkt-c3a7fabc-9eb8-4e06-be1d-21d57cdaf682 container nspawn

2 machines listed.

We can see our etcd pod listed as the second machine known by systemd. Now we use the journal to directly access the logs of the pod:

$ sudo journalctl -M rkt-c3a7fabc-9eb8-4e06-be1d-21d57cdaf682
etcd[4]: 2015-08-18 07:04:24.362297 N | etcdserver: set the initial cluster version to 2.2.0

User Namespace Support

This release includes initial support for user namespaces to improve container isolation. By leveraging user namespaces, an application may run as the root user inside of the container but will be mapped to a non-root user outside of the container. This adds an extra layer of security by isolating containers from the real root user on the host. This early preview of the feature is experimental and uses privileged user namespaces, but future versions of rkt will improve on the foundation found in this release and offer more granular control.

To turn user namespaces on, two flags need to be added to our original example: --private-users and --no-overlay. The first turns on the user namespace feature and the second disables rkt's overlayfs subsystem, as it is not currently compatible with user namespaces:

$ ./rkt run --no-overlay --private-users \
  --private-net --port=client:2379 \
  --volume data-dir,kind=host,source=/tmp/etcd \
  coreos.com/etcd,version=v2.2.0-alpha.0 \
  -- --advertise-client-urls="http://127.0.0.1:2379" \
     --listen-client-urls="http://0.0.0.0:2379"

We can confirm this is working by using curl to verify etcd's functionality and then checking the permissions on the etcd data directory, noting that from the host's perspective the etcd member directory is owned by a very high user id:

$ curl 172.16.28.19:2379/version
{"etcdserver":"2.2.0-alpha.0","etcdcluster":"2.2.0"}

$ ls -la /tmp/etcd
total 0
drwxrwxrwx  3 core       core        60 Aug 18 07:31 .
drwxrwxrwt 10 root       root       200 Aug 18 07:31 ..
drwx------  4 1037893632 1037893632  80 Aug 18 07:31 member

Adding user namespaces support is an important step towards our goal of making rkt the most secure container runtime, and we will be working hard to improve this feature in coming releases - you can see the roadmap in this issue.

Open Containers Initiative Progress

With rkt v0.8.0 we are furthering our efforts with security hardening and moving closer to a 1.0 stable and production-ready release. We are also dedicated to ensuring that the container ecosystem continues down a path that enables people publishing containers to “build once, sign once, and run anywhere.” Today rkt is an implementation of the App Container spec (appc), and in the future we hope to make rkt an implementation of the Open Container Initiative (OCI) specification. However, the OCI effort is still in its infancy and there is a lot of work left to do. To check on the progress of the effort to harmonize OCI and appc, you can read more about it on the OCI dev mailing list.

Contribute to rkt

One of the goals of rkt is to make it the most secure container runtime, and there is a lot of exciting work to be done as we move closer to 1.0. Join us on our mission: we welcome your involvement in the development of rkt, via discussion on the rkt-dev mailing list, filing GitHub issues, or contributing directly to the project.

BTRFS Training

Some years ago Barwon South Water gave LUV 3 old 1RU Sun servers for any use related to free software. We gave one of those servers to the Canberra makerlab, another is used as the server for the LUV mailing lists and web site, and the 3rd server was put aside for training. The servers have hot-swap 15,000rpm SAS disks – IE disks that have a replacement cost greater than the budget we have for hardware. As we were given a spare 70G disk (and a 140G disk can replace a 70G disk), the LUV server has 2*70G disks and the 140G disks (which can’t be replaced) are in the server used for training.

On Saturday I ran a BTRFS and ZFS training session for the LUV Beginners’ SIG. This was inspired by the amount of discussion of those filesystems on the mailing list and the amount of interest when we have lectures on those topics.

The training went well, the meeting was better attended than most Beginners’ SIG meetings and the people who attended it seemed to enjoy it. One thing that I will do better in future is clearly documenting commands that are expected to fail and documenting how to login to the system. The users all logged in to accounts on a Xen server and then ssh’d to root at their DomU. I think that it would have saved a bit of time if I had aliased commands like “btrfs” to “echo you must login to your virtual server first” or made the shell prompt at the Dom0 include instructions to login to the DomU.

Each user or group had a virtual machine. The server has 32G of RAM and I ran 14 virtual servers that each had 2G of RAM. In retrospect I should have configured fewer servers and asked people to work in groups; that would allow more RAM for each virtual server and also more RAM for the Dom0. The Dom0 was running a BTRFS RAID-1 filesystem and each virtual machine had a snapshot of the block devices from my master image for the training. Performance was quite good initially as the OS image was shared and fit into cache, but when many users were corrupting and scrubbing filesystems performance became very poor. The disks performed well (sustaining over 100 writes per second) but that’s not much when shared between 14 active users.

The ZFS part of the tutorial was based on RAID-Z (I didn’t use RAID-5/6 in BTRFS because it’s not ready to use and didn’t use RAID-1 in ZFS because most people want RAID-Z). Each user had 5*4G virtual disks (2 for the OS and 3 for BTRFS and ZFS testing). By the end of the training session there was about 76G of storage used in the filesystem (including the space used by the OS for the Dom0), so each user had something like 5G of unique data.
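
The ZFS notes aren’t included below, but for anyone wanting to repeat that part, creating a RAID-Z pool over three spare virtual disks (the device names here are only an example) looks like:

  zpool create -f testpool raidz /dev/xvdf /dev/xvdg /dev/xvdh
  zpool status testpool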

We are now considering what other training we can run on that server. I’m thinking of running training on DNS and email. Suggestions for other topics would be appreciated. For training that’s not disk intensive we could run many more than 14 virtual machines, 60 or more should be possible.

Below are the notes from the BTRFS part of the training; anyone could do this on their own if they substitute 2 empty partitions for /dev/xvdd and /dev/xvde. On a Debian/Jessie system all that you need to do to get ready for this is to install the btrfs-tools package. Note that this does have some risk if you make a typo. An advantage of doing this sort of thing in a virtual machine is that there’s no possibility of breaking things that matter.

  1. Making the filesystem
    1. Make the filesystem, this makes a filesystem that spans 2 devices (note you must use the -f option if there was already a filesystem on those devices):

      mkfs.btrfs /dev/xvdd /dev/xvde
    2. Use file(1) to see basic data from the superblocks:

      file -s /dev/xvdd /dev/xvde
    3. Mount the filesystem (can mount either block device, the kernel knows they belong together):

      mount /dev/xvdd /mnt/tmp
    4. See a BTRFS df of the filesystem, shows what type of RAID is used:

      btrfs filesystem df /mnt/tmp
    5. See more information about FS device use:

      btrfs filesystem show /mnt/tmp
    6. Balance the filesystem to change it to RAID-1 and verify the change (note that some parts of the filesystem were single and RAID-0 before this change):

      btrfs balance start -dconvert=raid1 -mconvert=raid1 -sconvert=raid1 --force /mnt/tmp

      btrfs filesystem df /mnt/tmp
    7. See if there are any errors, shouldn’t be any (yet):

      btrfs device stats /mnt/tmp
    8. Copy some files to the filesystem:

      cp -r /usr /mnt/tmp
    9. Check the filesystem for basic consistency (only checks checksums):

      btrfs scrub start -B -d /mnt/tmp
  2. Online corruption
    1. Corrupt the filesystem:

      dd if=/dev/zero of=/dev/xvdd bs=1024k count=2000 seek=50
    2. Scrub again, should give a warning about errors:

      btrfs scrub start -B /mnt/tmp
    3. Check error count:

      btrfs device stats /mnt/tmp
    4. Corrupt it again:

      dd if=/dev/zero of=/dev/xvdd bs=1024k count=2000 seek=50
    5. Unmount it:

      umount /mnt/tmp
    6. In another terminal follow the kernel log:

      tail -f /var/log/kern.log
    7. Mount it again and observe it correcting errors on mount:

      mount /dev/xvdd /mnt/tmp
    8. Run a diff, observe kernel error messages and observe that diff reports no file differences:

      diff -ru /usr /mnt/tmp/usr/
    9. Run another scrub, this will probably correct some errors which weren’t discovered by diff:

      btrfs scrub start -B -d /mnt/tmp
  3. Offline corruption
    1. Umount the filesystem, corrupt the start, then try mounting it again which will fail because the superblocks were wiped:

      umount /mnt/tmp

      dd if=/dev/zero of=/dev/xvdd bs=1024k count=200

      mount /dev/xvdd /mnt/tmp

      mount /dev/xvde /mnt/tmp
    2. Note that the filesystem was not mountable due to a lack of a superblock. It might be possible to recover from this but that’s more advanced so we will restore the RAID.

      Mount the filesystem in a degraded RAID mode, this allows full operation.

      mount /dev/xvde /mnt/tmp -o degraded
    3. Add /dev/xvdd back to the RAID:

      btrfs device add /dev/xvdd /mnt/tmp
    4. Show the filesystem devices, observe that xvdd is listed twice, the missing device and the one that was just added:

      btrfs filesystem show /mnt/tmp
    5. Remove the missing device and observe the change:

      btrfs device delete missing /mnt/tmp

      btrfs filesystem show /mnt/tmp
    6. Balance the filesystem, not sure this is necessary but it’s good practice to do it when in doubt:

      btrfs balance start /mnt/tmp
    7. Umount and mount it, note that the degraded option is not needed:

      umount /mnt/tmp

      mount /dev/xvdd /mnt/tmp
  4. Experiment
    1. Experiment with the “btrfs subvolume create” and “btrfs subvolume delete” commands (which act like mkdir and rmdir).
    2. Experiment with “btrfs subvolume snapshot SOURCE DEST” and “btrfs subvolume snapshot -r SOURCE DEST” for creating regular and read-only snapshots of other subvolumes (including the root).
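
For example (the paths are arbitrary), the commands in the Experiment section can be exercised like this:

  btrfs subvolume create /mnt/tmp/vol1
  btrfs subvolume snapshot /mnt/tmp/vol1 /mnt/tmp/vol1-snap
  btrfs subvolume snapshot -r /mnt/tmp/vol1 /mnt/tmp/vol1-snap-ro
  btrfs subvolume delete /mnt/tmp/vol1-snap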

August 14, 2015

Introducing the Kubernetes kubelet in CoreOS Linux

This week we have added the kubelet, a central building block of Kubernetes, to the alpha channel of CoreOS Linux. The kubelet is responsible for maintaining a set of pods, which are composed of one or more containers, on a local system. Within a Kubernetes cluster, the kubelet functions as a local agent that watches for pod specs via the Kubernetes API server. The kubelet is also responsible for registering a node with a Kubernetes cluster, sending events and pod status, and reporting resource utilization.

While the kubelet plays an important role in a Kubernetes cluster, it also works well in standalone mode — outside of a Kubernetes cluster. The rest of this post will highlight some of the useful things you can do with the kubelet running in standalone mode such as running a single node Kubernetes cluster and monitoring container resource utilization with the built-in support for cAdvisor.

First we need to get the kubelet up and running. Be sure to follow this tutorial using CoreOS Linux 773.1.0 or greater.
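
You can check which release you are running with the standard os-release file (nothing CoreOS-specific about this step):

cat /etc/os-release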

Configuring the Kubelet with systemd

CoreOS Linux ships with reasonable defaults for the kubelet, which have been optimized for security and ease of use. However, we are going to loosen the security restrictions in order to enable support for privileged containers. This is required to run the proxy component in a single node Kubernetes cluster, which needs access to manipulate iptables to facilitate the Kubernetes service discovery model.

Create the kubelet systemd unit:

sudo vim /etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/kubernetes/kubernetes

[Service]
ExecStartPre=/usr/bin/mkdir -p /etc/kubernetes/manifests
ExecStart=/usr/bin/kubelet \
  --api-servers=http://127.0.0.1:8080 \
  --allow-privileged=true \
  --config=/etc/kubernetes/manifests \
  --v=2
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

Start the kubelet service

With the systemd unit file in place start the kubelet using the systemctl command:

sudo systemctl daemon-reload
sudo systemctl start kubelet

To ensure the kubelet restarts after a reboot be sure to enable the service:

sudo systemctl enable kubelet

At this point you should have a running kubelet service. You can verify this using the systemctl status command:

sudo systemctl status kubelet

Bootstrapping a single node Kubernetes cluster

The kubelet provides a convenient interface for managing containers on a local system. The kubelet supports a manifest directory, which is monitored for pod manifests every 20 seconds by default. This directory, /etc/kubernetes/manifests, was configured earlier via the --config flag in the kubelet systemd unit.

Pod manifests are written in the JSON or YAML file formats and describe a set of volumes and one or more containers. We can deploy a single node Kubernetes cluster using a pod manifest placed in the manifest directory.
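
As a generic illustration of the format (this is not the kubernetes.yaml manifest used below, just a minimal single-container example), a pod manifest looks like:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80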

Download the Kubernetes pod manifest

wget https://raw.githubusercontent.com/coreos/pods/master/kubernetes.yaml

Downloading a pod manifest over the Internet is a potential security risk, so be sure to review the contents of any pod manifest before running it on your system.

cat kubernetes.yaml

At this point we only need to copy the kubernetes.yaml pod manifest to the kubelet’s manifest directory in order to bootstrap a single node cluster.

sudo cp kubernetes.yaml /etc/kubernetes/manifests/

After the copy completes you can view the Docker images and containers being started with the standard Docker command line tools:

sudo docker images
sudo docker ps

After a few minutes you should have a running Kubernetes cluster. Next download the official Kubernetes client tool.

Download the Kubernetes client

kubectl is the official command line tool for interacting with a Kubernetes cluster. Each release of Kubernetes contains a new kubectl version. Download it and make it executable:

wget https://storage.googleapis.com/kubernetes-release/release/v1.0.3/bin/linux/amd64/kubectl
chmod +x kubectl

kubectl can be used to get information about a running cluster:

./kubectl cluster-info
Kubernetes master is running at http://localhost:8080

kubectl can also be used to launch pods:

./kubectl run nginx --image=nginx

View the running pods using the get pods command:

./kubectl get pods

To learn more about Kubernetes check out the Kubernetes on CoreOS docs.

Monitoring Containers with cAdvisor

The kubelet ships with built-in support for cAdvisor, which collects, aggregates, processes and exports information about running containers on a given system. cAdvisor includes a built-in web interface available on port 4194.

cadvisor

The cadvisor web interface.

The cAdvisor web UI provides a convenient way to view system wide resource utilization and process listings.

cadvisor gauges

System utilization information.

cAdvisor can also be used to monitor a specific container such as the kube-apiserver running in the Kubernetes pod:

cadvisor inspecting a container

Inspecting a container with cadvisor.

To learn more about cAdvisor check out the upstream docs.

More with CoreOS and Kubernetes

Adding the kubelet to the CoreOS Linux image demonstrates our commitment to Kubernetes and bringing the best of open source container technology to our users. With native support for the Kubernetes kubelet we hope to streamline Kubernetes deployments, and provide a robust interface for managing and monitoring containers on a CoreOS system.

If you’re interested in learning more about Kubernetes, be sure to attend one of our upcoming trainings on Kubernetes in your area. More dates will be added so keep checking back. If you want to request private on-site training, contact us.

August 12, 2015

Downgrade Quagga on Debian 8

The Quagga version in Debian 8 (v0.99.23.1) suffers from a bug in ospf6d, which causes no IPv6 routes to be exchanged via point-to-point interfaces.

In order to work around this problem (and re-establish IPv6 connectivity), the quagga package can be downgraded.

For this we add the 'oldstable' entry to sources.list and pin the quagga package to the old version.

Entry to add to /etc/apt/sources.list:

deb http://mirror.switch.ch/ftp/mirror/debian/ oldstable main

Entry to add to /etc/apt/preferences:

Package: quagga
Pin: version 0.99.22.*
Pin-Priority: 1001

After the entries have been added, run apt-get update followed by apt-get install quagga to downgrade to the old quagga package.
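
That is (apt-cache policy is just an extra check that the pin took effect):

apt-get update
apt-get install quagga
apt-cache policy quagga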

August 04, 2015

Meet the CoreOS team around the world in August

This month the CoreOS team will be speaking from locations along the Pacific Northwest in the US, to Austria, to Japan and China. August also begins our Kubernetes workshop series, brought to you by Google and Intel.


Wednesday, August 5, 2015 at 10:00 a.m. PST – Portland, OR

We kick off our Kubernetes training series in Portland with Kelsey Hightower (@kelseyhightower), product manager, developer and chief advocate at CoreOS. This hands-on workshop will teach you everything you need to know about Kubernetes, CoreOS and Docker. We are offering the workshop for only the cost of materials ($75) for a limited time so we encourage you to send any members of your team for this date. Register in advance to attend.


Friday, August 7, 2015 at 10:00 a.m. PST – Seattle, WA

Kelsey will provide the next Kubernetes training in Seattle, guiding your team through Kubernetes, CoreOS and Docker for only the cost of materials ($75) for a limited time. This event is sold out but we have several other trainings in other cities.


Friday, August 7, 2015 at 4:00 p.m. PST – Las Vegas, NV

Going to DEF CON 23 this year? Meet Brian “Redbeard” Harrington (@brianredbeard), who will speak in a Skytalk on container security and kernel namespaces on Friday, August 7.

In case you missed it, see Redbeard’s presentation on minimal containers from the CoreOS + Sysdig San Francisco July meetup.


Monday, August 10, 2015 at 10:00 a.m. PST – San Francisco, CA

Join us for a daylong Kubernetes training in San Francisco. Kelsey will walk you through Kubernetes, CoreOS and Docker. Seats are filling up quickly so register early to secure your spot.


Tuesday, August 11, 2015 at 6:30 p.m. BST – London, UK

Join Iago López (@iaguis), senior developer, for the Container Infrastructure Meetup at uSwitch in London. He’ll provide an overview and update on rkt, a container runtime designed to be composable, secure and fast.


Monday, August 17, 2015 at 2:20 p.m. PST – Seattle, WA

CoreOS will be at LinuxCon and ContainerCon for the week! Join us for a variety of talks in Seattle.


Wednesday, August 19, 2015 at 7:00 p.m. JST – Tokyo, Japan

Save the date! Kelsey Hightower will be speaking at a meetup in Tokyo. More details will be added – stay tuned on our community page for updates.


Wednesday, August 19, 2015 at 10:25 a.m. PST – Seattle, WA

More CoreOS talks at LinuxCon and ContainerCon include two speakers with expertise in security and networking.


Thursday, August 20, 2015 at 9:30 a.m. PST – Seattle, WA

In addition to LinuxCon and ContainerCon, our team will also be speaking at Linux Plumbers Conference in Seattle. Brandon Philips will kick off the event with a talk on Open Containers.


Friday, August 21, 2015 at 11:10 a.m. JST – Tokyo, Japan

Meet Kelsey Hightower in Tokyo at YAPC Asia. He’ll discuss managing containers at scale with CoreOS and Kubernetes.


Friday, August 21, 2015 at 9:00 a.m. PST – Seattle, WA

Linux Plumbers Conference attendees are welcome to join Matthew Garrett to learn about securing the entire boot chain.

MesosCon attendees should not miss Brandon Philips discussing rkt and more at 11:30 a.m. PT.


Tuesday, August 25, 2015 – Vienna, Austria

Jonathan Boulle will be giving a keynote at Virtualization in High-Performance Cloud Computing (VHPC ’15), held in conjunction with Euro-Par 2015, in Vienna, Austria. Jon will discuss the work behind designing an open standard for running applications in containers.


Wednesday, August 26, 2015 at 11:35 a.m. PST – Mountain View, CA

OpenStack Silicon Valley, hosted at the Computer History Museum, will feature Alex Polvi (@polvi), CEO of CoreOS. He’ll present Containers for the Enterprise: It's Not That Simple on August 26 at 11:35 a.m. PT.

Immediately following is a deep-dive session with Wall Street Journal technology reporter Shira Ovide (@ShiraOvide), joined by Alex, James Staten, chief strategist of the cloud and enterprise division at Microsoft, as well as Craig McLuckie (@cmcluck), senior product manager at Google. They will discuss practical choices facing enterprises moving to an IT resource equipped to support software developers in their work to help their companies compete.


Friday, August 28, 2015 – Beijing, China

At CNUT Con, presented by InfoQ in Beijing, Kelsey Hightower will give a keynote: From Theory to Production: Managing Applications at Scale.


To invite CoreOS to a meetup, training or conference in your area email us or tweet to us @CoreOSLinux!

July 24, 2015

Introducing etcd 2.1

After months of focused work, etcd 2.1 has been released. Since the etcd 2.0 release in January, the team has gathered a ton of valuable feedback from real-world environments. And based on that feedback, this release introduces: authentication/authorization APIs, new metric endpoints, improved transportation stability, increased performance between etcd servers, and enhanced cluster stability.

For a quick overview, etcd is an open source, distributed, consistent key value store for shared configuration, service discovery, and scheduler coordination. By using etcd, applications can ensure that even in the face of individual servers failing, the application will continue to work. etcd is a core component of CoreOS software that facilitates safe automatic updates, coordinating work being scheduled to hosts, and setting up overlay networking for containers.

If you want to skip the talk and get right to the code, you can find new binaries on GitHub. etcd 2.1.1 is available in CoreOS 752.1.0 (currently in the alpha channel), so feel free to take it for a spin.

Zero-Downtime Rolling Upgrade from 2.0

Upgrading from etcd 2.0 to etcd 2.1 is a zero-downtime rolling upgrade. The basic approach is that you can upgrade the members of a cluster running etcd 2.0 one by one to etcd 2.1. For more details, please read the upgrade documentation. If you are running your cluster under etcd 0.4.x, please upgrade to etcd 2.0 first and then follow the rolling upgrade steps.

Also, with this release, etcd 2.1 is now the current stable etcd release; as such, all bug fixes will go into new etcd 2.1.x releases and won't be backported to etcd 2.0.x.

Auth API for Authentication and Authorization

A major feature in this release is the /v2/auth endpoint, which adds auth to the etcd key/value API. This API lets you manage authorization of key prefixes with users and roles and authenticate those users using HTTP basic authentication, enabling users to have more control within teams. This includes support in the etcd HTTP server, the command-line etcdctl client, and the Go etcd/client package. You can find full details in the authentication documentation. Please note that this is an experimental feature and will be improved based on user feedback. We think we got the details right but may adjust the API in a subsequent release.
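
As a rough sketch of the flow with etcdctl (the names are examples; see the authentication documentation for the authoritative steps), enabling auth looks like this:

etcdctl user add root       # prompts for the root password
etcdctl auth enable         # turn authentication on
etcdctl -u root get /foo    # authenticate subsequent requests as a user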

Improved Transport Stability

Many users of etcd have networks with inconsistent performance and latency. We can't make etcd work perfectly in all of these difficult environments, but in this release we have optimized the way etcd uses the network in a variety of ways so that it performs as well as possible.

First, to reduce the connection creation overhead and to make the consensus protocol (raft) communication more efficient and stable, etcd now maintains long running connections with other peers. Next, to reduce the raft command commit latency, each raft append message is now attached to a commit index. The commit latency is reduced from 100ms to 1ms under light load (<100 writes/second). And finally, etcd's raft implementation now provides better internal flow control, significantly reducing the possibility of raft message loss, and improving CPU and memory efficiency.

Functional Testing

For four months we have been running etcd against a fault-injecting and functional testing framework we built. Our goal is to ensure etcd is failure-resistant under heavy usage, and in these months of testing etcd has proven robust under many kinds of harsh failure scenarios. We will continue to run these tests as we iterate on the 2.1 releases.

Improved Logging

Leveled logging is now supported. Users can set the expected log level for etcd and its subpackages. At the same time, we have moved verbose, repeated logging to the DEBUG log level, so etcd's default log output is significantly more readable. You can control leveled logging using the flags listed here.

New Metrics API

etcd 2.1 introduces a new metrics API endpoint that can be used for real-time monitoring and debugging. It exposes statistics about both client behaviors and resource usage. Like the auth API endpoint, this is an experimental feature which may be improved and changed based on user feedback.
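
The endpoint is plain HTTP, so a quick look at it only takes a curl call (assuming the default client port):

$ curl http://127.0.0.1:2379/metrics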

Get Involved and Get Started

We will continue to work to make etcd a fundamental building block for Google-like infrastructure that users can take off the shelf, build upon, and rely on. Get started with etcd, continue to share your feedback, and even help by contributing directly to the code.

July 21, 2015

CoreOS and Kubernetes 1.0

Today is a big day for Kubernetes, as it hits its 1.0 release milestone. Kubernetes provides a solid foundation for running container-based infrastructure providing API driven deployment, service discovery, monitoring and load balancing. It is exciting progress towards bringing industry-wide Google-like infrastructure for everyone else (GIFEE) through community-built open source software.

Kubernetes 1.0 on CoreOS Open Source Guides

The Kubernetes project has come a long way in just over a year, with many API improvements and, more recently, a focus on stability and scalability. If you haven't tried Kubernetes recently, it is a worthwhile experience and can get you thinking about how containers can be more easily used in real-world deployments: whether it is doing your first rolling upgrade of your containerized app or using DNS service discovery between components.

For those that want to try Kubernetes 1.0 on CoreOS, we have put together some easy-to-read open source guides to run Kubernetes 1.0 on CoreOS. And as always if you need help try us on the #coreos irc channel or coreos-user mailing list.

CoreOS Joins the Cloud Native Computing Foundation

When we started building CoreOS Linux two years ago we wanted to encourage people to run infrastructure in a secure, distributed and consistent manner. This required many components along the way, including new datastores like etcd, container runtimes like Docker & rkt, and cluster-wide application deployment, orchestration, and service discovery like Kubernetes. Today, CoreOS is joining a new foundation along with Google, Twitter, Huawei and other industry partners to collaborate and build the technologies that are changing how people are thinking about infrastructure software. This new foundation, the Cloud Native Computing Foundation, is being launched in partnership with the Linux Foundation and will shepherd industry collaboration around Kubernetes and other projects moving forward.

Tectonic Preview

For companies who want help building their infrastructure in this manner, we are also announcing that Tectonic is now in Preview. This includes 24x7 support, a friendly web-based console, and deployment guides for AWS and your own hardware. We invite you to read more about Tectonic Preview on our Tectonic blog.

Kubernetes Training

Also today, we are launching Kubernetes Training. The first workshops will be delivered by Kelsey Hightower, product manager, developer and chief advocate at CoreOS, and will take place on August 5 in Portland, August 7 in Seattle and August 10 in San Francisco.

By joining these workshops, you will learn more about Kubernetes, CoreOS, Docker and rkt, and leave knowing Kubernetes core concepts; how to enable and manage key cluster add-ons such as DNS, monitoring, and the UI; how to configure nodes for the Kubernetes networking model; and how to manage applications with Kubernetes deployment patterns.

For a limited time, the workshops will be available at a special rate for only the cost of materials. Sign up for a workshop in your area early; they will fill up fast.

CoreOS at OSCON

The CoreOS team is at OSCON this week and you have three ways to find us:

July 17, 2015

Meet CoreOS at OSCON and more

Next week we are heading to Portland, Oregon for OSCON. We look forward to meeting fellow OSCON attendees and Portland friends at one of the below events, or at our booth (# 900) on the OSCON show floor, July 21-24. If you have questions about Kubernetes, CoreOS, Docker or rkt, sign up for office hours at our booth and get one-on-one time with our team. Read on to see more about where we will be next week. See you then!

Sunday, July 19

Get revved up for OSCON and see Kelsey Hightower, product manager, developer and chief advocate at CoreOS, speak in a lightning talk at the NGINX Summit at 3 p.m. PT. Register here for your ticket.

Tuesday, July 21

CoreOS will be at the Kubernetes 1.0 event – be sure to get there in time for the keynote at 11 a.m. PT. Get your ticket before it sells out! If you can’t make it in person you can register for the live-stream. We’ll be there throughout the day and if you miss us at the event, connect with our team at the Kubernetes After Hours Party on Tuesday too.

At OSCON, Kelsey Hightower will deliver a much-requested 3.5-hour tutorial starting at 1:30 p.m. PT on taming microservices with CoreOS and Kubernetes.

The OSCON Expo Hours begin at 5 p.m. so meet us at our booth if you’re there early for the reception.

Wednesday, July 22 - Thursday, July 24

Our CoreOS booth will have expert engineers to answer your questions and get you started with Tectonic. Sign up for office hours and talk with a CoreOS expert to get all your Kubernetes, CoreOS, Docker and rkt questions answered. Visit us at booth 900 all day on Wednesday and Thursday and tweet to us @CoreOSLinux or @TectonicStack.

Wednesday, July 22

Join us for our second annual CoreOS Portland OSCON meetup starting at 6 p.m. PT at the Ecotrust Building. Brian “Redbeard” Harrington, principal architect at CoreOS, Brandon Philips, CTO of CoreOS, Kelsey Hightower, product manager at CoreOS, and Matthew Garrett, principal security engineer at CoreOS, will lead the talks of the evening. We thank our sponsors, Redapt and Couchbase, for making the event possible and providing drinks and bites on the Ecotrust rooftop! RSVP here.

Thursday, July 23

After your day at OSCON, join us for a Birds of a Feather (BoF) session at 7 p.m. PT led by Brian “Redbeard” Harrington, principal architect at CoreOS. He will lead a lively, interactive conversation with attendees covering how to get started with CoreOS, CoreOS components, and the CoreOS best practices you most want to learn about.

Friday, July 24

At 11:10 a.m. PT, Matthew Garrett will present Building a Trustworthy Computer.

See you in Portland!

July 15, 2015

Announcing rkt v0.7.0, featuring a new build system, SELinux and more

Today we are announcing rkt v0.7.0. rkt is an app container runtime built to be efficient, secure and composable for production environments. This release includes new subcommands under rkt image for manipulating images in the local store, a new build system based on autotools, and integration with SELinux. These new capabilities improve the user experience, make it easier to build future features and improve security isolation between containers.

Note on rkt and OCP

As you know, rkt is an implementation of the App Container (appc) spec and rkt is also targeted to be a future implementation of the Open Container Project (OCP) specification. The OCP development is still in its early days. Our plans with rkt are unchanged and the team is committed to the continued development of rkt. This is all a part of the goal to build rkt as a leading container runtime focused on security and composability for the most demanding production requirements.

Now, read on for details on the new features.

New Subcommands for rkt image

In this release all of the subcommands dealing with images in the local store can be found inside rkt image. Apart from the already existing subcommands rkt image list, rkt image rm and rkt image cat-manifest, this release adds three more:

rkt image export

This subcommand exports an ACI from the local store. This comes in handy when you want to copy an image to another machine, a file server, and so on.

$ rkt image export coreos.com/etcd etcd.aci
$ tar xvf etcd.aci

Note that this command does not perform any network I/O, so the image must be in the local store beforehand. Also, the exported ACI file might differ from the one originally imported into the store, because rkt image export always returns uncompressed ACIs.

rkt image extract

For debugging or inspection you may want to extract an ACI to a directory on disk. You can get the full ACI or just its rootfs:

$ rkt image extract coreos.com/etcd etcd-extracted
$ find etcd-extracted
etcd-extracted
etcd-extracted/manifest
etcd-extracted/rootfs
etcd-extracted/rootfs/etcd
etcd-extracted/rootfs/etcdctl
...
$ rkt image extract --rootfs-only coreos.com/etcd etcd-rootfs
$ find etcd-rootfs
etcd-rootfs
etcd-rootfs/etcd
etcd-rootfs/etcdctl
...

As with rkt image export no network I/O will be performed.

rkt image render

While the previous command extracts an ACI to a directory, it doesn’t take into account image dependencies or pathWhitelists. To get an image rendered as it would look ready-to-run inside of the rkt stage2 you can run rkt image render:

$ rkt image render --rootfs-only coreos.com/etcd etcd-rendered
$ find etcd-rendered
etcd-rendered
etcd-rendered/etcd
etcd-rendered/etcdctl
...

New Build System

In 0.7.0 we introduce a new build system based on autotools. Previous versions of rkt were built with a combination of shell scripts and ad-hoc Makefiles. As build complexity grew, more and more environment variables were added, which made build options hard to discover and complicated development.

The new build system based on autotools in 0.7.0 has more discoverable options and should make it easier to build future features like cross-compiling or a KVM-based stage1.

This is how you build rkt now:

$ ./autogen.sh 

----------------------------------------------------------------
Initialized build system. For a common configuration please run:
----------------------------------------------------------------

./configure --with-stage1=coreos
$ ./configure --help
`configure' configures rkt 0.7.0+git to adapt to many kinds of systems.
[...]
Optional Features:
  --disable-option-checking  ignore unrecognized --enable/--with options
  --disable-FEATURE       do not include FEATURE (same as --enable-FEATURE=no)
  --enable-FEATURE[=ARG]  include FEATURE [ARG=yes]
  --enable-functional-tests
                          enable functional tests on make check (linux only,
                          uses sudo, default: no, use auto to enable if
                          possible)

Optional Packages:
  --with-PACKAGE[=ARG]    use PACKAGE [ARG=yes]
  --without-PACKAGE       do not use PACKAGE (same as --with-PACKAGE=no)
  --with-stage1=type      type of stage1 build one of 'src', 'coreos', 'host',
                          'none', 'kvm' (default: 'coreos')
  --with-stage1-systemd-src=git-path
                          address to git repository of systemd, used in 'src'
                          build mode (default: 'https://github.com/systemd/systemd.git')
  --with-stage1-systemd-version=version
                          systemd version to build (default:
                          'v220')
  --with-stage1-image-path
                          custom stage1 image path (default:
                          '')
$ ./configure && make -j4
[...]

Note that every build option is listed with a description, so users no longer have to read the build scripts to figure out which environment variables to set.

SELinux Support

We also added support for running containers using SELinux SVirt, improving security isolation between containers. This means every rkt instance will run in a different SELinux context. Processes started in these contexts will be unable to interact with processes or files in any other instance’s context, even though they are running as the same user.

This feature depends on appropriate policy being provided by the underlying Linux distribution. If supported, a file called “lxc_contexts” will be present in the SELinux contexts directory under /etc/selinux. In the absence of appropriate support, SELinux SVirt will automatically be disabled at runtime.
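
A quick way to check whether your distribution ships the required policy is to look for that file (the policy name "targeted" below is only an example; it varies by distribution):

$ ls /etc/selinux/targeted/contexts/lxc_contexts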

Other Features

  • rkt now registers pods with the metadata service by default. Ensure the service is running before starting pods (rkt metadata-service), or disable registration with rkt run --mds-register=false (a quick sketch follows this list).
  • We started improving rkt UX by reducing stage1 verbosity and writing better and more consistent error messages. As we look towards the next rkt releases, we will be focusing more on UX improvements.
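
For example, based on the commands mentioned in the first item (the image name below is a placeholder):

$ sudo rkt metadata-service &                          # run once per host, before starting any pods
$ sudo rkt run --mds-register=false example.com/myapp  # or opt an individual pod out of registration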

Get Involved

Be a part of the development of rkt or join the discussion through the rkt-dev mailing list or GitHub issues. We welcome you to contribute directly to the project.

July 14, 2015

Q&A with Sysdig on containers, monitoring and CoreOS

Today we congratulate Sysdig, the container visibility company, on its funding news and launch of its commercial offering, Sysdig Cloud. We interviewed Loris Degioanni, the creator and CEO of Sysdig, about the company, containers and how Sysdig works with CoreOS. He is a pioneer in the field of network analysis through his work on WinPcap and Wireshark, which are open source tools with millions of users worldwide.

Read on to dive in, and be sure to meet Sysdig and our team at our July 29 Meetup in San Francisco to learn more.

Q: In your own words, what is Sysdig? Why is it important in containerized environments?

Loris: Sysdig is an open source system visibility tool, designed to meet the needs of modern IT professionals. You can use it to monitor and troubleshoot things like system activity, network and file I/O, application requests and much more. Unique features include the ability to work with trace files (similar to tools such as Wireshark) and deep, native container support.

As for containerized environments: containers are an extremely interesting and powerful technology – I’m personally a big fan. But containers are also a relatively young technology (at least in their current form), and until now there has been a bit of a catch-22 in terms of container visibility. Either you monitor your containers from the outside, with inherently limited visibility given the opaque and self-contained nature of containers, or you install extra monitoring software inside the container, which largely undermines the benefits of using a container in the first place – performance, deployability, portability, dependency simplification, security, etc.

Sysdig is the first visibility tool designed specifically to support containers. And in order to truly support containers, we knew we had to solve the issue above. Sysdig’s instrumentation is based on a little kernel module that can capture information like system calls from “underneath” containers. This makes it possible to explore anything that’s happening inside containers, while running sysdig entirely on the host machine or inside another container. There is no need to instrument your containers, or install any agent inside them. In other words, Sysdig provides full visibility into all your containers, from outside of the containers themselves.

This tends to be quite a radical departure from what people are used to, and is also the basis of our commercial product, Sysdig Cloud. Based on this same open source technology, Sysdig Cloud offers a container-native monitoring solution, with distributed collection, discovery, dashboarding, alerting, and topology mapping.

Q: What lessons from contributing to Wireshark influence what you are doing today?

Loris: I spent my Ph.D. and the first 10 years of my career working on network monitoring. The lessons I learned during that time have highly influenced the architecture and underlying philosophies of sysdig.

Network monitoring as a whole offers a pretty elegant set of workflows. First, there is the fundamental ability to capture the information you need into trace files. These trace files are not only easily shared, but maybe even more importantly, they decouple the troubleshooting process from the issue itself. No longer are you working inside of a broken system, trying to fix a problem, as the problem is bringing down the system around you. Network monitoring workflows also include the ability to filter information with well known languages, and visualize your data with industry standard tools like Wireshark.

I believe these workflows are not only relevant in the context of network monitoring. Trace files, decoupled troubleshooting, natural filters, standardized visualizations: these are widely applicable concepts. With our work on sysdig, we are trying to bring these well-proven approaches from the world of network monitoring into the world of system, container and application visibility.

Q: How does Sysdig work with CoreOS environments? What types of information can Sysdig pull from a CoreOS host?

Loris: Sysdig fully supports CoreOS environments, and offers the same 100% visibility you would find in a non-containerized environment. Sysdig works with CoreOS by installing the container we provide, which contains all the required dependencies and offers an isolated execution environment. Since we provide a precompiled driver, installation is really easy – it is a single command line and takes 30 seconds.

Once installed, sysdig will be able to surface very rich information about your CoreOS environment: both the host OS and the containers you have running. This includes everything from top processes, to network connections, to top files and directories, to a list of executed commands for both the host OS and any of the running containers. And that’s just the tip of the iceberg. For some interesting use cases with sysdig running in CoreOS environments, you can refer to our two-part CoreOS blog series here and here.

Q: What is the memory and CPU overhead required by Sysdig?

Loris: Typically low, but it depends on what kind of activity is happening on the machine. Sysdig instruments the operating system’s kernel, and the overhead depends on how many events there are to be captured. On a machine with average load, the CPU occupation should be very low: a few percentage points. CPU occupation of sysdig can go higher on systems with a lot of I/O or network activity. The Sysdig Cloud agent, on the other hand, incorporates additional protective mechanisms, such as subsampling techniques, to ensure the CPU occupation always stays within an acceptable range of <5%.

Q: You presented the Dark Art of Container Monitoring at CoreOS Fest this year. Tell us more about what should be monitored.

Loris: In terms of what should be monitored, my answer is: everything! The really important question is: how should it be monitored? The same features that make containers so interesting and revolutionary (i.e. the fact that they are isolated, self-contained, simple and lightweight), make them a real challenge to monitor. In particular, the traditional approach of having an agent on any “entity” doesn’t work well with containers, because it’s too invasive and doesn’t scale.

This is the problem we’re trying to solve with sysdig and Sysdig Cloud. We’re excited about working on it because great visibility is a key requirement to adopt containers in production.

Q: Describe what Sysdig does with CoreOS Linux to help monitor system security.

Loris: Sysdig has powerful security-oriented features. Here are some examples of what CoreOS users can do with sysdig to monitor system security:

  • Show the directories that the user "root" visits
  • Observe ssh activity
  • Show every file open that happens in /etc
  • Show all the network connections established by a process

Now think about being able to obtain this information for any container running on a CoreOS host, but from outside the container, with no instrumentation and no dependencies.
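
As an illustration of the third example above, a filter along these lines (run on the CoreOS host, not inside any container) surfaces file opens under /etc across the host and all running containers; the exact expression is a hedged example based on sysdig's filter syntax:

$ sudo sysdig evt.type=open and fd.name contains /etc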

If you are curious to try sysdig out, installation on CoreOS is super easy and instructions can be found here. And don’t forget to let us know what you think on twitter or at info@sysdig.com!

Join CoreOS and Sysdig in San Francisco for the July Meetup

Attend this month’s CoreOS San Francisco Meetup that will feature the CoreOS team and Gianluca Borello, senior software engineer at Sysdig.

When: Wednesday, July 29, 2015 starting at 6 p.m. PT

Where: Okta, 301 Brannan Street, San Francisco, CA 94107

RSVP: http://www.meetup.com/coreos/events/223897172/

July 12, 2015

Scapy and IP Options

Create packets with custom IPv4 IP Option fields using Scapy:

>>> packet=IP(src="203.0.113.1",dst="203.0.113.2",options=[IPOption('%s%s'%('\x86\x28','a'*38))])
>>> ls(packet)
version    : BitField             = 4               (4)
ihl        : BitField             = None            (None)
tos        : XByteField           = 0               (0)
len        : ShortField           = None            (None)
id         : ShortField           = 1               (1)
flags      : FlagsField           = 0               (0)
frag       : BitField             = 0               (0)
ttl        : ByteField            = 64              (64)
proto      : ByteEnumField        = 0               (0)
chksum     : XShortField          = None            (None)
src        : Emph                 = '203.0.113.1'   (None)
dst        : Emph                 = '203.0.113.2'   ('127.0.0.1')
options    : PacketListField      = [<IPOption  copy_flag=1L optclass=control option=commercial_security length=40 value='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' |>] ([])
>>> sr1(packet)

The above code results in the following packet (as seen by Wireshark):

Wireshark showing the packet with the custom IP Option
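
For readers decoding the option bytes: '\x86' sets the copy flag and selects option number 6 (commercial security, matching the ls() output above), and '\x28' is the option length of 40 bytes. A slightly more readable, hedged way to build the same packet in the Scapy shell:

>>> opt = IPOption('\x86\x28' + 'a' * 38)   # type 0x86, length 40, 38 bytes of filler
>>> packet = IP(src="203.0.113.1", dst="203.0.113.2", options=[opt])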

July 11, 2015

Upgrade to Debian 8 without systemd

To avoid the automatic installation/switch to systemd during the upgrade to Debian 8, it is enough to prevent the installation of the systemd-sysv package.

This can be done by creating a file /etc/apt/preferences.d/no-systemd-sysv with the following content:

Package: systemd-sysv
Pin: release o=Debian
Pin-Priority: -1

(via)
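
Before upgrading, you can confirm the pin is active; with the file above in place, apt-cache should report the package as pinned with a negative priority:

$ apt-cache policy systemd-sysv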

July 10, 2015

OpenSSL has been Updated (CVE-2015-1793)

The Alternative Chains Certificate Forgery vulnerability in OpenSSL, as reported in CVE-2015-1793, has been patched in CoreOS Linux (Alpha, Beta and Stable channels). If automatic updates are enabled (default configuration), your server should be patched within the next several hours (if it hasn’t already received the update).

If automatic updates are disabled, you can force an update by running update_engine_client -check_for_update.

If you have any questions or concerns, please join us in IRC freenode/#coreos.

How to get involved with CoreOS projects

Today we’re excited to build and collaborate with our community at the inaugural CoreOS hackathon. Even if you can’t join us at GopherCon in Denver, there are numerous ways to get involved and contribute.

Every project on GitHub includes helpful information on contributing that can be found in the CONTRIBUTING.md file. Be sure to look at it before you jump in and begin coding.

etcd

Serving as the backbone of many distributed systems, from Kubernetes to Pivotal’s Cloud Foundry and beyond, etcd is a Go codebase and key-value store where the state of the art in distributed systems comes together. If your interests lie in consensus protocols, APIs and clustering, etcd has a number of areas where contribution is more than welcome.

Hack on etcd.

fleet

CoreOS Linux is built for scale, and at scale, managing systemd can be a challenge. We created fleet to distribute init across the data center. fleet is used in nearly all CoreOS deployments to simplify administrative overhead. If you are interested in operational plumbing, there is no shortage of work to be done.

Hack on fleet.

flannel

If software defined networks and the low levels of data center connectivity are in your interests, you can help build flannel, the container-friendly software networking fabric.

Hack on flannel.

rkt

Jump in with the rkt team and help create a secure, composable and standards based container runtime starting with these requests. Help shape the future of the container ecosystem and stay up to date with our rkt mailing list too.

With the recent announcement to collaborate on the Open Container Project (OCP), stay tuned for updates on how we will work together and work with OCP and rkt.

GopherCon Hack Day on July 10

For any of you attending GopherCon, here are the details of the hack day:

When: Friday, July 10, 2015 from 10:00 a.m. - 5:00 p.m. MDT

Where: Room 403, Denver Convention Center

Schedule:

10:00 a.m. - 10:30 a.m. - Brandon Philips, CoreOS Two Years in

10:30 a.m. - 11:00 a.m. - Kelsey Hightower, Kubernetes talk

11 a.m. - 11:30 a.m. - Russell Haering, ScaleFT

11:30 a.m. - 12 p.m. - Micha Leuffen, Wercker

-LUNCH-

1:00 p.m. - 4:00 p.m. - Hack Day Competition

4:00 p.m. - 5:00 p.m. - Competition demos and winner announcement

We welcome your involvement and contributions to CoreOS projects. We wouldn’t be here without our contributors and there is much to be done!

July 08, 2015

OpenPower Firmware Stack

The OpenPower server platform comprises one or more Power8 processors, the latest of the IBM PowerPC family, and some kind of management controller to power on and monitor the state of the main processor(s). This post provides an overview of the different bits of open source firmware that are used to take the machine from power on all the way through to running your operating system.

Tyan Palmetto Motherboard

The Tyan GN70-BP010 is the first OpenPower machine to ship. Known also by its codename Palmetto, it contains a single Power8 processor and an Aspeed AST2400 ARM System on Chip which we refer to as the Baseboard Management Controller (BMC). The BMC is a typical embedded ARM system: u-boot, Linux kernel and stripped down userspace. This firmware is built by OpenPower Foundation member AMI.

P8 Boot

The BMC and the Power8 share a common memory mapped interface, called the LPC bus. This is the interface over which the Power8 accesses boot firmware, as well as boot time configuration, from a SPI attached PNOR flash, and speaks to the BMC’s IPMI stack over the BT interface.

Hostboot Starting

When it comes to starting the Power8 the BMC wiggles a pin to kick the SBE (Self Boot Engine) into gear. This tiny processor in the Power8 loads the first stage firmware, called Hostboot, from PNOR and configures one of the Power8 threads to execute it from L3 cache. Hostboot is responsible for bringing up the internal buses in the Power8, as well as the rest of the cores, SDRAM memory, and another on-CPU microcontroller called the OCC (On Chip Controller).

P8 Boot Flow

When Hostboot has finished these procedures, it loads a payload from the PNOR. This payload is the second stage firmware, known as Skiboot. Skiboot synchronises the timebase between all the CPU threads, brings up the PCIe buses, communicates with the management controller, and provides the runtime OPAL (Open Power Abstraction Layer) interface for the operating system to call into. Skiboot is also responsible for loading the next stage bootloader, which in this case is a Linux kernel and root file system that provide the Petitboot loader environment.

Skiboot Starting

Petitboot Starting

Petitboot is a bootloader that discovers all the disks and network devices in the system, and presents a menu for the user to select which OS to run. Petitboot looks for PXE configuration information, as well as parsing Grub configuration files found on local disks. Petitboot reads configuration information from the NVRAM partition on the PNOR, which means it can be configured to boot from a specific network interface, hard drive, or even not boot at all and wait for user input. Once the boot OS has been selected, Petitboot uses the Linux kexec functionality to jump into the host kernel.

Petitboot Menu

July 07, 2015

Happy 2nd Epoch CoreOS Linux

Today we rolled out the 735.0.0 release of CoreOS Linux to the alpha channel. Our CoreOS Linux version numbers are counted from our epoch on July 1, 2013, which means this month marks the end of our second year working on CoreOS Linux.

Two years ago we started this journey with a vision of improving the consistency, deployment speed and security of server infrastructure. In this time we have kicked off a rethinking of how server OSes are designed and used. In a recent article InfoWorld said:

CoreOS Linux “was the first influential micro operating system designed for today’s cloud environments.”

Last year, we celebrated our first stable channel release and since then we have been hard at work pushing important bug fixes and feature releases to that channel every 2.5 weeks on average.

CoreOS Year 1 Highlights

In the post for that first stable release we highlighted our progress to date:

  • CoreOS engineers contributed features and fixes to open source projects including Docker, the Linux kernel, networkd, systemd and more
  • Official CoreOS image added to Google Compute Engine, Rackspace, Amazon
  • Joined the Docker Governance Board as a Contributing Member
  • Today’s most respected technology companies and many Fortune 500 companies are using and testing CoreOS in their environments

CoreOS Year 2 Highlights

In the tradition of that post one year ago, let’s take a look at some of the highlights from the last year of CoreOS.

  • Announced Tectonic, a commercial Kubernetes platform that combines the CoreOS stack with Kubernetes to bring companies Google-style infrastructure
  • Worked with community partners to create App Container (appc), a specification defining a container image format, runtime environment and discovery protocol, to work towards the goal of a standard, portable shipping container for applications
  • Created rkt, a container runtime designed for composability, security and speed and the first implementation of appc
  • Quay.io joined CoreOS to provide Enterprise Registry, delivering secured hosting of your container repositories behind the firewall
  • Released etcd 2.0, which powers important projects in the container and distributed systems ecosystem including the flannel, locksmith, fleet and Kubernetes projects. etcd also supports community projects like HashiCorp’s Vault and Docker 1.7’s networking backend.
  • Joined forces with industry leaders to launch the Open Container Project, chartered to establish common standards for software containers

Our ability to build and ship innovative and high-quality projects is due in large part to the feedback and interest from our community. Thank you for all of your help in contributing, bug testing, promoting and learning more about what we are doing here at CoreOS.

Celebrate With Us at GopherCon

We will be celebrating our second birthday with our friends at GopherCon in Denver. Swing by our booth to get a limited edition CoreOS GopherCon sticker. Or, join us at our birthday party, brought to you by our friends from Couchbase and Iron.io on Thursday, July 9, at 8 p.m. MDT. Lastly, don’t miss our hack day on Friday, July 10, where you can work alongside a CoreOS engineer, learn about our open source projects and compete for prizes.

RSVP for our Second Birthday Party

Thursday, July 9 at 8 - 11 p.m. MDT

Pizza Republica in Denver, Colorado. Sponsored by Couchbase and Iron.io.

http://www.eventbrite.com/e/coreos-2nd-birthday-at-gophercon-tickets-17419178231

CoreOS Birthday Hack Day

Friday, July 10 at 10 a.m. - 5 p.m. MDT

Room 403, GopherCon in Denver, Colorado

http://www.gophercon.com/

July 06, 2015

Upcoming CoreOS Events in July

Need your CoreOS fix? Check out where you can find us this month!


Tuesday, July 7-Friday, July 10, 2015 - Denver, CO

We’re going to GopherCon! Be sure to stop by our booth to pick up some swag and say hello.


Tuesday, July 7, 2015 at 6:00 p.m. MDT - Denver, CO

Start your GopherCon experience the right way. Join Brian “Redbeard” Harrington and other awesome speakers at the GopherCon Kick off party!


Thursday, July 9, 2015 at 1:40 p.m. MDT - Denver, CO

Be sure to check out Barak Michener give a talk at GopherCon about Cayley and building a graph database.


Thursday, July 9, 2015 at 4:00 p.m. MDT - Denver, CO

Don’t miss Kelsey Hightower at GopherCon talking about betting the company on go and winning!


Thursday, July 9, 2015 at 8:00 p.m. MDT - Denver, CO

If you’re attending GopherCon, come celebrate our second birthday with us and our friends from Couchbase and Iron.io at Pizza Republica! Pizza, beer and video games included. RSVP here!


Friday, July 10, 2015 - 11:00 a.m. BRT - Porto Alegre, Brazil

Meet Matthew Garrett and discuss Free Software communities at FISL in Brazil. He’ll present, Using DRM technologies to protect users.


Friday, July 10, 2015 at 10:00 a.m. MDT - Denver, CO

End GopherCon with a good time! Join us at our Hack Day in room 403. We’ll have speakers from CoreOS and the community, as well as a special Hack Day competition.


Tuesday, July 14, 2015 at 6:00 p.m. IDT - Tel Aviv, Israel

If you’re in Tel Aviv, swing by the Docker Tel Aviv Meetup to hear Joey Schorr discuss the Quay.io container lifecycle.


Tuesday, July 14, 2015 at 5:00 p.m. EDT - Online

DataStax is hosting a webinar on leveraging Docker and CoreOS to provide always available Cassandra at Instaclustr. Register here!


Tuesday, July 21, 2015 at 10:00 a.m. PDT - Portland, OR

Join us as we celebrate Kubernetes 1.0! Come by in person at the event or after party if you’re at OSCON. If you can’t make it to Portland, not to worry. Register to watch the keynote here.


Tuesday, July 21, 2015 at 1:30 p.m. PDT - Portland, OR

Kelsey Hightower will be at OSCON giving a workshop on taming microservices with CoreOS and Kubernetes. Don’t miss it!


Thursday, July 23, 2015 at 7:00 p.m. BST - London, UK

We’re ending the month with our friends at the CoreOS London Meetup! Come hang out and learn more about Tectonic and how it combines Kubernetes and the CoreOS software portfolio.


Want more CoreOS in your city? Let us know! Email us at community@coreos.com.

July 01, 2015

Introducing flannel 0.5.0 with AWS and GCE

Last week we released flannel v0.5, a virtual network that gives a range of IP addresses to each host to use with container runtimes. We have been working hard to add features to flannel to enable a wider variety of use cases, such as taking advantage of cloud providers' networking capabilities, as part of the goal to enable containers to effectively communicate across networks and ensure they are easily portable across cloud providers.

With this in mind, flannel v0.5 includes the following new features:

  • support for Google Compute Engine (GCE),
  • a client/server mode and,
  • a multi-network mode.

Please refer to the readme for details on the client/server and the multi-network modes.

Try Out the New Release

In this post we will provide an overview of how to set up flannel with the Amazon Virtual Private Cloud (Amazon VPC) backend introduced in flannel v0.4 and with the newly added GCE backend.

When flannel runs the gce or the aws-vpc backend it does not create a separate interface as it does when running the udp or the vxlan backends.

This is because with gce and aws-vpc backends, there is no overlay or encapsulation and flannel simply manipulates the IP routes to achieve maximum performance.

Let’s get started with setting up flannel on GCE instances.

GCE Backend

From the Developers Console, we start by creating a new network.

Configure the network name and address range. Then add firewall rules to allow etcd traffic (tcp/2379), SSH, and ICMP. That's it for the network configuration. Now it’s time to create an instance. Let's call it demo-instance-1. Under the "Management, disk, networking, access & security options" make the following changes:

  • Select the "Network" to be our newly created network
  • Enable IP forwarding
  • Under "Access and Security" set the compute permissions to "Read Write" and remember to add your SSH key



Booting a new GCE instance
Security settings for a new instance

With the permissions set, we can launch the instance!

The only remaining steps now are to start etcd, publish the network configuration and lastly, run the flannel daemon. SSH into demo-instance-1 and execute the following steps:

  • Start etcd:
$ etcd2 -advertise-client-urls http://$INTERNAL_IP:2379 -listen-client-urls http://0.0.0.0:2379
  • Publish configuration in etcd (ensure that the network range does not overlap with the one configured for the GCE network)
$ etcdctl set /coreos.com/network/config '{"Network":"10.40.0.0/16", "Backend": {"Type": "gce"}}'
  • Fetch the 0.5 release using wget from here
  • Run flannel daemon:
$ sudo ./flanneld --etcd-endpoints=http://127.0.0.1:2379

Now make a clone of demo-instance-1 and SSH into it to run these steps:

  • Fetch the 0.5 release as before.
  • Run flannel with the --etcd-endpoints flag set to the internal IP of the instance running etcd

Check that the subnet lease acquired by each of the hosts has been added!
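
One hedged way to verify this is to list the subnet leases flannel records in etcd under the configuration prefix used above:

$ etcdctl ls /coreos.com/network/subnets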



GCE Routes

It’s important to note that GCE currently limits the number of routes per project to 100.

Amazon VPC Backend

In order to run flannel on AWS we need to first create an Amazon VPC. Amazon VPC enables us to launch EC2 instances into a virtual network, which we can configure via its route table.

From the VPC dashboard start out by running the "VPC Wizard":

  • Select "VPC with a Single Public Subnet"
  • Configure the network and the subnet address ranges



Creating a new Amazon VPC

Now that we have set up our VPC and subnet, let’s create an Identity and Access Management (IAM) role to grant the required permissions to our EC2 instances.

From the console, select Services -> Administration & Security -> IAM.

We first need to create a policy that we will later assign to an IAM role. Under "Create Policy" select the "Create Your Own Policy" option. The following permissions are required as shown below in the sample policy document.

  • ec2:CreateRoute
  • ec2:DeleteRoute
  • ec2:ReplaceRoute
  • ec2:DescribeRouteTables
  • ec2:DescribeInstances
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateRoute",
                "ec2:DeleteRoute",
                "ec2:ReplaceRoute"
            ],
            "Resource": [
                "*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeRouteTables",
                "ec2:DescribeInstances"
            ],
            "Resource": "*"
        }
    ]
}

Note that although the first three permissions can be tied to the route table resource of our subnet, the ec2:Describe* permissions cannot be limited to a particular resource. For simplicity, we leave the "Resource" as a wildcard in both statements.

With the policy added, let's attach it to a new IAM role by clicking the "Create New Role" button and setting the following options:

  • Role Name: demo-role
  • Role Type: "Amazon EC2"
  • Attach the policy we created earlier

We are now all set to launch an EC2 instance. In the launch wizard, choose the CoreOS-stable-681.2.0 image and under "Configure Instance Details" perform the following steps:

  • Change the "Network" to the VPC we just created
  • Enable "Auto-assign Public IP"
  • Select IAM demo-role



Configuring AWS EC2 instance details

Under the "Configure Security Group" tab add the rules to allow etcd traffic (tcp/2379), SSH and ICMP.

Go ahead and launch the instance!

Since our instance will be sending and receiving traffic for IPs other than the one assigned by our subnet, we need to disable source/destination checks.



Disable AWS Source/Dest Check
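
If you prefer the AWS CLI to the console for this step, the same change can be made with a single call (the instance ID below is a placeholder):

$ aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 --no-source-dest-check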

All that’s left now is to start etcd, publish the network configuration and run the flannel daemon. First, SSH into demo-instance-1:

  • Start etcd:
$ etcd2 -advertise-client-urls http://$INTERNAL_IP:2379 -listen-client-urls http://0.0.0.0:2379
  • Publish configuration in etcd (ensure that the network range does not overlap with the one configured for the VPC)
$ etcdctl set /coreos.com/network/config '{"Network":"10.20.0.0/16", "Backend": {"Type": "aws-vpc"}}'
  • Fetch the latest release using wget from here
  • Run flannel daemon:
$ sudo ./flanneld --etcd-endpoints=http://127.0.0.1:2379

Next, create and connect to a clone of demo-instance-1. Run flannel with the --etcd-endpoints flag set to the internal IP of the instance running etcd.

Confirm that the subnet route table has entries for the lease acquired by each of the subnets.



AWS Routes

Keep in mind that the Amazon VPC limits the number of entries per route table to 50.

Note that these are just sample configurations, so feel free to try it out and set up what works best for you!

June 29, 2015

In Practice, What is the C Language, Really?

The official definition of the C Language is the standard, but the standard doesn't actually compile any programs. One can argue that the actual implementations are the real definition of the C Language, although further thought along this line usually results in a much greater appreciation of the benefits of having standards. Nevertheless, the implementations usually win any conflicts with the standard, at least in the short term.



Another interesting source of definitions is the opinions of the developers who actually write C. And both the standards bodies and the various implementations do take these opinions into account at least some of the time. Differences of opinion within the standards bodies are sometimes settled by surveying existing usage, and implementations sometimes provide facilities outside the standard based on user requests. For example, relatively few compiler warnings are actually mandated by the standard.



Although one can argue that the standard is the end-all and be-all definition of the C Language, the fact remains that if none of the implementers provide a facility called out by the standard, the implementers win. Similarly, if nobody uses a facility that is called out by the standard, the users win—even if that facility is provided by each and every implementation. Of course, things get more interesting if the users want something not guaranteed by the standard.



Therefore, it is worth knowing what users expect, even if only to adjust their expectations, as John Regehr has done for a number of topics, perhaps most notably signed integer overflow. Some researchers have been taking a more proactive stance, with one example being Peter Sewell's group at the University of Cambridge. This group has put together a survey on padding bytes, pointer arithmetic, and unions. The survey is quite realistic, with “that would be crazy” being a valid answer to a number of the questions.



So, if you think you know a thing or two about C's handling of padding bytes, pointer arithmetic, and unions, take the survey!

June 28, 2015

RAID Pain

One of my clients has a NAS device. Last week they tried to do what should have been a routine RAID operation: they added a new, larger disk as a hot-spare and told the RAID array to replace one of the active disks with it. The aim was to replace the disks one at a time to grow the array. But one of the other disks had an error during the rebuild and things fell apart.

I was called in after the NAS had been rebooted when it was refusing to recognise the RAID. The first thing that occurred to me is that maybe RAID-5 isn’t a good choice for the RAID. While it’s theoretically possible for a RAID rebuild to not fail in such a situation (the data that couldn’t be read from the disk with an error could have been regenerated from the disk that was being replaced) it seems that the RAID implementation in question couldn’t do it. As the NAS is running Linux I presume that at least older versions of Linux have the same problem. Of course if you have a RAID array that has 7 disks running RAID-6 with a hot-spare then you only get the capacity of 4 disks. But RAID-6 with no hot-spare should be at least as reliable as RAID-5 with a hot-spare.

Whenever you recover from disk problems, the first thing you want to do is make a read-only copy of the data; then you can’t make things worse. This is a problem when you are dealing with 7 disks; fortunately they were only 3TB disks and each had only 2TB in use. So I found some space on a ZFS pool and bought a few 6TB disks which I formatted as BTRFS filesystems. For this task I only wanted filesystems that support snapshots, so I could work on snapshots rather than on the original copy.

I expect that at some future time I will be called in when an array of 6+ disks of the largest available size fails. This will be a more difficult problem to solve as I don’t own any system that can handle so many disks.

I copied a few of the disks to a ZFS filesystem on a Dell PowerEdge T110 running kernel 3.2.68. Unfortunately that system seems to have a problem with USB: when copying from 4 disks at once each disk was reading about 10MB/s, and when copying from 3 disks each was reading about 13MB/s. It seems that the system has an aggregate USB bandwidth of 40MB/s – slightly greater than USB 2.0 speed. This made the process take longer than expected.

One of the disks had a read error, which was presumably the cause of the original RAID failure. dd has the option conv=noerror to make it continue after a read error. This initially seemed good, but the resulting file was smaller than the source partition: it seems that conv=noerror doesn’t seek the output file to maintain input and output alignment. If I had a hard drive filled with plain ASCII that MIGHT even be useful, but for a filesystem image it’s worse than useless. The only option was to repeatedly run dd with matching skip and seek options incrementing by 1K until it had passed the section with errors.
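
A hedged sketch of that last step (device name, image file, and block range are illustrative); the key point is that skip and seek advance together, so the output stays aligned with the input across the unreadable region:

start=2000000; end=2001000          # 1K blocks around the bad region (example values)
n=$start
while [ "$n" -le "$end" ]; do
  dd if=/dev/sdX of=disk.img bs=1K count=1 skip=$n seek=$n conv=noerror 2>/dev/null
  n=$((n+1))
done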

for n in /dev/loop[0-6] ; do echo $n ; mdadm --examine -v -v --scan $n | grep Events ; done

Once I had all the images I had to assemble them. The Linux Software RAID didn’t like the array because not all the devices had the same event count. The way Linux Software RAID (and probably most RAID implementations) work is that each member of the array has an event counter that is incremented when disks are added, removed, and when data is written. If there is an error then after a reboot only disks with matching event counts will be used. The above command shows the Events count for all the disks.

Fortunately different event numbers aren’t going to stop us. After assembling the array (which failed to run) I ran “mdadm -R /dev/md1” which kicked some members out. I then added them back manually and forced the array to run. Unfortunately attempts to write to the array failed (presumably due to mismatched event counts).

Now my next problem is that I can make a 10TB degraded RAID-5 array which is read-only but I can’t mount the XFS filesystem because XFS wants to replay the journal. So my next step is to buy another 2*6TB disks to make a RAID-0 array to contain an image of that XFS filesystem.

Finally backups are a really good thing…

June 27, 2015

git.openstack.org adventures

Over the past few months I started to notice occasional issues when cloning repositories (particularly nova) from git.openstack.org.

It would fail with something like

git clone -vvv git://git.openstack.org/openstack/nova .
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed

The problem would occur sporadically during our 3rd party CI runs causing them to fail. Initially these went somewhat ignored as rechecks on the jobs would succeed and the world would be shiny again. However, as they became more prominent the issue needed to be addressed.

When a patch merges in gerrit it is replicated out to 5 different cgit backends (git0[1-5].openstack.org). These are then balanced by two HAProxy frontends which are on a simple DNS round-robin.

                          +-------------------+
                          | git.openstack.org |
                          |    (DNS Lookup)   |
                          +--+-------------+--+
                             |             |
                    +--------+             +--------+
                    |           A records           |
+-------------------v----+                    +-----v------------------+
| git-fe01.openstack.org |                    | git-fe02.openstack.org |
|   (HAProxy frontend)   |                    |   (HAProxy frontend)   |
+-----------+------------+                    +------------+-----------+
            |                                              |
            +-----+                                    +---+
                  |                                    |
            +-----v------------------------------------v-----+
            |    +---------------------+  (source algorithm) |
            |    | git01.openstack.org |                     |
            |    |   +---------------------+                 |
            |    +---| git02.openstack.org |                 |
            |        |   +---------------------+             |
            |        +---| git03.openstack.org |             |
            |            |   +---------------------+         |
            |            +---| git04.openstack.org |         |
            |                |   +---------------------+     |
            |                +---| git05.openstack.org |     |
            |                    |  (HAProxy backend)  |     |
            |                    +---------------------+     |
            +------------------------------------------------+

Reproducing the problem was difficult. At first I was unable to reproduce locally, or even on an isolated turbo-hipster run. Since the problem appeared to be specific to our 3rd party tests (little evidence of it in 1st party runs) I started by adding extra debugging output to git.

We were originally cloning repositories via the git:// protocol. The debugging information was unfortunately limited and provided no useful diagnosis. Switching to https allowed for more curl output (when using GIT_CURL_VERBOSE=1 and GIT_TRACE=1) but this in itself just created noise. It actually took me a few days to remember that the servers are running arbitrary code anyway (a side effect of testing) and therefore cloning over the potentially insecure http protocol didn’t add any further risk.

Over http we got a little more information, but still nothing that was conclusive at this point:

git clone -vvv http://git.openstack.org/openstack/nova .

error: RPC failed; result=18, HTTP code = 200
fatal: The remote end hung up unexpectedly
fatal: protocol error: bad pack header

After a bit it became more apparent that the problems would occur mostly during high (patch) traffic times. That is, when a lot of tests need to be queued. This led me to think that either the network turbo-hipster was on was flaky when doing multiple git clones in parallel, or the git servers were flaky. The lack of similar upstream failures led me to initially think it was the former. In order to reproduce I decided to use Ansible to do multiple clones of repositories and see if that would uncover the problem. If needed I would have then extended this to orchestrate other parts of turbo-hipster, in case the problem was a symptom of something else.

Firstly I needed to clone from a bunch of different servers at once to simulate the network failures more closely (rather than doing multiple clones on the one machine, or from the one IP in containers, for example). To simplify this I decided to learn some Ansible to launch a bunch of nodes on Rackspace (instead of doing it by hand).

Using the pyrax module I put together a crude playbook to launch a bunch of servers. There are likely much neater and better ways of doing this, but it suited my needs. The playbook takes care of placing appropriate ssh keys so I could log into the nodes later.

    ---
    - name: Create VMs
      hosts: localhost
      vars:
        ssh_known_hosts_command: "ssh-keyscan -H -T 10"
        ssh_known_hosts_file: "/root/.ssh/known_hosts"
      tasks:
        - name: Provision a set of instances
          local_action:
            module: rax
            name: "josh-testing-ansible"
            flavor: "4"
            image: "Ubuntu 12.04 LTS (Precise Pangolin) (PVHVM)"
            region: "DFW"
            count: "15"
            group: "raxhosts"
            wait: yes
          register: raxcreate

        - name: Add the instances we created (by public IP) to the group 'raxhosts'
          local_action:
            module: add_host
            hostname: "{{ item.name }}"
            ansible_ssh_host: "{{ item.rax_accessipv4 }}"
            ansible_ssh_pass: "{{ item.rax_adminpass }}"
            groupname: raxhosts
          with_items: raxcreate.success
          when: raxcreate.action == 'create'

        - name: Sleep to give time for the instances to start ssh
          #there is almost certainly a better way of doing this
          pause: seconds=30

        - name: Scan the host key
          shell: "{{ ssh_known_hosts_command}} {{ item.rax_accessipv4 }} >> {{ ssh_known_hosts_file }}"
          with_items: raxcreate.success
          when: raxcreate.action == 'create'

    - name: Set up sshkeys
      hosts: raxhosts
      tasks:
       - name: Push root's pubkey
         authorized_key: user=root key="{{ lookup('file', '/root/.ssh/id_rsa.pub') }}"

From here I can use Ansible to work on those servers using the rax inventory. This allows me to address any nodes within my tenant and then log into them with the seeded sshkey.
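
As a sketch, assuming Ansible’s rax.py dynamic inventory script is configured with the same credentials and picks up the 'raxhosts' group set above, the new nodes can then be addressed like any other group:

# confirm the freshly launched nodes are reachable over ssh
ansible raxhosts -i rax.py -m ping
ansible raxhosts -i rax.py -a "uptime"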

The next step of course was to run tests. Firstly I just wanted to reproduce the issue, so the following playbook crudely sets up an environment in which it simply clones nova multiple times.

    ---
    - name: Prepare servers for git testing
      hosts: josh-testing-ansible*
      serial: "100%"
      tasks:
        - name: Install git
          apt: name=git state=present update_cache=yes
        - name: remove nova if it is already cloned
          shell: 'rm -rf nova'

    - name: Clone nova and monitor tcpdump
      hosts: josh-testing-ansible*
      serial: "100%"
      tasks:
        - name: Clone nova
          shell: "git clone http://git.openstack.org/openstack/nova"

By default Ansible runs with 5 forked processes, meaning that Ansible would work on 5 servers at a time. We want to exercise git heavily (in the same way turbo-hipster does) so we use the --forks param to run the clone on all the servers at once. The plan was to keep launching servers until the error reared its head from the load.
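
For example, something like the following runs the clone playbook on every node at once (the playbook filename here is hypothetical):

ansible-playbook -i rax.py --forks 100 git-clone-test.yml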

To my surprise this happened with very few nodes (fewer than 15, but I left that as my minimum for testing). To confirm, I also ran the tests after launching further nodes to see it fail at 50 and 100 concurrent clones. It turned out that the more I cloned, the higher the failure rate.

Now that I had the problem reproducing, it was time to do some debugging. I modified the playbook to capture tcpdump information during the clone. Initially git was cloning over IPv6 so I turned that off on the nodes to force IPv4 (just in case it was a v6 issue, but the problem did present itself on both networks). I also locked git.openstack.org to one IP rather than randomly hitting both front ends.

    ---
    - name: Prepare servers for git testing
      hosts: josh-testing-ansible*
      serial: "100%"
      tasks:
        - name: Install git
          apt: name=git state=present update_cache=yes
        - name: remove nova if it is already cloned
          shell: 'rm -rf nova'

    - name: Clone nova and monitor tcpdump
      hosts: josh-testing-ansible*
      serial: "100%"
      vars:
        cap_file: tcpdump_{{ ansible_hostname }}_{{ ansible_date_time['epoch'] }}.cap
      tasks:
        - name: Disable ipv6 1/3
          sysctl: name="net.ipv6.conf.all.disable_ipv6" value=1 sysctl_set=yes
        - name: Disable ipv6 2/3
          sysctl: name="net.ipv6.conf.default.disable_ipv6" value=1 sysctl_set=yes
        - name: Disable ipv6 3/3
          sysctl: name="net.ipv6.conf.lo.disable_ipv6" value=1 sysctl_set=yes
        - name: Restart networking
          service: name=networking state=restarted
        - name: Lock git.o.o to one host
          lineinfile: dest=/etc/hosts line='23.253.252.15 git.openstack.org' state=present
        - name: start tcpdump
          command: "/usr/sbin/tcpdump -i eth0 -nnvvS -w /tmp/{{ cap_file }}"
          async: 6000000
          poll: 0 
        - name: Clone nova
          shell: "git clone http://git.openstack.org/openstack/nova"
          #shell: "git clone http://github.com/openstack/nova"
          ignore_errors: yes
        - name: kill tcpdump
          command: "/usr/bin/pkill tcpdump"
        - name: compress capture file
          command: "gzip {{ cap_file }} chdir=/tmp"
        - name: grab captured file
          fetch: src=/tmp/{{ cap_file }}.gz dest=/var/www/ flat=yes

This gave us a bunch of compressed capture files with which I was able to seek the help of my colleagues for debugging (a particular thanks to Angus Lees). The results from an early run can be seen here: http://119.9.51.216/old/run1/

Gus determined that the problem was due to an RST packet coming from the source at roughly 60 seconds. This indicated that we were likely hitting a timeout at the server or a firewall during the git-upload-pack stage of the clone.

The solution turned out to be rather straightforward. The git-upload-pack had simply grown too large and would time out depending on the load on the servers. There was a timeout in Apache as well as in the HAProxy config for both frontend and backend responsiveness. The relevant patches can be found at https://review.openstack.org/#/c/192490/ and https://review.openstack.org/#/c/192649/

While upping the timeout avoids the problem, certain projects are clearly pushing the infrastructure to its limits. As such, a few changes were made by the infrastructure team (in particular James Blair) to improve git.openstack.org’s responsiveness.

Firstly, git.openstack.org is now a higher-performance (30GB) instance, a large step up from the (8GB) instances that were previously used as the frontends. Moving to one frontend additionally meant the HAProxy algorithm could be changed to leastconn to help balance connections better (https://review.openstack.org/#/c/193838/).

                          +--------------------+
                          | git.openstack.org  |
                          | (HAProxy frontend) |
                          +----------+---------+
                                     |
                                     |
            +------------------------v------------------------+
            |  +---------------------+  (leastconn algorithm) |
            |  | git01.openstack.org |                        |
            |  |   +---------------------+                    |
            |  +---| git02.openstack.org |                    |
            |      |   +---------------------+                |
            |      +---| git03.openstack.org |                |
            |          |   +---------------------+            |
            |          +---| git04.openstack.org |            |
            |              |   +---------------------+        |
            |              +---| git05.openstack.org |        |
            |                  |  (HAProxy backend)  |        |
            |                  +---------------------+        |
            +-------------------------------------------------+

All that was left was to see if things had improved. I reran the test across 15, 30 and then 45 servers. These were all able to clone nova reliably where they had previously been failing. I then upped it to 100 servers, where the cloning began to fail again.

Post-fix logs for those interested:

http://119.9.51.216/run15/

http://119.9.51.216/run30/

http://119.9.51.216/run45/

http://119.9.51.216/run100/

http://119.9.51.216/run15per100/

At this point, however, I’m basically performing a Distributed Denial of Service attack against git. As such, while the servers aren’t immune to a DDoS, the problem appears to be fixed.

June 24, 2015

Smart Phones Should Measure Charge Speed

My first mobile phone lasted for days between charges. I never really found out how long its battery would last because there was no way that I could use it to deplete the charge in any time that I could spend awake. Even if I had managed to run the battery out, the phone was designed to accept 4*AA batteries (its rechargeable battery pack was exactly that size) so I could buy spare batteries at any store.

Modern phones are quite different in physical design (phones that weigh less than 4*AA batteries aren’t uncommon), functionality (fast CPUs and big screens suck power), and use (games really drain your phone battery). This requires much more effective chargers; when some phones are used intensively (e.g. playing an action game with Wifi enabled) they can’t be charged because they use more power than the plug-pack supplies. I’ve previously blogged some calculations about resistance and thickness of wires for phone chargers [1]; it’s obvious that there are some technical limitations to phone charging based on the decision to use a long cable at ~5V.

My calculations about phone charge rate were based on the theoretical resistance of wires based on their estimated cross-sectional area. One problem with such analysis is that it’s difficult to determine how thick the insulation is without destroying the wire. Another problem is that after repeated use of a charging cable some conductors break due to excessive bending, which can significantly increase the resistance and therefore increase the charging time. Recently a charging cable that used to be really good suddenly became almost useless. My Galaxy Note 2 would claim that it was being charged even though the reported level of charge in the battery was not increasing; it seems that the cable only supplied enough power to keep the phone running, not enough to actually charge the battery.

I recently bought a USB current measurement device which is really useful. I have used it to diagnose power supplies and USB cables that didn’t work correctly. But one significant way in which it fails is in the case of problems with the USB connector. Sometimes a cable performs differently when connected via the USB current measurement device.

The CurrentWidget program [2] on my Galaxy Note 2 told me that all of the dedicated USB chargers (the 12V one in my car and all the mains powered ones) supply 1698mA (including the ones rated at 1A) while a PC USB port supplies ~400mA. I don’t think that the Note 2 measurement is particularly reliable. On my Galaxy Note 3 it always says 0mA; I guess that feature isn’t implemented. An old Galaxy S3 reports 999mA of charging even when the USB current measurement device says ~500mA. It seems to me that the method CurrentWidget uses to get the current isn’t accurate, if it even works at all.

Android 5 on the Nexus 4/5 phones will tell you the amount of time until the phone is charged in some situations (on the Nexus 4 and Nexus 5 that I used for testing it didn’t always display it and I don’t know why). This is useful, but it’s still not good enough.

I think that what we need is for the phone to measure the current that’s being supplied and report it to the user. Then, when a phone charges slowly because apps are using some power, that won’t be mistaken for a phone charging slowly due to a defective cable or connector.

June 23, 2015

One Android Phone Per Child

I was asked for advice on whether children should have access to smart phones. It’s an issue that many people are discussing and seems worthy of a blog post.

Claimed Problems with Smart Phones

The first thing that I think people should read is this XKCD post with quotes about the demise of letter writing from 99+ years ago [1]. Given the lack of evidence cited by people who oppose phone use I think we should consider to what extent the current concerns about smart phone use are just reactions to changes in society. I’ve done some web searching for reasons that people give for opposing smart phone use by kids and addressed the issues below.

Some people claim that children shouldn’t get a phone when they are so young that it will just be a toy. That’s interesting given the dramatic increase in the amount of money spent on toys for children in recent times. It’s particularly interesting when parents buy game consoles for their children but refuse mobile phone “toys” (I know someone who did this). I think this is more of a social issue regarding what is a suitable toy than any real objection to phones used as toys. Obviously the educational potential of a mobile phone is much greater than that of a game console.

It’s often claimed that kids should spend their time reading books instead of using phones. When visiting libraries I’ve observed kids using phones to store lists of books that they want to read, this seems to discredit that theory. Also some libraries have Android and iOS apps for searching their catalogs. There are a variety of apps for reading eBooks, some of which have access to many free books but I don’t expect many people to read novels on a phone.

Cyber-bullying is the subject of a lot of anxiety in the media. At least with cyber-bullying there’s an electronic trail, anyone who suspects that their child is being cyber-bullied can check that while old-fashioned bullying is more difficult to track down. Also while cyber-bullying can happen faster on smart phones the victim can also be harassed on a PC. I don’t think that waiting to use a PC and learn what nasty thing people are saying about you is going to be much better than getting an instant notification on a smart phone. It seems to me that the main disadvantage of smart phones in regard to cyber-bullying is that it’s easier for a child to participate in bullying if they have such a device. As most parents don’t seem concerned that their child might be a bully (unfortunately many parents think it’s a good thing) this doesn’t seem like a logical objection.

Fear of missing out (FOMO) is claimed to be a problem, apparently if a child has a phone then they will want to take it to bed with them and that would be a bad thing. But parents could have a policy about when phones may be used and insist that a phone not be taken into the bedroom. If it’s impossible for a child to own a phone without taking it to bed then the parents are probably dealing with other problems. I’m not convinced that a phone in bed is necessarily a bad thing anyway, a phone can be used as an alarm clock and instant-message notifications can be turned off at night. When I was young I used to wait until my parents were asleep before getting out of bed to use my PC, so if smart-phones were available when I was young it wouldn’t have changed my night-time computer use.

Some people complain that kids might use phones to play games too much or talk to their friends too much. What do people expect kids to do? In recent times the fear of abduction has led to children playing outside a lot less; it used to be that 6yos would play with other kids in their street and 9yos would be allowed to walk to the local park. Now people aren’t allowing 14yo kids to walk to the nearest park alone. Playing games and socialising with other kids has to be done over the Internet because kids aren’t often allowed out of the house. Play and socialising are important learning experiences that have to happen online if they can’t happen offline.

Apps can be expensive. But it’s optional to sign up for a credit card with the Google Play store and the range of free apps is really good. Also the default configuration of the app store is to require a password entry before every purchase. Finally it is possible to give kids pre-paid credit cards and let them pay for their own stuff, such pre-paid cards are sold at Australian post offices and I’m sure that most first-world countries have similar facilities.

Electronic communication is claimed to be somehow different and lesser than old-fashioned communication. I presume that people made the same claims about the telephone when it first became popular. The only real difference between email and posted letters is that email tends to be shorter because the reply time is shorter; you can reply to any questions the same day rather than waiting a week for a response, so it makes sense to expect questions rather than covering all possibilities in the first email. If it’s a good thing to have longer forms of communication then a smart phone with a big screen would be a better option than a “feature phone”, and if face to face communication is preferred then a smart phone with video-call access would be the way to go (better even than old fashioned telephony).

Real Problems with Smart Phones

The majority opinion among everyone who matters (parents, teachers, and police) seems to be that crime at school isn’t important. Many crimes that would result in jail sentences if committed by adults receive either no punishment or something trivial (such as lunchtime detention) if committed by school kids. Introducing items that are both intrinsically valuable and which have personal value due to the data storage into a typical school environment is probably going to increase the amount of crime. The best options to deal with this problem are to prevent kids from taking phones to school or to home-school kids. Fixing the crime problem at typical schools isn’t a viable option.

Bills can potentially be unexpectedly large due to kids’ inability to restrain their usage and telcos deliberately making their plans tricky to profit from excess usage fees. The solution is to only use pre-paid plans, fortunately many companies offer good deals for pre-paid use. In Australia Aldi sells pre-paid credit in $15 increments that lasts a year [2]. So it’s possible to pay $15 per year for a child’s phone use, have them use Wifi for data access and pay from their own money if they make excessive calls. For older kids who need data access when they aren’t at home or near their parents there are other pre-paid phone companies that offer good deals, I’ve previously compared prices of telcos in Australia, some of those telcos should do [3].

It’s expensive to buy phones. The solution to this is to not buy new phones for kids, give them an old phone that was used by an older relative or buy an old phone on ebay. Also let kids petition wealthy relatives for a phone as a birthday present. If grandparents want to buy the latest smart-phone for a 7yo then there’s no reason to stop them IMHO (this isn’t a hypothetical situation).

Kids can be irresponsible and lose or break their phone. But the way kids learn to act responsibly is by practice. If they break a good phone and get a lesser phone as a replacement, or have to keep using a broken phone, then it’s a learning experience. A friend’s son head-butted his phone and cracked the screen – he used it for 6 months after that, and I think he learned from that experience. I think that kids should learn to be responsible with a phone several years before they are allowed to get a “learner’s permit” to drive a car on public roads, which means that they should have their own phone when they are 12.

I’ve seen an article about a school finding that tablets didn’t work as well as laptops which was touted as news. Laptops or desktop PCs obviously work best for typing. Tablets are for situations where a laptop isn’t convenient and when the usage involves mostly reading/watching, I’ve seen school kids using tablets on excursions which seems like a good use of them. Phones are even less suited to writing than tablets. This isn’t a problem for phone use, you just need to use the right device for each task.

Phones vs Tablets

Some people think that a tablet is somehow different from a phone. I’ve just read an article by a parent who proudly described their policy of buying “feature phones” for their children and tablets for them to do homework etc. Really a phone is just a smaller tablet, once you have decided to buy a tablet the choice to buy a smart phone is just about whether you want a smaller version of what you have already got.

The iPad doesn’t appear to be able to make phone calls (but it supports many different VOIP and video-conferencing apps) so that could technically be described as a difference. AFAIK all Android tablets that support 3G networking also support making and receiving phone calls if you have a SIM installed. It is awkward to use a tablet to make phone calls but most usage of a modern phone is as an ultra portable computer not as a telephone.

The phone vs tablet issue doesn’t seem to be about the capabilities of the device. It’s about how portable the device should be and the image of the device. I think that if a tablet is good then a more portable computing device can only be better (at least when you need greater portability).

Recently I’ve been carrying a 10″ tablet around a lot for work; sometimes a tablet will do for emergency work when a phone is too small and a laptop is too heavy. Even though tablets are thin and light they’re still inconvenient to carry, and the issue of size and weight is a greater problem for kids. 7″ tablets are a lot smaller and lighter, but that’s getting close to a 5″ phone.

Benefits of Smart Phones

Using a smart phone is good for teaching children dexterity. It can also be used for teaching art in situations where more traditional art forms such as finger painting aren’t possible (I have met a professional artist who has used a Samsung Galaxy Note phone for creating art work).

There is a huge range of educational apps for smart phones.

The Wikireader (that I reviewed 4 years ago) [4] has obvious educational benefits. But a phone with Internet access (either 3G or Wifi) gives Wikipedia access including all pictures and is a better fit for most pockets.

There are lots of educational web sites and random web sites that can be used for education (Googling the answer to random questions).

When it comes to preparing kids for “the real world” or “the work environment” people often claim that kids need to use Microsoft software because most companies do (regardless of the fact that most companies will be using radically different versions of MS software by the time current school kids graduate from university). In my typical work environment I’m expected to be able to find the answer to all sorts of random work-related questions at any time and I think that many careers have similar expectations. Being able to quickly look things up on a phone is a real work skill, and a skill that’s going to last a lot longer than knowing today’s version of MS-Office.

There are a variety of apps for tracking phones. There are non-creepy ways of using such apps for monitoring kids. Also with two-way monitoring kids will know when their parents are about to collect them from an event and can stay inside until their parents are in the area. This combined with the phone/SMS functionality that is available on feature-phones provides some benefits for child safety.

iOS vs Android

Rumour has it that iOS is better than Android for kids diagnosed with Low Functioning Autism. There are apparently apps that help non-verbal kids communicate with icons and for arranging schedules for kids who have difficulty with changes to plans. I don’t know anyone who has a LFA child so I haven’t had any reason to investigate such things. Anyone can visit an Apple store and a Samsung Experience store as they have phones and tablets you can use to test out the apps (at least the ones with free versions). As an aside the money the Australian government provides to assist Autistic children can be used to purchase a phone or tablet if a registered therapist signs a document declaring that it has a therapeutic benefit.

I think that Android devices are generally better for educational purposes than iOS devices because Android is a less restrictive platform. On an Android device you can install apps downloaded from a web site or from a 3rd party app download service. Even if you stick to the Google Play store there’s a wider range of apps to choose from because Google is apparently less restrictive.

Android devices usually allow installation of a replacement OS. The Nexus devices are always unlocked and have a wide range of alternate OS images, and the other commonly used devices can usually have an alternate OS installed. This allows kids who have the interest and technical skill to extensively customise their device and learn all about its operation. iOS devices are designed to be sealed against the user. Admittedly there probably aren’t many kids with the skill and desire to replace the OS on their phone, but I think it’s good to have the option.

Android phones have a range of sizes and features while Apple only makes a few devices at any time and there’s usually only a couple of different phones on sale. iPhones are also a lot smaller than most Android phones, according to my previous estimates of hand size the iPhone 5 would be a good tablet for a 3yo or good for side-grasp phone use for a 10yo [5]. The main benefits of a phone are for things other than making phone calls so generally the biggest phone that will fit in a pocket is the best choice. The tiny iPhones don’t seem very suitable.

Also buying one of each is a viable option.

Conclusion

I think that mobile phone ownership is good for almost all kids even from a very young age (there are many reports of kids learning to use phones and tablets before they learn to read). There are no real down-sides that I can find.

I think that Android devices are generally a better option than iOS devices. But in the case of special needs kids there may be advantages to iOS.

June 22, 2015

App Container and the Open Container Project

Today we’re pleased to announce that CoreOS, Docker, and a large group of industry leaders are working together on a standard container format through the formation of the Open Container Project (OCP). OCP is housed under the Linux Foundation, and is chartered to establish common standards for software containers. This announcement means we are starting to see the concepts behind the App Container spec and Docker converge. This is a win for both users of containers and our industry at large.

In December 2014 we announced rkt, a new container runtime intended to address issues around security and composability in the container ecosystem. At the same time, we started App Container (appc), a specification defining a container image format, runtime environment and discovery protocol, to work towards the goal of a standard, portable shipping container for applications. We believe strongly that open standards are key to the success of the container ecosystem.

We created App Container to kickstart a movement toward a shared industry standard. With the announcement of the Open Container Project, Docker is showing the world that they are similarly committed to open standards. Today Docker is the de facto image format for containers, and therefore is a good place to start from in working towards a standard. We look forward to working with Docker, Google, Red Hat and many others in this effort to bring together the best ideas across the industry.

As we participate in OCP, our primary goals are as follows:

  • Users should be able to package their application once and have it work with any container runtime (like Docker, rkt, Kurma, or Jetpack)
  • The standard should fulfill the requirements of the most rigorous security and production environments
  • The standard should be vendor neutral and developed in the open

App Container

We believe most of the core concepts from App Container will form an important part of OCP. Our experience developing App Container will play a critical role as we begin collaboration on the OCP specification. We anticipate that much of App Container will be directly integrated into the OCP specification, with tweaks being made to provide greater compatibility with the existing Docker ecosystem. The end goal is to converge on a single unified specification of a standard container format, and the success of OCP will mean the major goals of App Container are satisfied. Existing appc maintainers Brandon Philips and Vincent Batts will be two of the initial maintainers of OCP and will work to harmonize the needs of both communities in the spirit of a combined standard. At the same time we will work hard to ensure that users of appc will have a smooth migration to the new standard.

Continuing work on rkt

CoreOS remains committed to the rkt project and will continue to invest in its development. Today rkt is a leading implementation of appc, and we plan on it becoming a leading implementation of OCP. Open standards only work if there are multiple implementations of the specification, and we will develop rkt into a leading container runtime around the new shared container format. Our goals for rkt are unchanged: a focus on security and composability for the most demanding production environments.

We are excited the industry is converging on a format that combines the best ideas from appc, rkt and Docker to achieve what we all need to succeed: a well-defined shared standard for containers.

For more information and to see the draft charter and founding formation of the OCP, go to www.opencontainers.org.

June 20, 2015

BTRFS Status June 2015

The version of btrfs-tools in Debian/Jessie is incapable of creating a filesystem that can be mounted by the kernel in Debian/Wheezy. If you want to use a BTRFS filesystem on Jessie and Wheezy (which isn’t uncommon with removable devices) the only options are to use the Wheezy version of mkfs.btrfs or to use a Jessie kernel on Wheezy. I recently got bitten by this issue when I created a BTRFS filesystem on a removable device with a lot of important data (which is why I wanted metadata duplication and checksums) and had to read it on a server running Wheezy. Fortunately KVM in Wheezy works really well so I created a virtual machine to read the disk. Setting up a new KVM isn’t that difficult, but it’s not something I want to do while a client is anxiously waiting for their data.

BTRFS has been working well for me apart from the Jessie/Wheezy compatibility issue (which was an annoyance but didn’t stop me doing what I wanted). I haven’t written a BTRFS status report for a while because everything has been OK and there has been nothing exciting to report.

I regularly get errors from the cron jobs that run a balance, supposedly due to running out of free space. I have the cron jobs due to past problems with BTRFS running out of metadata space. In spite of the jobs often failing the systems keep working, so I’m not too worried at the moment. I think this is a bug, but there are many more important bugs.

Linux kernel version 3.19 was the first version to have working support for RAID-5 recovery. This means version 3.19 was the first version to have usable RAID-5 (I think there is no point even having RAID-5 without recovery). It wouldn’t be prudent to trust your important data to a new feature in a filesystem, so at this stage if I needed a very large scratch space then BTRFS RAID-5 might be a viable option, but for anything else I wouldn’t use it. BTRFS has still had little performance optimisation; while this doesn’t matter much for SSDs and for single-disk filesystems, for a RAID-5 of hard drives it would probably hurt a lot. Maybe BTRFS RAID-5 would be good for a scratch array of SSDs. The reports of problems with RAID-5 don’t surprise me at all.

I have a BTRFS RAID-1 filesystem on 2*4TB disks which is giving poor performance on metadata; simple operations like “ls -l” on a directory with ~200 subdirectories take many seconds to run. I suspect that part of the problem is due to the filesystem being written by cron jobs with files accumulating over more than a year. The “btrfs filesystem” command (see btrfs-filesystem(8)) allows defragmenting files and directory trees, but unfortunately it doesn’t support recursively defragmenting directories without also defragmenting the files in them. I really wish there was a way to get BTRFS to put all metadata on SSD and all data on hard drives. Sander suggested the following command to defragment directories on the BTRFS mailing list:

find / -xdev -type d -execdir btrfs filesystem defrag -c {} +

Below is the output of “zfs list -t snapshot” on a server I run, it’s often handy to know how much space is used by snapshots, but unfortunately BTRFS has no support for this.

NAME                       USED  AVAIL  REFER  MOUNTPOINT
hetz0/be0-mail@2015-03-10  2.88G     -   387G  -
hetz0/be0-mail@2015-03-11  1.12G     -   388G  -
hetz0/be0-mail@2015-03-12  1.11G     -   388G  -
hetz0/be0-mail@2015-03-13  1.19G     -   388G  -

Hugo pointed out on the BTRFS mailing list that the following command will give the amount of space used for snapshots. $SNAPSHOT is the name of a snapshot and $LASTGEN is the generation number of the previous snapshot you want to compare with.

btrfs subvolume find-new $SNAPSHOT $LASTGEN | awk '{total = total + $7}END{print total}'

One upside of the BTRFS implementation in this regard is that the above btrfs command without being piped through awk shows you the names of files that are being written and the amounts of data written to them. Through casually examining this output I discovered that the most written files in my home directory were under the “.cache” directory (which wasn’t exactly a surprise).

Now I am configuring workstations with a separate subvolume for ~/.cache for the main user. This means that ~/.cache changes don’t get stored in the hourly snapshots and less disk space is used for snapshots.
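
A minimal sketch of that setup, assuming the home directory is already on BTRFS and the user is logged out (the username here is hypothetical):

# replace ~/.cache with its own subvolume so it is excluded from snapshots of the parent
mv /home/user/.cache /home/user/.cache.old
btrfs subvolume create /home/user/.cache
cp -a /home/user/.cache.old/. /home/user/.cache/
rm -rf /home/user/.cache.old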

Conclusion

My observation is that things are going quite well with BTRFS. It’s more than 6 months since I had a noteworthy problem, which is pretty good for a filesystem that’s still under active development. But there are still many systems I run which could benefit from the data integrity features of ZFS and BTRFS, but which don’t have the resources to run ZFS and need more reliability than I can expect from an unattended BTRFS system.

At this time the only servers I run with BTRFS are located within a reasonable drive from my home (not the servers in Germany and the US) and are easily accessible (not the embedded systems). ZFS is working well for some of the servers in Germany. Eventually I’ll probably run ZFS on all the hosted servers in Germany and the US, I expect that will happen before I’m comfortable running BTRFS on such systems. For the embedded systems I will just take the risk of data loss/corruption for the next few years.

June 18, 2015

Philippines is ready, set, go with CAP on a Map

The Philippines Atmospheric, Geophysical, and Astronomical Services Administration (PAGASA), Philippines Institute of Volcanology and Seismology (PHILVOLCS), and the National Disaster Risk Reduction and Management Council (NDRRMC) are three agencies of foremost importance. Combined they are responsible for the monitoring, detecting, [Read the Rest...]

June 15, 2015

SahanaCamp Turkey

Turkey recently hosted the latest SahanaCamp, that magical blend of humanitarians and techie folks coming together to work on solving information management problems. Elvan Cantekin, General Manager at the MAG Foundation, has been working on this for a couple of years and [Read the Rest...]

June 11, 2015

Technology Preview: CoreOS Linux and xhyve

Yesterday a new lightweight hypervisor for OS X called xhyve was released; if you are familiar with qemu-kvm on Linux, it provides a roughly similar experience. In this post we are going to show how to run CoreOS Linux under xhyve. While this is all very early and potentially buggy tech, we want to give you some tips on how to try CoreOS Linux with xhyve and run Docker or rkt on top.

xhyve is a port of bhyve, the FreeBSD hypervisor, to OS X. It is designed to run off-the-shelf Linux distros. We’ve made it possible to run CoreOS Linux on it so you can get the benefits of a lightweight Linux OS running under a lightweight hypervisor on Macs. It is now possible to launch a full local development or testing environment with just a few shell commands.

A few ideas we are thinking about:

  • Single command to launch CoreOS Linux images.
  • Easily launch a Kubernetes cluster right on your laptop.
  • An OS X native version of rkt that can run Linux applications inside xhyve.

Keep in mind that xhyve is a very new project, so much work still needs to be done. You must be running OS X Yosemite for this to work. Check out this page for step-by-step instructions on how to try it out.

A Quick Example

Currently, you need to build xhyve yourself:

$ git clone https://github.com/mist64/xhyve
$ cd xhyve
$ make
$ sudo cp build/xhyve /usr/local/bin/

Now we can install the initial CoreOS tooling for xhyve:

$ git clone https://github.com/coreos/coreos-xhyve.git
$ cd coreos-xhyve
$ ./coreos-xhyve-fetch
$ sudo ./coreos-xhyve-run

Type ip a in the console of the virtual machine to get its IP address.

Let’s run a simple Docker container:

$ docker -H<ip-of-virtual-machine>:2375 run -it --rm busybox

Please open issues with ideas for enhancements or use cases. We welcome contributions to the code, so please open a pull request if you have code to share.

June 09, 2015

etcd2 in the CoreOS Linux Stable channel

This week marks a particularly special milestone for etcd2. Beginning today, etcd2 will be available in the CoreOS Linux Stable channel. This means that everyone will now be able to take advantage of etcd2, which we launched earlier this year.

etcd is an open source, distributed, consistent key-value store. It is a core component of CoreOS software that helps to facilitate safe automatic updates, coordinate work between hosts, and manage overlay networking for containers. To recap, new features and improvements in etcd2 include:

  • Reconfiguration protocol improvements, enabling more safeguards against accidental misconfiguration
  • A new raft implementation, providing improved cluster stability and predictability in massive server environments
  • On-disk safety improvements, in which CRC checksums and append-only log behavior allow etcd to detect external data corruption and avoid internal file misoperations

More details can be found in this post which first introduced etcd2. Give it a shot and let us know what you think!
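
As a quick smoke test on a CoreOS machine running etcd2 (the key name here is arbitrary):

# write a key with the v2 command line client and read it back
etcdctl set /example/message "hello etcd2"
etcdctl get /example/message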

A special thank-you to all of the contributors who made this possible. Join us in the continued development of etcd through the etcd-dev discussion mailing list, GitHub issues, or contributing directly to the project.

June 03, 2015

Building and deploying minimal containers on Kubernetes with Quay.io and wercker

Today's guest post has been written by Micha "mies" Hernandez van Leuffen, the founder and CEO of wercker, a platform and tool for building, testing and deploying in the modern world of microservices, containers and clouds.

The landscape of production has changed: monolithic is out, loosely coupled microservices are in. Modern applications consist of multiple moving parts, but most of the existing developer tooling we use was designed and built in the world of monolithic applications.

Working with microservices poses new challenges: your applications now consist of multiple processes, multiple configurations, multiple environments and more than one codebase.

Containers offer a way to isolate and package your applications along with their dependencies. Docker and rkt are popular container runtimes and allow for a simplified deployment model for your microservices. Wercker is a platform and command line tool built on Docker that enables developers to develop, test, build and deliver their applications in a containerized world. Each build artifact from a pipeline is a container, which gives you an immutable testable object linked to a commit.

In this tutorial, we will build and launch a containerized application on top of Kubernetes. Kubernetes is a cluster orchestration framework started by Google, specifically aimed at running container workloads. We will use quay.io from CoreOS for our container registry and wercker (of course!) to build the container and trigger deploys to Kubernetes.

The workflow we will create is depicted below:

Workflow from build to deployment.

Requirements

This tutorial assumes you have the following set up:

  • A wercker account. You can sign up for free here.
  • An account on quay.io.
  • A Kubernetes cluster. See the getting started section to set one up.
  • A fork of the application we will be building which you can find on GitHub.
  • You've added the above application to wercker and are using the Docker stack to build it.

Getting started

The application we will be developing is a simple API with one endpoint, which returns an array of cities in JSON. You can check out the source code for the API on GitHub. The web process listens on port 5000; we'll need this information later on.

Now, let's create our Kubernetes service configuration and include it into our repository.

{
   "kind": "Service",
   "apiVersion": "v1beta3",
   "metadata": {
      "name": "cities",
      "labels": {
         "name": "cities"
      }
   },
   "spec":{
      "createExternalLoadBalancer": true,
      "ports": [
         {
           "port": 5000,
           "targetPort": "http-server",
           "protocol": "TCP"
         }
      ],
      "selector":{
         "name":"cities"
      }
   }
}

We define the port that our application is listening on and use the public IP addresses that we got upon creating our Kubernetes cluster. We're using Google Container Engine, which allows for createExternalLoadBalancer. If you're using a platform which doesn't support createExternalLoadBalancer then you need to add the public IP addresses of the nodes to the publicIPs property.
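
For reference, the service definition above can also be created by hand with kubectl, assuming it is pointed at your cluster (the filename matches the one copied in the build pipeline below):

kubectl create -f cities-service.json
kubectl get services cities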

Next, we're going to define our pipeline, which describes how wercker will build and deploy your application.

wercker.yml - build pipeline

On wercker, you structure your pipelines in a file called wercker.yml. It’s where you define the actions (steps) and environment for your tasks (tests, builds, deploys). Pipelines can either pass or fail, depending on the results of the steps within. Steps come in three varieties: steps from the wercker step registry, inline script steps, and internal steps that run with extra privileges.

Pipelines also come with environment variables, some of which are set by default, others you can define yourself. Each pipeline can have its own base container (the main language environment of your application) and any number of services (databases, queues).

Now, let's have a look at our build pipeline for the application. You can check out the entire wercker.yml on GitHub.

build:
    box: google/golang
    steps:

    # Test the project
    - script:
        name: go test
        code: go test ./...

    # Statically build the project
    - script:
        name: go build
        code: CGO_ENABLED=0 go build -a -ldflags '-s' -installsuffix cgo -o app .

    # Create cities-controller.json only for initialization
    - script:
        name: create cities-controller.json
        code: ./create_cities-controller.json.sh

    # Copy binary to a location that gets passed along to the deploy pipeline
    - script:
        name: copy binary
        code: cp app cities-service.json cities-controller.json "$WERCKER_OUTPUT_DIR"

The box is the container and environment in which the build runs. Here we see that we're using the google/golang image as a base container for our build as it has the golang language and build tools installed in it. We also have a small unit test inside of our code base which we run first. Next we compile our code and build the app executable.

As we want to build a minimal container, we will statically compile our application. We disable the ability to create Go packages that call C code with the CGO_ENABLED=0 flag, rebuild all dependencies with the -a flag, and finally we remove any debug information with the -ldflags flag, resulting in an even smaller binary.

Next, we create our Kubernetes replication controller programmatically based on the git commit using a shell script. You can check out the shell script on GitHub.

The last step copies the executable and Kubernetes service definitions into the $WERCKER_OUTPUT_DIR folder, and the contents of this folder gets passed along to the /pipeline/source/ folder within the deploy pipeline.

wercker.yml - push to quay.io

We're now ready to set up our deploy pipelines and targets. We will create two deploy targets. The first will push our container to Quay.io, the second will perform the rolling update to Kubernetes. Deploy targets are created in the wercker web interface and reference the corresponding section in the wercker.yml.

Deploy targets in wercker.

In order to add any information such as usernames, passwords, or tokens that our deploy target might need, we define these as environment variables for each target. These environment variables will be injected when a pipeline is executed.

Quay.io is a public and private registry for Docker image repositories. We will be using Quay.io to host the container image that is built from wercker.

deploy:
    box: google/golang
    steps:
     # Use the scratch step to build a container from scratch based on the files present
    - internal/docker-scratch-push:
        username: $QUAY_USERNAME
        password: $QUAY_PASSWORD
        cmd: ./app
        tag: $WERCKER_GIT_COMMIT
        ports: "5000"
        repository: quay.io/wercker/wercker-kubernetes-quay
        registry: https://quay.io

The deploy section of our wercker.yml above consists of a single step. We use the internal/docker-scratch-push step to create a minimal container based on the files present in the $WERCKER_ROOT environment variable (which contains our binary and source code) from the build, and push it to Quay.io. The $QUAY_USERNAME and $QUAY_PASSWORD parameters are environment variables that we have entered on the wercker web interface. For the tag, we use the git commit hash, so each container is versioned. This hash is available as an environment variable from within the wercker pipeline.

The cmd parameter is the command that we want to run on start-up of the container, which in our case is our application that we've built. We also need to define the port on which our application will be available, which should be the same port as in our Kubernetes service definition. Finally, we fill in the details of our Quay.io repository and the URL of the registry.

If you take a look at your Quay.io dashboard you will see that the final container that was pushed is just 1.2MB!

wercker.yml - Kubernetes rolling update

For this tutorial, we assume you've already created a service with an accompanying replication controller. If not, you can do this via wercker as well; see the initialize section in the wercker.yml.

Let's proceed to do the rolling update on Kubernetes, replacing our pods one-by-one.

rolling-update:
    - kubectl:
        server: $KUBERNETES_MASTER
        username: $KUBERNETES_USERNAME
        password: $KUBERNETES_PASSWORD
        insecure-skip-tls-verify: true
        command: rolling-update cities
        image: quay.io/wercker/wercker-kubernetes-quay:$WERCKER_GIT_COMMIT

The environment variables are again defined in the wercker web interface. The $KUBERNETES_MASTER environment variable contains the IP address of our instance.

Kubernetes credentials defined in the pipeline.

We execute the rolling update command and tell Kubernetes to use our Docker container from Quay.io with the image parameter. The tag we use for the container is the git commit hash.
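
The step is roughly equivalent to running the following by hand against the cluster (the image tag shown is just a placeholder for the actual commit hash):

kubectl rolling-update cities --image=quay.io/wercker/wercker-kubernetes-quay:<git-commit-hash>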

Conclusion

In this tutorial, we have showcased how to build minimal containers and use wercker as our assembly line. Our final container was just 1.2MB, making for low-cost deploys!

Though the Go programming language compiles to single binaries, making our life easier, our learnings can be applied to other programming languages as well.

Using wercker's automated build process we've not only created a minimal container, but also linked our artifact versioning to git commits in Quay.io.

Pairing our versioned containers with Kubernetes' orchestration capabilities results in a radically simplified deployment process, especially with the power of rolling updates!

In short, the combination of Kubernetes, Quay.io and wercker is a powerful and disruptive way of building and deploying modern-day applications.

In this article we've just scratched the surface of developing container-based microservices. To learn more about Kubernetes check out the getting started guides. For more information on Quay.io, see the documentation site. You can sign up for wercker here and more information and documentation is available at our dev center. The source code for our final application including its wercker.yml is available on GitHub.

June 02, 2015

Oh, the places we’ll be in June

We’re across the US and in the Netherlands this month. Check out where we’re speaking!


Couchbase Connect: Thursday, June 4 at 1:45 p.m. PDT – Santa Clara, CA

Brian Harrington (@brianredbeard), also known as Redbeard, principal architect at CoreOS, will be at Couchbase Connect and will join Traun Leyden from Couchbase to discuss Tectonic, provide a deep dive on the technology behind Kubernetes, and walk through the steps required to get Couchbase running on Kubernetes.


HP Discover: Thursday, June 4 at 3:30 p.m. PDT – Las Vegas, NV

At HP Discover in Las Vegas this week? Brandon Philips (@brandonphilips), CTO of CoreOS, Janne Heino of Nokia and Chris Grzegorczyk (@grze), chief architect at HP, will speak on Thursday, June 4 at 3:30 p.m. at Discover Theater 1 about Hybrid cloud and containers for modern application architectures. Join Nokia to walk through its global private cloud deployment of Helion Eucalyptus that also uses CoreOS’s container runtime, rkt.


ContainerDays Boston: Friday, June 5 at 3:40 p.m. EDT – Boston, MA

Barak Michener (@barakmich), software engineer and CoreOS developer advocate, will be at ContainerDays Boston and will discuss CoreOS: Building the Layers of the Cluster. Barak will also join Dave Nielsen (@davenielsen) from CloudCamp on Saturday at 12:55 p.m. EDT for a workshop that will help you get started with deploying your first container to CoreOS, Cloud Foundry, Azure and AWS.


QCon: Monday, June 8 at 9 a.m. EDT – New York, NY

Join Kelsey Hightower (@kelseyhightower), product manager, developer and chief advocate at CoreOS, for an in-depth, day-long tutorial at QCon New York on Kubernetes and CoreOS.


Cloud Expo East: Tuesday, June 9 at 1:55 p.m. EDT – New York, NY

Meet Jake Moshenko, product technical lead at CoreOS, who will speak at Cloud Expo East about Containers: New Ways to Deploy and Manage Applications at Scale.


Nutanix .NEXT: Tuesday, June 9-Wednesday, June 10 – Miami, FL

Kelsey Hightower (@kelseyhightower) and Alex Polvi (@polvi), CEO of CoreOS, will present at Nutanix .NEXT, the company’s first user conference. See Kelsey speak on Tuesday, June 9 at 3:30 p.m. EDT on Containers—What They Mean for the Future of Application Deployment. Alex will join Alan Cohen, chief commercial officer at Illumio, Dheeraj Pandey (@trailsfootmarks), CEO of Nutanix, and JR Rivers (@JRCumulus), CEO of Cumulus Networks, in the closing keynote panel: The New Enterprise IT Stack. Don’t miss it on Wednesday, June 10 at 12:15 p.m. EDT.


GoSV Meetup: Tuesday, June 9 at 6:30 p.m. PDT – San Mateo, CA

The CoreOS team will be talking with the Go Silicon Valley Meetup group this month in San Mateo at Collective Health. Register here.


NYLUG Meetup: Wednesday, June 17 at 6:30 p.m. EDT – New York, NY

The CoreOS New York team will be at the New York Linux Users Group (NYLUG) and will provide an overview of CoreOS. Sign-ups begin on June 3. Register to attend here.


GoSF Meetup: Wednesday, June 17 at 6:30 p.m. PDT – San Francisco, CA

See the CoreOS team at the GoSF Meetup to listen in on a talk about A Survey of RPC options in Go.


GOTO Amsterdam: Friday, June 19 – Amsterdam, The Netherlands

Kelsey Hightower (@kelseyhightower) will be at GOTO Amsterdam speaking on rkt and the App Container spec at 11:30 a.m. CEST and will join a panel at 3:50 p.m. CEST to discuss Docker predictions.


Pre-DockerCon panel: Sunday, June 21 – San Francisco, CA

Join Kelsey Hightower (@kelseyhightower) and other thought leaders that will be at DockerCon for a pre-event evening panel on conducting systems and services: an evening about orchestration.


DevOpsDays Amsterdam: Wednesday, June 24 – Amsterdam, The Netherlands

Learn about CoreOS at a DevOpsDays Amsterdam workshop presented by Chris Kühl (@blixtra) on June 24.


In case you missed it, check out the recordings of the CoreOS Fest talks that were held last month. More will be posted this month so stay tuned.

May 19, 2015

Dagstuhl Seminar: Compositional Verification Methods for Next-Generation Concurrency

Some time ago, I figured out that there are more than a billion instances of the Linux kernel in use, and this in turn led to the realization that a million-year RCU bug is happening about three times a day across the installed base. This realization has caused me to focus more heavily on RCU validation, which has uncovered a number of interesting bugs. I have also dabbled a bit in formal verification, which has not yet found a bug. However, formal verification might be getting there, and might some day be a useful addition to RCU's regression testing. I was therefore quite happy to be invited to this Dagstuhl Seminar. In what follows, I summarize a few of the presentations. See here for the rest.



Viktor Vafeiadis presented his analysis of the C11 memory model, including some “interesting” consequences of data races, where a data race is defined as a situation involving multiple concurrent accesses to a non-atomic variable, at least one of which is a write. One such consequence involves a theoretically desirable “strengthening” property. For example, this property would mean that multiplexing two threads onto a single underlying thread would not introduce new behaviors. However, with C11, the undefined-behavior consequences of data races can actually cause new behaviors to appear with fewer threads, for example, see Slide 7. This suggests the option of doing away with the undefined behavior, which is exactly the option that LLVM has taken. However, this approach requires some care, as can be seen on Slide 19. Nevertheless, this approach seems promising. One important takeaway from this talk is that if you are worried about weak ordering, you need to pay careful attention to reining in the compiler's optimizations. If you are unconvinced, take a look at this! Jean Pichon-Pharabod, Kyndylan Nienhuis, and Mike Dodds presented on other aspects of the C11 memory model.



Martin T. Vechev apparently felt that the C11 memory model was too tame, and therefore focused on event-driven applications, specifically javascript running on Android. This presentation included some entertaining concurrency bugs and their effects on the browser's display. Martin also discussed formalizing javascript's memory model.



Hongjin Liang showed that ticket locks can provide starvation freedom given a minimally fair scheduler. This provides a proof point for Björn B. Brandenburg's dissertation, which analyzed the larger question of real-time response from lock-based code. It should also provide a helpful corrective to people who still believe that non-blocking synchronization is required.



Joseph Tassarotti presented a formal proof of the quiescent-state based reclamation (QSBR) variant of userspace RCU. In contrast to previous proofs, this proof did not rely on sequential consistency, but instead leveraged a release-acquire memory model. It is of course good to see researchers focusing their tools on RCU! That said, when a researcher asked me privately whether I felt that the proof incorporated realistic assumptions, I of course could not resist saying that since they didn't find any bugs, the assumptions clearly must have been unrealistic.



My first presentation covered what would be needed for me to be able to use formal verification as part of Linux-kernel RCU's regression testing. As shown on slide 34, these are:





  1. Either automatic translation or no translation required. After all, if I attempt to manually translate Linux-kernel RCU to some special-purpose language every release, human error will make its presence known.

  2. Correctly handle environment, including the memory model, which in turn includes compiler optimizations.

  3. Reasonable CPU and memory overhead. If these overheads are excessive, RCU is better served by simple stress testing.

  4. Map to source code lines containing the bug. After all, I already know that there are bugs—I need to know where they are.

  5. Modest input outside of source code under test. The sad fact is that a full specification of RCU would be at least as large as the implementation, and also at least as buggy.

  6. Find relevant bugs. To see why this is important, imagine that some tool finds 100 different million-year bugs and I fix them all. Because roughly one in six fixes introduces a bug, and because that bug is likely to reproduce in far less than a million years, this process has likely greatly reduced the robustness of the Linux kernel.





I was not surprised to get some “frank and honest” feedback, but I was quite surprised (but not at all displeased) to learn that some of the feedback was of the form “we want to see more C code.” After some discussion, I provided just that.

CoreOS Linux is in the OpenStack App Marketplace

Today at the OpenStack Summit in Vancouver, we are pleased to announce that CoreOS Linux – the lightweight operating system that provides stable, reliable updates to all machines connected to the update service – is included in the OpenStack Community App Catalog.

CoreOS Linux is now available in the Community App Catalog alongside ActiveState Stackato, Apcera, Cloud Foundry, Kubernetes, MySQL, Oracle Database 12c and Oracle Multitenant, Postgres, Project Atomic, Rally, Redis, Tomcat and Wordpress. The Community App Catalog is where community members can share apps and tools designed to integrate with OpenStack Clouds.

With the ability to use CoreOS directly from the catalog, it will be easier to use CoreOS Linux on OpenStack. CoreOS Linux delivers automatic updates that are critical to keeping a system secure. CoreOS Linux’s continuous stream of updates minimizes the complexity of each update, and engineering teams have the flexibility to select specific release channels to deploy and to control how clusters apply updates. Get started with CoreOS on OpenStack here.

At the intersection of open source technologies, we are excited to continue helping users succeed with containers in the OpenStack ecosystem. If you are at the OpenStack Summit this week, stop by to meet us and see our talk today at 2 p.m., Dream Stack, CoreOS + OpenStack + Kubernetes.

May 18, 2015

CoreOS at OpenStack Summit 2015

CoreOS is in Vancouver this week. Not only are we excited to see where OpenStack is taking containers; we’re also pumped for 24-hour poutine.

There are CoreOS-focused events on the first three days of the conference! Speakers on Monday and Tuesday, plus a deep dive into CoreOS all afternoon on Wednesday.


Monday, May 18, 2015 at 2:00 p.m.

Don’t miss our very own Matthew Garrett talking about how we can secure cloud infrastructure using TPMs today at 2 p.m.


Tuesday, May 19, 2015 at 2:00 p.m.

Next up on Tuesday, May 19 at 2 p.m., we have Brian “Redbeard” Harrington from CoreOS telling us all about the Dream Stack, CoreOS + OpenStack + Kubernetes.


Wednesday, May 20, 2015 at 1:50-6:00 p.m.

To dive deeper into CoreOS, our Collaboration Day event is on Wednesday at 1:50-6 p.m. Brian Harrington and Brian Waldon will be there to answer all of your CoreOS questions. Here is the schedule:

Time Topic
1:50 - 2:30 CoreOS as a building block for OpenStack Ironic
2:40 - 3:20 Managing CoreOS Images effectively with Glance (Dos and Don'ts)
3:30 - 4:10 CoreOS Developer AMA (Ask Me Anything)
4:10 - 4:30 20 minute break
4:30 - 5:10 Administrative/Firmware containers - going beyond your web applications
5:20 - 6:00 Building minimal application containers from scratch

Be sure to stop by our 3D-printing booth right near registration! That’s right. We’ve teamed up with Codame to immortalize your time here at OpenStack Summit. Try it alone, or bring a friend. You’re not going to want to miss this!

Meet our team and tweet to us @CoreOSLinux!

CoreOS at OpenStack Summit

May 14, 2015

New Functional Testing in etcd

Today we are discussing the new fault-injecting, functional testing framework built to test etcd, which can deploy a cluster, inject failures, and check the cluster for correctness continuously.

For context, etcd is an open source, distributed, consistent key-value store. It is a core component of CoreOS software that facilitates safe automatic updates, coordinates work scheduled to hosts, and sets up overlay networking for containers. Because of its core position in the stack, its correctness and availability are critical, which is why the etcd team has built the functional testing framework.

Since writing the framework, we have run it continuously for the last two months, and etcd has proven to be robust under many kinds of harsh failure scenarios. This framework has also helped us identify a few potential bugs and improvements that we’ve fixed in newer releases — read on for more info.

Functional Testing

etcd’s functional test suite tests the functionality of an etcd cluster with a focus on failure-resistance under heavy usage.

The main workflow of the functional test suite is straightforward:

  1. It sets up a new etcd cluster and injects a failure into the cluster. A failure is some unexpected situation that may happen in the cluster, e.g., a machine fails or the network goes down.
  2. It repairs the failure and expects the etcd cluster to recover within a short amount of time (usually one minute).
  3. It waits for the etcd cluster to be fully consistent and making progress.
  4. It starts the next round of failure injection.

Meanwhile, the framework makes continuous write requests to the etcd cluster to simulate heavy workloads. As a result, there are constantly hundreds of write requests queued, intentionally causing a heavy burden on the etcd cluster.
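
To give a concrete feel for this kind of load generation, here is a minimal sketch of a continuous stress writer in Go against the etcd v2 HTTP keys API. It is only an illustration, not the actual etcd-tester stresser: the endpoint address and key names are hypothetical, and error handling is deliberately simplistic because request failures are expected while faults are being injected.

// stress.go: a minimal sketch of a continuous write load against an etcd
// cluster via the v2 HTTP keys API (illustrative only, not etcd-tester code).
package main

import (
  "fmt"
  "net/http"
  "net/url"
  "strings"
)

func stress(endpoint string, keys int) {
  client := &http.Client{}
  for i := 0; ; i++ {
    // cycle over a fixed key space, writing 100-byte values
    key := fmt.Sprintf("/v2/keys/stress-%d", i%keys)
    val := url.Values{"value": {strings.Repeat("x", 100)}}
    req, err := http.NewRequest("PUT", endpoint+key, strings.NewReader(val.Encode()))
    if err != nil {
      continue
    }
    req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
    resp, err := client.Do(req)
    if err != nil {
      continue // failures are expected while faults are being injected
    }
    resp.Body.Close()
  }
}

func main() {
  // several concurrent writers keep a backlog of requests queued
  for w := 0; w < 10; w++ {
    go stress("http://10.0.0.1:2379", 250000)
  }
  select {} // run forever
}

Running several such writers concurrently is what keeps hundreds of write requests queued against the cluster at all times.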

If the running cluster cannot recover from failure, the functional testing framework archives the cluster state and does the next round of testing on a new etcd cluster. When archiving, process logs and data directories for each etcd member are saved into a separate directory, which can be viewed and debugged in the future.

Basic Architecture

etcd's functional test suite has two components: etcd-agent and etcd-tester. etcd-agent runs on every etcd node and etcd-tester is a single controller of the test.

etcd-agent is a daemon on each machine. It can start, stop, restart, isolate and terminate an etcd process. The agent exposes these functionalities via RPC.
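
To give a feel for the shape of that interface, the following is a minimal sketch of an agent exposing start and stop operations over Go's standard net/rpc package. The Agent type and its methods are hypothetical simplifications, not the actual etcd-agent API; only the port (9027) matches the agent endpoints used in the run example further below.

// agent.go: a minimal sketch of an etcd-agent-style daemon controlling a
// local etcd process over RPC (names are hypothetical, not the real API).
package main

import (
  "log"
  "net"
  "net/rpc"
  "os/exec"
)

// Agent manages a single local etcd process.
type Agent struct {
  cmd *exec.Cmd
}

// Start launches etcd with the given command-line arguments.
func (a *Agent) Start(args []string, ok *bool) error {
  a.cmd = exec.Command("etcd", args...)
  if err := a.cmd.Start(); err != nil {
    return err
  }
  *ok = true
  return nil
}

// Stop kills the running etcd process, simulating a machine failure.
func (a *Agent) Stop(_ struct{}, ok *bool) error {
  if a.cmd != nil && a.cmd.Process != nil {
    if err := a.cmd.Process.Kill(); err != nil {
      return err
    }
  }
  *ok = true
  return nil
}

func main() {
  if err := rpc.Register(&Agent{}); err != nil {
    log.Fatal(err)
  }
  l, err := net.Listen("tcp", ":9027") // agent port used by etcd-tester
  if err != nil {
    log.Fatal(err)
  }
  rpc.Accept(l) // serve RPC requests from etcd-tester
}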

etcd-tester utilizes all etcd-agents to control the cluster and simulate various test cases. For example, it starts a three-member cluster by sending three start-RPC calls to three different etcd-agents. It then forces one of the members to fail by sending a stop-RPC call to the member’s etcd-agent.

etcd functional testing

While etcd-tester uses etcd-agent to control etcd externally, it also directly connects to etcd members to make simulated HTTP requests, including setting a range of keys and checking member health.
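
A basic member health check of this sort can be done with plain HTTP. The sketch below polls each member's /health endpoint; the member addresses are made up, the response format shown is an assumption based on the v2 API, and the real etcd-tester performs much richer consistency checks than this.

// health.go: a minimal sketch of polling each member's /health endpoint,
// roughly in the spirit of a WaitHealth check (illustrative only).
package main

import (
  "encoding/json"
  "fmt"
  "net/http"
  "time"
)

func healthy(endpoint string) bool {
  c := &http.Client{Timeout: 5 * time.Second}
  resp, err := c.Get(endpoint + "/health")
  if err != nil {
    return false
  }
  defer resp.Body.Close()
  var h struct {
    Health string `json:"health"`
  }
  if err := json.NewDecoder(resp.Body).Decode(&h); err != nil {
    return false
  }
  return h.Health == "true"
}

func main() {
  // hypothetical member addresses
  members := []string{"http://10.0.0.1:2379", "http://10.0.0.2:2379", "http://10.0.0.3:2379"}
  for _, m := range members {
    fmt.Println(m, "healthy:", healthy(m))
  }
}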

Internal Testing Suite

The internal functional testing suite runs on four n1-highcpu-2 virtual machines on Google Compute Engine. Each machine has 2 virtual cores, 1.8 GB of memory and a 200 GB standard persistent disk. Three machines have etcd-agent running as a daemon, while the fourth machine runs etcd-tester as the controller.

Currently we simulate six major failures, covering the most common cases that etcd may encounter in real life:

  1. kill all members
    • the whole data center experiences an outage, and the etcd cluster in the data center is killed
  2. kill the majority of the cluster
    • part of the data center experiences an outage, and the etcd cluster loses quorum
  3. kill one member
    • a single machine needs to be upgraded or maintained
  4. kill one member for a significant time and expect it to recover from an incoming snapshot
    • a single machine is down due to hardware failure, and requires manual repair
  5. isolate one member
    • the network interface on a single machine is broken
  6. isolate all members
    • the router or switch in the data center is broken

Meanwhile, 250k 100-byte keys are written into the etcd cluster continuously, which means we’re storing about 25MB of data in the cluster.

Discovering Potential Bugs

This test suite has helped us to discover potential bugs and areas to improve. In one discovery, we found that when a leader is helping the follower catch up with the progress of the cluster, there was a slight possibility that memory and CPU usage could explode without bound. After digging into the log, it turned out that the leader was repeatedly sending 50MB-size snapshot messages and overloaded its transport module. To fix the issue, we designed a message flow control for snapshot messages that solved the resource explosion.
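
The idea behind that kind of flow control can be illustrated with a simple in-flight limit: the sender may only have a bounded number of snapshot messages outstanding, and blocks further sends until earlier ones complete. The sketch below uses a buffered channel as a semaphore; it is purely conceptual and not the actual etcd raft transport code.

// A conceptual sketch of bounding in-flight snapshot messages with a
// channel-based semaphore (not the actual etcd transport implementation).
package main

import (
  "fmt"
  "time"
)

type snapshotSender struct {
  inflight chan struct{} // semaphore limiting concurrent snapshot sends
}

func newSnapshotSender(limit int) *snapshotSender {
  return &snapshotSender{inflight: make(chan struct{}, limit)}
}

// send blocks until a slot is free, so a slow follower cannot cause the
// leader to queue an unbounded number of large snapshot messages.
func (s *snapshotSender) send(msg []byte) {
  s.inflight <- struct{}{}
  go func() {
    defer func() { <-s.inflight }()
    // simulate transferring a large snapshot to a follower
    time.Sleep(100 * time.Millisecond)
    fmt.Printf("sent snapshot of %d bytes\n", len(msg))
  }()
}

func main() {
  s := newSnapshotSender(1) // at most one snapshot in flight
  for i := 0; i < 5; i++ {
    s.send(make([]byte, 50<<20)) // a 50MB snapshot message
  }
  time.Sleep(time.Second)
}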

Another example is the automatic WAL repair feature added in 2.1.0. To protect data integrity, etcd intentionally refuses to restart if the last entry in the underlying WAL was half-written, which may happen if the process is killed or the disk is full. We found this happens occasionally (roughly once per hundred rounds) in functional testing, and it is safe to remove the broken entry automatically and let the member recover from the cluster, which simplifies recovery for the administrator. This functionality has been merged into the master branch and will be released in v2.1.0.

After several weeks of running and debugging, the etcd cluster has survived several thousand consecutive rounds of all six failures. Having endured this serious testing, etcd has proven itself strong and is working quite well.

Diving into the Code

Build and Run

etcd-agent can be built via

$ go build github.com/coreos/etcd/tools/functional-tester/etcd-agent

and etcd-tester via

$ go build github.com/coreos/etcd/tools/functional-tester/etcd-tester

Run etcd-agent binary on machine{1,2,3}:

$ ./etcd-agent --etcd-path=$ETCD_BIN_PATH

Run etcd-tester binary on machine4:

$ ./etcd-tester -agent-endpoints="$MACHINE1_IP:9027,$MACHINE2_IP:9027,$MACHINE3_IP:9027" -limit=3 -stress-key-count=250000 -stress-key-size=100

etcd-tester then runs 3 rounds of all failures against a 3-member cluster on machines 1, 2, and 3.

Add a new failure

Let us go through the process to add failureKillOne, which kills one member and recovers it afterwards. First, define how to inject and recover from the failure:

type failureKillOne struct {
  description
}

func newFailureKillOne() *failureKillOne {
  return &failureKillOne{
    // detailed description of the failure
    description: "kill one member",
  }
}

func (f *failureKillOne) Inject(c *cluster, round int) error {
  // round robin on all members
  i := round % c.Size
  // ask its agent to stop etcd
  return c.Agents[i].Stop()
}

func (f *failureKillOne) Recover(c *cluster, round int) error {
  i := round % c.Size
  // ask its agent to restart etcd
  if _, err := c.Agents[i].Restart(); err != nil {
    return err
  }
  // wait for recovery done
  return c.WaitHealth()
}

Then we add it into failure lists:

  t := &tester{
    failures: []failure{
      newFailureKillOne(),
    },
    cluster: c,
    limit:   *limit,
  }

Done.

As you can see, the framework is simple but already fairly powerful. We are looking forward to having you join the etcd test party!

Future Plans

The framework is still under active development, and more failure cases and checks will be added.

Random network partitions, network delays and runtime reconfigurations are some classic failure cases that the framework does not yet cover. Another interesting idea we plan to explore is a cascading failure case that injects multiple failure cases at the same time.

On the recovery side, more checks against consistent views of the keyspace on all members are a good starting point for further exploration.

The internal testing cluster runs 24/7, and our etcd cluster works perfectly under the current failure set. The etcd team is making its best effort to guarantee etcd’s correctness, and hopes to provide users with the most robust consensus store possible.

Follow-up plans for more specific and harsher tests are on our TODO list. This framework is good at imitating real-life scenarios, but it cannot exert fine control over lower-level system and hardware behavior. Future testing approaches may use simulated networks and disks to tackle these failure simulations.

We will keep enhancing the testing strength and coverage by adding more failure cases and checks to the framework. Pull requests to the framework are welcome!

Acknowledgement

We are running our testing cluster on GCE. Thanks to Google for providing the testing environment.

May 13, 2015

Upcoming CoreOS Events in May

We kicked off May by hosting our first ever CoreOS Fest, and it was a blast! We’re sad to see it go, but we’re excited about all of the other events we’ll be speaking at and attending this month.


Wednesday, May 13, 2015 at 2:00 p.m. EDT

What could be better than listening to Kelsey Hightower give a talk! Listen in from anywhere in the world to hear Kelsey discuss how to get started with containers and microservices during the Logentries Webinar. Register now!


Wednesday, May 13, 2015 at 6:00 p.m. PDT - San Francisco, CA

Alex Crawford from CoreOS will be giving an overview of CoreOS at the SF DevOps Meetup group. Thanks to Teespring for hosting the event at its SOMA office. Be sure not to miss it!


Tuesday, May 19, 2015 at 2:00 p.m. PDT - Vancouver, BC Canada

If you find yourself at OpenStack Summit Vancouver, be sure to check out Brian ‘Redbeard’ Harrington talking about modern practices for building a private cloud that runs containers at scale. We’ll also have our team there, so please stop by our area and meet us. We even have a Collaboration Day session for attendees on Wednesday, May 20 from 1:50 p.m. to 6 p.m.


Wednesday, May 20, 2015 at 9:10 a.m. EDT - Seven Springs, PA

CoreOS CEO Alex Polvi will be keynoting WHD.usa this year by talking about building distributed systems and securing the backend of the internet.


Wednesday, May 20, 2015 at 6:30 p.m. EDT - Atlanta, GA

Join Brian Akins from CoreOS in Atlanta at the DevOps ATL Meetup, where he’ll be discussing new ways to deploy and manage applications at scale. Thanks to MailChimp for hosting this meetup at their Ponce City Market office.


Wednesday, May 20, 2015 at 11:05 a.m. MDT - Denver, CO

Don’t miss Kelsey Hightower at GlueCon 2015 where he’ll give an overview of key technologies at CoreOS and how you can use these new technologies to build performant, reliable, large distributed systems.


Thursday, May 21 2015 at 11:20 a.m. MDT - Denver, CO

CoreOS CTO Brandon Philips will be at GlueCon 2015 discussing how to create a Google-like infrastructure. It will cover everything you need to know from the OS to the scheduler.


Thursday, May 21 2015 at 2:40 p.m. EDT - Charleston, SC

You can find Kelsey Hightower at CodeShow SE 2015 explaining how to manage containers at scale with CoreOS and Kubernetes.


Thursday, May 21, 2015 at 7:30 p.m. CEST - Madrid, Spain

Iago Lopez Galeiras will be joining the Madrid DevOps Meetup this month to give a talk on rkt and the App Container spec.


Tuesday, May 26, 2015 at 6:00 p.m. EDT - Charlottesville, VA

Don’t miss Brian Akins from CoreOS giving an introduction to building large reliable systems at the DevOps Charlottesville Meetup group.


Friday, May 29, 2015 at 11:50 a.m. PDT - Santa Clara, CA

Be sure to check out Kelsey Hightower at Velocity where his talk will examine all the major components of CoreOS including etcd, fleet, docker, and systemd; and how these components work together.


CoreOS Fest Recap

Check out some of the best moments from CoreOS Fest 2015!

Join us at an event in your area! If you would like our help putting together a CoreOS meetup, or would like to speak at one of our upcoming meetups, please contact us at press@coreos.com.

May 05, 2015

CoreOS State of the Union at CoreOS Fest

At CoreOS Fest we have much to celebrate with the open source community. Today over 800 people contribute to CoreOS projects and we want to thank all of you for being a part of our community.

We want to take this opportunity to reflect on where we started from with CoreOS Linux. Below, we go into depth about each project, but first, a few highlights:

  • We've now shipped CoreOS Linux images for nearly 674 days, since the CoreOS epoch on July 1, 2013.
  • We've rolled out 13 major releases of the Linux kernel from 3.8.0, released in February 2013, to the 4.0 release in April 2015.
  • In that time, we have tagged 329 releases of our images.
  • We have 500+ projects on GitHub that mention etcd, including major projects like Kubernetes, using etcd.

CoreOS Linux

Our namesake project, CoreOS Linux, started with the idea of continuous delivery of a Linux operating system. Best practice in the industry is to ship applications regularly to get the latest security fixes and newest features to users – we think an operating system can be shipped in a similar way. And for nearly two years, since the CoreOS epoch on July 1, 2013, we have been shipping regular updates to CoreOS Linux machines.

In a way, CoreOS Linux is a kernel delivery system. The alpha channel has rolled through 13 major releases of the Linux kernel from 3.8.0 in February 2013 to the recent 4.0 release in April 2015. This doesn’t include all of the minor patch releases we have bumped through as well. In that time we have tagged 329 releases of our images. To achieve this goal, CoreOS uses a transactional system so upgrades can happen automatically.

CoreOS Linux stats shared at CoreOS Fest

Community feedback has been incredibly important throughout this journey: users help us track down bugs in upstream projects like the Linux kernel, give us feedback on new features, and flag regressions that are missed by our testing.

A wide variety of companies are building their products and infrastructure on top of CoreOS Linux, including many participants at CoreOS Fest:

  • Deis, a project recently acquired by Engine Yard, spoke yesterday on "Lessons Learned From Building Platforms on Top of CoreOS"
  • Mesosphere DCOS uses CoreOS by default, and we are happy to have them sponsor CoreOS Fest
  • Salesforce Data.com spoke today on how they are using distributed systems and application containers
  • Coinbase presented a talk today on "Container Management & Analytics"

etcd

We built CoreOS Linux with just a single-host use case in mind, but we wanted people to trust and use CoreOS to update their entire fleet of machines. To solve this problem of automated yet controlled updates across a distributed set of systems, we built etcd.

etcd was initially created to provide an API-driven distributed "reboot lock" to a cluster of hosts, and it has been very successful serving this basic purpose. But over the last two years, adoption and usage of etcd has exploded: today it is being used as a key part of projects like Google's Kubernetes, Cloud Foundry's Diego, Mailgun's Vulcan and many more custom service discovery and master election systems.

At CoreOS Fest we have seen demonstrations of a PostgreSQL master election system built by Compose.io, a MySQL master election system built by HP, and a discussion by Yodlr about how they use it for their internal microservice infrastructure. With feedback from all of these users of etcd, we are planning an advanced v3 API, a next-generation disk-backed store, and new punishing long-running tests to ensure etcd remains a highly reliable component of distributed infrastructure.

etcd stats shared at CoreOS Fest

fleet on top of etcd

After etcd, we built fleet, a scheduler system that ties together systemd and etcd into a distributed init system. fleet can be thought of as a logical extension of systemd that operates at the cluster level instead of the machine level.

The fleet project is low level and designed as a foundation for higher order orchestration: its goal is to be a simple and resilient init system for your cluster. It can be used to run containers directly and also as a tool to bootstrap higher-level software like Kubernetes, Mesos, Deis and others.

For more on fleet, see the documentation on launching containers with fleet.

fleet stats shared at CoreOS Fest

rkt

The youngest CoreOS project is rkt, a container runtime, which was launched in December. rkt has security as a core focus and was designed to fit into the existing Unix process model to integrate well with tools like systemd and Kubernetes. rkt was also built to support the concept of pods: a group of applications run together that share resources like local networking and IPC.

Where is rkt today? At CoreOS fest we discussed how rkt was integrated into Kubernetes, and showed this functionality in a demo yesterday. rkt is also used in Tectonic, our new integrated container platform. Looking forward, we are planning improved UX around trust and image handling tools, advanced networking capabilities, and splitting the stage1 out from rkt to support other isolation mechanisms like KVM.

rkt stats shared at CoreOS Fest

Container networking

Containers are most useful when they can interact with other systems over the network. Today in the container ecosystem we have some fairly basic patterns for network configuration, but over time we will need to give users the ability to configure more complex topologies. CNI (Container Network Interface) defines the API between a runtime like rkt and how a container actually joins a network, via an external plugin interface. Our intention with CNI is to develop a generic networking solution supporting a variety of tools, with reusable plugins for different backend technologies like macvlan, ipvlan, Open vSwitch and more.

flannel is another important and useful component in container network environments. In our future work with flannel, we’d like to introduce a flannel server, integrate it into Kubernetes and add generic UDP encapsulation support.

Ignition: Machine Configuration

Ignition is a new utility for configuring machines on first boot. This utility provides similar mechanisms to coreos-cloudinit but will provide the ability to configure a machine before the first boot. By configuring the system early, problems like ordering around network configuration are more easily solved. Just like coreos-cloudinit, Ignition will also have the ability to mark services to start on boot and configure user accounts.

Ignition is still under heavy development, but we are hoping to be able to start shipping it in CoreOS in the next couple of months.

Participate!

We encourage all of you, as users of our systems, to continue having conversations with us. Please share ideas and tell us what is working well, what may not be working well, and how we can continue to have a useful feedback loop. In the GitHub repos for each of these projects, you can find a CONTRIBUTING.md and a ROADMAP.md which outline how to get started and where the projects are going. Thank you to our contributors!

We will also have the replays of the talks available at a later date, which will include a demo of Ignition and more.

May 04, 2015

App Container spec gains new support as a community-led effort

Today is the inaugural CoreOS Fest, the community event for distributed systems and application containers. We at CoreOS are here to celebrate you – those who want to join us on a journey to secure the backend of the Internet and build distributed systems technologies to bring web scale architecture to any organization. We've come a long way since releasing our first namesake project, CoreOS Linux, in 2013, and as a company we now foster dozens of open source projects as we work together with the community to create the components necessary for this new paradigm in production infrastructure.

An important part of working with this community has been the development of the App Container spec (appc), which provides a definition of how to build and run containerized applications. Announced in December, the appc spec emphasizes security, portability and modularity in application container execution. rkt, a container runtime developed by CoreOS, is the first implementation of appc.

As security and portability between stacks becomes central to the successful adoption of application containers, today appc has gained support from various companies in the community:

  • Google has furthered its support of appc by implementing rkt into Kubernetes and joining as a maintainer of appc
  • Apcera has announced an additional appc implementation called Kurma
  • Red Hat has assigned an engineer to participate as a maintainer of appc
  • VMware recently announced how they will contribute to appc and shipped rkt in Project Photon

In order to ensure the specification remains a community-led effort, the appc project has established a governance policy and elected several new community maintainers unaffiliated with CoreOS: initially, Vincent Batts of Red Hat, Tim Hockin of Google and Charles Aylward of Twitter. Each of these new maintainers brings their own unique point of view, allowing appc to be a truly collaborative effort. Two of the initial developers of the spec from CoreOS, Brandon Philips and Jonathan Boulle, remain as maintainers, but are now proud to have the collective help of others to make the spec what it is intended to be: open, well-specified and developed by a community.

In the months after the launch of appc, we have seen the adoption and support behind a common application container specification grow quickly. These companies and individuals are coming together to ensure there is a well defined specification for application containers, providing guidelines to ensure security, openness and modularity between stacks.

Google furthers its support of appc by integrating rkt into Kubernetes

Today also marks support for appc arriving in the Kubernetes project, via the integration of rkt as a configurable container runtime for Kubernetes clusters.

"The first implementation of the appc specification into Kubernetes, through the support of CoreOS rkt, is an important milestone for the Kubernetes project," said Craig McLuckie, product manager and Kubernetes co-founder at Google. "Designed with cluster first management in mind, appc support enables developers to use their preferred container image through the same Google infrastructure inspired orchestration framework."

Kubernetes is an open source project introduced by Google to help organizations run their infrastructure in a similar manner to the internal infrastructure that runs Google Search, Gmail and other Google services. Today's announcement of rkt being integrated directly into Kubernetes means that users will have the ability to run ACIs, the image format defined in the App Container spec, and take advantage of rkt’s first-class support for pods. rkt’s native support for running Docker images means they can also continue to use their existing images.

Apcera’s new implementation of appc, Kurma

Also announced today is Kurma, a new implementation of appc by Apcera. Kurma is an execution environment for running applications in containers. Kurma provides a framework that allows containers to be managed and orchestrated beyond itself. Kurma joins a variety of implementations of the appc spec that have emerged in the last six months, such as Jetpack, an App Container runtime for FreeBSD, and libappc, a C++ library for working with containerized applications.

"Apcera has long been invested in secure container technology to power our core platform," said Derek Collison, founder and CEO of Apcera. "We are excited to bring our technology to the open source community and to partner with CoreOS on the future of appc."

Red Hat involvement as a maintainer of appc

Red Hat recently assigned an engineer to participate as a maintainer of appc. With years of experience in container development and leadership in Docker, Kubernetes and the Linux community as a whole, they bring a unique skillset to the effort.

“The adoption of container technology is an exciting trend and one that we believe can have significant customer benefit,” said Matt Hicks, senior director, engineering, Red Hat. “But at the same time, fragmentation of approaches and formats runs the risk of undercutting the momentum. We are excited to be included as maintainers and will work to not only innovate, but also to help create stability for our customers that adopt containers.”

VMware’s continued support of appc

In April, VMware announced support for appc and shipped rkt in Project Photon™, making rkt available to VMware vSphere® and VMware vCloud® Air™ customers. VMware has been an early proponent of appc and is working closely with the community to push forward the spec.

Today VMware reaffirmed their commitment to appc, showing its importance as a community-wide specification.

“VMware supports appc today offering rkt to our customers as a container runtime engine,” said Kit Colbert, vice president and CTO, Cloud-Native Apps, VMware. “We will work with the appc community to address portability and security across platforms – topics that are top of mind for enterprises seeking to support application containers in their IT environments.”

Join the appc community effort

We welcome these new companies into the community and invite others to join the movement to bring forward a secure and portable container standard. Get involved by joining the appc mailing list and the discussion on GitHub. We welcome continued independent implementations of tools that can run the same container consistently.

Thank you to all who are coming out to CoreOS Fest. Please follow along with the event on Twitter @CoreOSFest and #CoreOSFest. For those who aren't able to make it in person, the talks will be recorded and available at a later date.

May 01, 2015

Sahana Nepal Earthquake SitRep 3

The Sahana Software Foundation has deployed an instance of the Sahana Open Source Disaster Management Software server to provide a flexible solution for organizations and communities to respond to the Nepal Earthquake: http://nepal.sahana.io/ Please contact sahana-nepal-response@sahanafoundation.org with questions or to request  support [Read the Rest...]

April 29, 2015

CoreOS Fest 2015 Guide

CoreOS Fest 2015 is in less than a week, and we want to make sure that you’re ready! To ensure that you have everything you need in order to have the best two days, we’ve put together a CoreOS Fest Guide.

If you haven’t gotten a ticket, but plan on joining us, there are only a few remaining tickets so be sure to register now while they are available.

We wouldn’t be here today without the help from our wonderful sponsors. Thank you to Intel, Google, VMware, AWS, Rackspace, Chef, Project Calico, Sysdig, Mesosphere and Giant Swarm.

Location

CoreOS Fest is located at The Village at 969 Market St. (between 5th and 6th St.) in downtown San Francisco, right by the Powell St. BART station. For local parking, please check for options here.

Badge Pick-Up Times

The registration desk is located on the top floor of The Village. When you walk in, head straight up the stairs to pick up your badge.

Monday, May 4:

8:00 a.m. - end of day

At the registration desk on the top floor

Tuesday, May 5:

8:00 a.m. - end of day

At the registration desk on the top floor

Breakfast and Lunch Details

Breakfast and lunch will be held on the top floor each day. Dietary restrictions? We’ve accommodated for most diets, but if you’re concerned that we won’t have something for your specific diet, we recommend packing a lunch.

Monday, May 4:

Breakfast: 8 a.m. - 9 a.m.

Lunch: 11:45 a.m. - 1:00 p.m.

Tuesday, May 5:

Breakfast: 8:30 a.m. - 9:30 a.m.

Lunch: 11:45 a.m. - 1:00 p.m.

After Party Details

Join us May 4 for our After Party on the top floor of The Village from 5:45 p.m. to 8:00 p.m. We’ll also share a drink and a goodbye on May 5 at 5 p.m. - 6 p.m. at the AWS Pop-Up Loft next door, at 925 Market St.

CoreOS Office Hours

Attendees may sign up for office hours through a link you’ll get in your attendee email. Since there is a limited number of spots, please look at the conference schedule before getting your office hours tickets. Paper office hours tickets will not need to be shown at any time during CoreOS Fest as long as you have your badge.

Questions?

Have questions or need help the day of the event? You can email us at fest@coreos.com.

A Few Things to Keep in Mind

Be on time

Unlike CS101, this is something you’ll want to wake up for. We promise to have breakfast — and more importantly, coffee — waiting for you.

Talks will be recorded

All talks will be recorded, so if you miss one, don’t worry! All videos will be posted on the Fest ‘15 site after the event.

Bring a bag

Here at CoreOS, we believe that there is such a thing as having too many conference tote bags. If you’ll need a bag, make sure to bring your own, and we'll spare you the bagception dilemma.

Charging and Wi-Fi

Wi-Fi is available at the venue, along with charging stations and outlets.

Come with questions

Some of the most influential developers in infrastructure will be there to tell stories of their successes, missteps and lessons learned. They’re here to answer your questions, so bring on the tough ones!

Follow #CoreOSFest on Twitter

Make sure that you follow @CoreOSFest and #CoreOSFest on Twitter for live schedule updates, recorded talks and news.

We’re only a few days away from CoreOS Fest and we’re excited to see you all there!

Sahana Nepal Earthquake SitRep 2

We have been stepping up our coordination efforts and engaging with folks in Nepal and from around the world who are interested in using Sahana to support the response to this devastating earthquake. Arun Pyasi is currently in Nepal and [Read the Rest...]

April 28, 2015

Slim application containers (using Docker)

Another talk I gave at Linux.conf.au was about making slim containers (youtube) – ones that contain only the barest essentials needed to run an application.

And I thought I’d do it from source, as most “Built from source” images also contain the tools used to build the software.

1. Make the Docker base image you’re going to use to build the software

In January 2015, the main base images and their sizes looked like:

scratch             latest              511136ea3c5a        19 months ago       0 B
busybox             latest              4986bf8c1536        10 days ago         2.433 MB
debian              7.7                 479215127fa7        10 days ago         85.1 MB
ubuntu              15.04               b12dbb6f7084        10 days ago         117.2 MB
centos              centos7             acc1b23376ec        10 days ago         224 MB
fedora              21                  834629358fe2        10 days ago         250.2 MB
crux                3.1                 7a73a3cc03b3        10 days ago         313.5 MB

I’ll pick Debian, as I know it, and it has the fewest restrictions on what contents you’re permitted to redistribute (and because bootstrapping busybox would be an amazing talk on its own).

Because I’m experimenting, I’m starting by seeing how small I can make a new Debian base image – beginning with:

FROM debian:7.7

RUN rm -r /usr/share/doc /usr/share/doc-base \
          /usr/share/man /usr/share/locale /usr/share/zoneinfo

CMD ["/bin/sh"]

Then make a new single layer (squashed image) by running `docker export` and `docker import`

REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
debian              7.7                 479215127fa7        10 days ago         85.1 MB
our/debian:jessie   latest              cba1d00c3dc0        1 seconds ago       46.6 MB

Ok, not quite half, but you get the idea.

It’s well worth continuing this exercise using things like `dpkg --get-selections` to remove anything else you won’t need.

Importantly, once you’ve made your smaller base image, you should use it consistently for ALL the containers you use. This means that whenever there are important security fixes, that base image will be downloadable as quickly as possible –  and all your related images can be restarted quickly.

This also means that you do NOT want to squish your images to one or two layers, but rather into some logical set of layers that match your deployment update risks –  a common root base, and then layers based on common infrastructure, and lastly application and customisation layers.

2. Build static binaries –  or not

Building a static binary of your application (in typical `Go` style) makes some things simpler –  but in the end, I’m not really convinced it makes a useful difference.

But in my talk, I did it anyway.

Make a Dockerfile that installs all the tools needed, builds nginx, and then outputs a tar file that is a new build context for another Docker image (and contains the libraries ldd tells us we need):

cat Dockerfile.build-static-nginx | docker build -t build-nginx.static -
docker run --rm build-nginx.static cat /opt/nginx.tar > nginx.tar
cat nginx.tar | docker import - micronginx
docker run --rm -it -p 80:80 micronginx /opt/nginx/sbin/nginx -g "daemon off;"
nginx: [emerg] getpwnam("nobody") failed (2: No such file or directory)

oh. I need more than just libraries?

3. Use inotify to find out what files nginx actually needs!

Use the same image, but start it with Bash –  use that to install and run inotify, and then use `docker exec` to start nginx:

docker run --rm build-nginx.static bash
$ apt-get install -yq inotify-tools iwatch
# inotifywait -rm /etc /lib /usr/lib /var
Setting up watches.  Beware: since -r was given, this may take a while!
Watches established.
/lib/x86_64-linux-gnu/ CLOSE_NOWRITE,CLOSE libnss_files-2.13.so
/lib/x86_64-linux-gnu/ CLOSE_NOWRITE,CLOSE libnss_nis-2.13.so
/lib/x86_64-linux-gnu/ CLOSE_NOWRITE,CLOSE ld-2.13.so
/lib/x86_64-linux-gnu/ CLOSE_NOWRITE,CLOSE libc-2.13.so
/lib/x86_64-linux-gnu/ CLOSE_NOWRITE,CLOSE libnsl-2.13.so
/lib/x86_64-linux-gnu/ CLOSE_NOWRITE,CLOSE libnss_compat-2.13.so
/etc/ OPEN passwd
/etc/ OPEN group
/etc/ ACCESS passwd
/etc/ ACCESS group
/etc/ CLOSE_NOWRITE,CLOSE group
/etc/ CLOSE_NOWRITE,CLOSE passwd
/etc/ OPEN localtime
/etc/ ACCESS localtime
/etc/ CLOSE_NOWRITE,CLOSE localtime

Perhaps it shouldn’t be too surprising that nginx expects to rifle through your user password files when it starts :(

4. Generate a new minimal Dockerfile and tar file Docker build context, and pass that to a new `docker build`

The trick is that the build container Dockerfile can generate the minimal Dockerfile and tar context, which can then be used to build a new minimal Docker image.

The excerpt from the Dockerfile that does it looks like:


# Add a Dockerfile to the tar file
RUN echo "FROM busybox" > /Dockerfile \
    && echo "ADD * /" >> /Dockerfile \
    && echo "EXPOSE 80 443" >> /Dockerfile \
    && echo 'CMD ["/opt/nginx/sbin/nginx", "-g", "daemon off;"]' >> /Dockerfile

RUN tar cf /opt/nginx.tar \
           /Dockerfile \
           /opt/nginx \
           /etc/passwd /etc/group /etc/localtime /etc/nsswitch.conf /etc/ld.so.cache \
           /lib/x86_64-linux-gnu

This tar file can then be passed on using

cat nginx.tar | docker build -t busyboxnginx .

Result

Comparing the sizes, our build container is about 1.4GB, the official nginx image about 100MB, and our minimal nginx container 21MB to 24MB – depending on whether we add busybox to it or not:

REPOSITORY          TAG            IMAGE ID            CREATED              VIRTUAL SIZE
micronginx          latest         52ec332b65fc        53 seconds ago       21.13 MB
nginxbusybox        latest         80a526b043fd        About a minute ago   23.56 MB
build-nginx.static  latest         4ecdd6aabaee        About a minute ago   1.392 GB
nginx               latest         1822529acbbf        8 days ago           91.75 MB

It’s interesting to remember how heavily we rely on `I know this, it's a UNIX system` – application services can have all sorts of hidden assumptions that won’t be revealed without putting them into more constrained environments.

In the same way that we don’t ship the VM / filesystem of our build server, you should not be shipping the container you’re building from source.

This analysis doesn’t try to restrict nginx to only opening certain network ports, devices, or IPC mechanisms – so there’s more to be done…


April 27, 2015

Announcing GovCloud support on AWS

Today we are happy to announce CoreOS Linux now supports Amazon Web Services GovCloud (US). AWS GovCloud is an isolated AWS Region for US government agencies and customers to move sensitive workloads into the AWS cloud by addressing their specific regulatory and compliance requirements. With this, automatic updates are now stable and available to all government agencies using the cloud.

CoreOS Linux customers will benefit from the security assurances of FedRAMP, a US government program providing a standardized approach to security assessment, authorization and continuous monitoring for cloud products and services.

For more details, see the documentation on Running CoreOS on EC2.

New gst-rpicamsrc features

I’ve pushed some new changes to my Raspberry Pi camera GStreamer wrapper, at https://github.com/thaytan/gst-rpicamsrc/

These bring the GStreamer element up to date with new features added to raspivid since I first started the project, such as adding text annotations to the video, support for the 2nd camera on the compute module, intra-refresh and others.

Where possible, you can now dynamically update any of the properties – where the firmware supports it. So you can implement digital zoom by adjusting the region-of-interest (roi) properties on the fly, or update the annotation or change video effects and colour balance, for example.

The timestamps produced are now based on the internal STC of the Raspberry Pi, so the audio video sync is tighter. Although it was never terrible, it’s now more correct and slightly less jittery.

The one major feature I haven’t enabled as yet is stereoscopic handling. Stereoscopic capture requires 2 cameras attached to a Raspberry Pi Compute Module, so at the moment I have no way to test it works.

I’m also working on GStreamer stereoscopic handling in general (which is coming along). I look forward to releasing some of that code soon.

 

Sahana Nepal Earthquake SitRep 1

As you are probably aware, a 7.8 magnitude earthquake struck Nepal on 25th April, causing 2,288 deaths and injuring over 5,500 people [1]. Sahana is already being used in Nepal by both the Nepal Red Cross Society and the National Emergency Operation Center [Read the Rest...]

April 26, 2015

Anti-Systemd People

For the Technical People

This post isn’t really about technology; I’ll cover the technology briefly, so skip to the next section if you aren’t interested in Linux programming or system administration.

I’ve been using the Systemd init system for a long time; I first tested it in 2010 [1]. I use Systemd on most of my systems that run Debian/Wheezy (which means most of the Linux systems I run which aren’t embedded systems). Currently the only systems where I’m not running Systemd are some systems on which I don’t have console access; while Systemd works reasonably well, it wasn’t a standard init system for Debian/Wheezy, so I don’t run it everywhere. That said, I haven’t had any problems with Systemd in Wheezy, so I might have been too paranoid.

I recently wrote a blog post about systemd, just some basic information on how to use it and why it’s not a big deal [2]. I’ve been playing with Systemd for almost 5 years and using it in production for almost 2 years and it’s performed well. The most serious bug I’ve found in systemd is Bug #774153 which causes a Wheezy->Jessie upgrade to hang until you run “systemctl daemon-reexec” [3].

I know that some people have had problems with systemd, but any piece of significant software will cause problems for some people; there are bugs in all software that is complex enough to be useful. However the fact that it has worked so well for me on so many systems suggests that it’s not going to cause huge problems; it should be covered by the routine testing that is needed for a significant deployment of any new version of a distribution.

I’ve been using Debian for a long time. The transitions from libc4 to libc5 and then libc6 were complex but didn’t break much. The use of devfs in Debian caused some issues and then the removal of devfs caused other issues. The introduction of udev probably caused problems for some people too. Doing major updates to Debian systems isn’t something that is new or which will necessarily cause significant problems, I don’t think that the change to systemd by default compares to changing from a.out binaries to ELF binaries (which required replacing all shared objects and executables).

The Social Issue of the Default Init

Recently the Debian technical committee determined that Systemd was the best choice for the default init system in Debian/Jessie (the next release of Debian which will come out soon). Decisions about which programs should be in the default install are made periodically and it’s usually not a big deal. Even when the choice is between options that directly involve the user (such as the KDE and GNOME desktop environments) it’s not really a big deal because you can just install a non-default option.

One of the strengths of Debian has always been the fact that any Debian Developer (DD) can just add any new package to the archive if they maintain it to a suitable technical standard and if copyright and all other relevant laws are respected. Any DD who doesn’t like any of the current init systems can just package a new one and upload it. Obviously the default option will get more testing, so the non-default options will need more testing by the maintainer. This is particularly difficult for programs that have significant interaction with other parts of the system, I’ve had difficulties with this over the course of 14 years of SE Linux development but I’ve also found that it’s not an impossible problem to solve.

It’s generally accepted that making demands of other people’s volunteer work is a bad thing, which to some extent is a reasonable position. There is a problem when this is taken to extremes, Debian has over 1000 developers who have to work together so sometimes it’s a question of who gets to do the extra work to make the parts of the distribution fit together. The issue of who gets to do the work is often based on what parts are the defaults or most commonly used options. For my work on SE Linux I often have to do a lot of extra work because it’s not part of the default install and I have to make my requests for changes to other packages be as small and simple as possible.

So part of the decision to make Systemd be the default init is essentially a decision to impose slightly more development effort on the people who maintain SysVInit if they are to provide the same level of support – of course given the lack of overall development on SysVInit the level of support provided may decrease. It also means slightly less development effort for the people who maintain Systemd as developers of daemon packages MUST make them work with it. Another part of this issue is the fact that DDs who maintain daemon packages need to maintain init.d scripts (for SysVInit) and systemd scripts, presumably most DDs will have a preference for one init system and do less testing for the other one. Therefore the choice of systemd as the default means that slightly less developer effort will go into init.d scripts. On average this will slightly increase the amount of sysadmin effort that will be required to run systems with SysVInit as the scripts will on average be less well tested. This isn’t going to be a problem in the short term as the current scripts are working reasonably well, but over the course of years bugs may creep in and a proposed solution to this is to have SysVInit scripts generated from systemd config files.

We did have a long debate within Debian about the issue of default init systems and many Debian Developers disagree about this. But there is a big difference between volunteers debating about their work and external people who don’t contribute but believe that they are entitled to tell us what to do. Especially when the non-contributors abuse the people who do the work.

The Crowd Reaction

In a world filled with reasonable people who aren’t assholes there wouldn’t be any more reaction to this than there has been to decisions such as which desktop environment should be the default (which has caused some debate but nothing serious). The issue of which desktop environment (or which version of a desktop environment) to support has a significant effect on users that can’t be avoided, and I could understand people being a little upset about that. But the init system isn’t something that most users will notice – apart from the boot time.

For some reason the men in the Linux community who hate women the most seem to have taken a dislike to systemd. I understand that being “conservative” might mean not wanting changes to software as well as not wanting changes to inequality in society but even so this surprised me. My last blog post about systemd has probably set a personal record for the amount of misogynistic and homophobic abuse I received in the comments. More gender and sexuality related abuse than I usually receive when posting about the issues of gender and sexuality in the context of the FOSS community! For the record this doesn’t bother me, when I get such abuse I’m just going to write more about the topic in question.

While the issue of which init system to use by default in Debian was being discussed we had a lot of hostility from unimportant people who for some reason thought that they might get their way by being abusive and threatening people. As expected that didn’t give the result they desired, but it did result in a small trend towards people who are less concerned about the reactions of users taking on development work related to init systems.

The next thing that they did was to announce a “fork” of Debian. Forking software means maintaining a separate version due to a serious disagreement about how it should be maintained. Doing that requires a significant amount of work in compiling all the source code and testing the results. The sensible option would be to just maintain a separate repository of modified packages as has been done many times before. One of the most well known repositories was the Debian Multimedia repository, it was controversial due to flouting legal issues (the developer produced code that was legal where they lived) and due to confusion among users. But it demonstrated that you can make a repository containing many modified packages. In my work on SE Linux I’ve always had a repository of packages containing changes that haven’t been accepted into Debian, which included changes to SysVInit in about 2001.

The latest news on the fork-Debian front seems to be the call for donations [4]. Apparently most of the money that was spent went to accounting fees and buying a laptop for a developer. The amount of money involved is fairly small, Forbes has an article about how awful people can use “controversy” to get crowd-funding windfalls [5].

MikeeUSA is an evil person who hates systemd [6]. This isn’t any sort of evidence that systemd is great (I’m sure that evil people make reasonable choices about software on occasion). But it is a significant factor in support for non-systemd variants of Debian (and other Linux distributions). Decent people don’t want to be associated with people like MikeeUSA, the fact that the anti-systemd people seem happy to associate with him isn’t going to help their cause.

Conclusion

Forking Debian is not the correct technical solution to any problem you might have with a few packages. Filing bug reports and possibly forking those packages in an external repository is the right thing to do.

Sending homophobic and sexist abuse is going to make you as popular as the GamerGate and GodHatesAmerica.com people. It’s not going to convince anyone to change their mind about technical decisions.

Abusing volunteers who might consider donating some of their time to projects that you like is generally a bad idea. If you abuse them enough you might get them to volunteer less of their time, but the most likely result is that they just don’t volunteer on anything associated with you.

Abusing people who write technical blog posts isn’t going to convince them that they made an error. Abuse is evidence of the absence of technical errors.

April 24, 2015

rkt 0.5.4, featuring repository authentication, port forwarding and more

Since the last rkt release a few weeks ago, development has continued apace, and today we're happy to announce rkt v0.5.4. This release includes a number of new features and improvements across the board, including authentication for image fetching, per-application arguments, running from pod manifests, and port forwarding support – check below the break for more details.

rkt, a container runtime for application containers, is under heavy development but making rapid progress towards a 1.0 release. Earlier this week, VMware announced support for rkt and the emerging App Container (appc) specification. appc is an open specification defining how applications can be run in containers, and rkt is the first implementation of the spec. With increasing industry commitment and involvement in appc, it is quickly fulfilling its promise of becoming a standard of how applications should be deployed in containers.

VMware released a short demo about how its new Project Photon works with rkt via Vagrant and VMware Fusion.

Read on below for more about the latest features in rkt 0.5.4.

Authentication for image fetching

rkt now supports HTTP Basic and OAuth Bearer Token authentication when retrieving remote images from HTTP endpoints and Docker registries. To facilitate this, we've introduced a flexible configuration system, allowing vendors to ship default configurations and then systems administrators to supplement or override configuration locally. Configuration is fully versioned to support forwards and backwards compatibility – check out the rkt documentation for more details.

Here's a simple example of fetching an image from a private Docker registry (note that Docker registries support only Basic authentication):

$ sudo cat /etc/rkt/auth.d/myuser.json 
{
    "rktKind": "dockerAuth",
    "rktVersion": "v1",
    "registries": ["quay.io"],
    "credentials": {
        "user": "myuser",
        "password": "sekr3tstuff"
    }
}
$ sudo ./rkt --insecure-skip-verify fetch docker://quay.io/myuser/privateapp
rkt: fetching image from docker://quay.io/myuser/privateapp
Downloading layer: cf2616975b4a3cba083ca99bc3f0bf25f5f528c3c52be1596b30f60b0b1c37ff
Downloading layer: 6ce2e90b0bc7224de3db1f0d646fe8e2c4dd37f1793928287f6074bc451a57ea
....

Per-application arguments and image signature verification for local images

The flag parsing in rkt run has been reworked to support per-app flags when running a pod with multiple images. Furthermore, in keeping with our philosophy of "secure by default", rkt will now attempt signature verification even when referencing local image files (during rkt fetch or rkt run commands). In this case, rkt expects to find the signature file alongside the referenced image – for example:

 $ rkt run imgs/pauser.aci
     error opening signature file: open /home/coreos/rkt/imgs/pauser.aci.asc: no such file or directory
 $ gpg2 --armor --detach-sign imgs/pauser.aci
 $ rkt run imgs/pauser.aci
     rkt: signature verified:
       Irma Bot (ACI Signing Key)
     ^]^]^]Container rootfs terminated by signal KILL.

Specific signatures can be provided with the --signature flag, which also applies per-app in the case of multiple references. In this example, we import two local images into the rkt CAS, specifying image signatures for both:

     $ rkt fetch   \
        imgs/pauser.aci --signature ./present.asc  \
        imgs/bash.aci --signature foo.asc
      rkt: signature verified:
        Joe Packager (CoreOS)
sha512-b680fd853abeba1a310a344e9fbf8ac9
sha512-ae78000a3d38fae4009699bf7494b293

Running from pod manifests

In previous versions of rkt, the arguments passed to rkt run (or rkt prepare) would be used to internally generate a Pod Manifest which is executed by later stages of rkt. This release introduces a new flag, --pod-manifest, to both rkt prepare and rkt run, to supply a pre-created pod manifest to rkt.

A pod manifest completely defines the execution environment of the pod to be run, such as volume mounts, port mappings, isolators, etc. This allows users complete control over all of these parameters in a well-defined way, without the need of a complicated rkt command-line invocation. For example, when integrating rkt as a container runtime for a cluster orchestration system like Kubernetes, the system can now programmatically generate a pod manifest instead of feeding a complicated series of arguments to the rkt CLI.

In this first implementation, and following the prescriptions of the upstream appc spec, the pod manifest is treated as the definitive record of the desired execution state: anything specified in the app fields will override what is in the original image, such as exec parameters, volume mounts, port mappings, etc. This allows the operator to completely control what will be executed by rkt. Since the pod manifest is treated as a complete source of truth, and is expected to be generated by orchestration tools with complete knowledge of the execution environment, --pod-manifest is initially considered mutually exclusive with other flags, such as --volumes and --port. See rkt run --help for more details.
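To make this concrete, here is a rough, hand-written sketch of what a minimal pod manifest might look like. The field names follow the appc pod manifest schema of the time, but the image name, id and exec line below are purely illustrative, and the id would normally be the hash of an image already present in rkt's local store:

{
    "acVersion": "0.5.1",
    "acKind": "PodManifest",
    "apps": [
        {
            "name": "myapp",
            "image": {
                "name": "example.com/myapp",
                "id": "sha512-..."
            },
            "app": {
                "exec": ["/bin/myapp", "--serve"],
                "user": "0",
                "group": "0"
            }
        }
    ]
}

A file like this would then be handed to rkt with something along the lines of rkt run --pod-manifest ./pod.json, rather than spelling out the equivalent options on the command line.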

Port forwarding

rkt now supports forwarding ports from the host to pods when using private networking.

As a simple example, given an app with the following ports entry in its Image Manifest:

{
    "name": "http",
    "port": 80,
    "protocol": "tcp"
}

the following rkt run command can be used to forward traffic from the host's TCP port 8888 to port 80 inside the pod:

rkt run --private-net --port=http:8888 myapp.aci

Whenever possible, it is more convenient to use an SDN solution like flannel to assign routable IPs to rkt pods. However, when such an option is not available, or for "edge" apps that require straddling both SDN and external networks (such as a load balancer), port forwarding can be used to expose select ports to the pod.

Testing, forward-compatibility, and more

There's plenty more under the hood in this release, including an extensive functional test harness, a new database schema migration process, and various internal improvements to the codebase. As we've talked about previously, rkt is a young project and we aren't yet able to guarantee API/ABI stability between releases, but forward-compatibility is a top priority for the forthcoming 0.6 release, and these changes are important steps towards this goal.

For full details of all the changes in this release, check out the release on GitHub.

Get involved!

We're on a journey to create an efficient, secure and composable application container runtime for production environments, and we want you to join us. Take part in the discussion through the rkt-dev mailing list or GitHub issues — and for those eager to get stuck in, contribute directly to the project. Are you doing interesting things with rkt or appc and want to share it with the world? Contact our marketing team at press@coreos.com.

CAP on a Map project kickoff in the Maldives

A workshop and set of meetings (April 15 & 16, 2015) took place in Malé, the capital city of the Maldives, as part of the CAP on a Map project kickoff. The project aims to improve [Read the Rest...]

April 23, 2015

Verification Challenge 5: Uses of RCU

This is another self-directed verification challenge, this time to validate uses of RCU instead of validating the RCU implementations as in earlier posts. As you can see from Verification Challenge 4, the logic expression corresponding even to the simplest Linux-kernel RCU implementation is quite large, weighing in at tens of thousands of variables and hundreds of thousands of clauses. It is therefore worthwhile to look into the possibility of a trivial model of RCU that could be used for verification.



Because logic expressions do not care about cache locality, memory contention, energy efficiency, CPU hotplug, and a host of other complications that a Linux-kernel implementation must deal with, we can start with extreme simplicity. For example:



static int rcu_read_nesting_global;

static void rcu_read_lock(void)
{
  (void)__sync_fetch_and_add(&rcu_read_nesting_global, 2);
}

static void rcu_read_unlock(void)
{
  (void)__sync_fetch_and_add(&rcu_read_nesting_global, -2);
}

static inline void assert_no_rcu_read_lock(void)
{
  BUG_ON(rcu_read_nesting_global >= 2);
}

static void synchronize_rcu(void)
{
  if (__sync_fetch_and_xor(&rcu_read_nesting_global, 1) < 2)
    return;
  SET_NOASSERT();
  return;
}




The idea is to reject any execution in which synchronize_rcu() does not wait for all readers to be done. As before, SET_NOASSERT() sets a variable that suppresses all future assertions.



Please note that this model of RCU has some shortcomings:





  1. There is no diagnosis of rcu_read_lock()/rcu_read_unlock() misnesting. (A later version of the model provides limited diagnosis, but under #ifdef CBMC_PROVE_RCU.)

  2. The heavyweight operations in rcu_read_lock() and rcu_read_unlock() result in artificial ordering constraints. Even in TSO systems such as x86 or s390, a store in a prior RCU read-side critical section might be reordered with loads in later critical sections, but this model will act as if such reordering was prohibited.

  3. Although synchronize_rcu() is permitted to complete once all pre-existing readers are done, in this model it will instead wait until a point in time at which there are absolutely no readers, whether pre-existing or new. Therefore, this model's idea of an RCU grace period is even heavier weight than in real life.





Nevertheless, this approach will allow us to find at least some RCU-usage bugs, and it fits in well with cbmc's default fully-ordered settings. For example, we can use it to verify a variant of the simple litmus test used previously:



int r_x;
int r_y;

int x;
int y;

void *thread_reader(void *arg)
{
  rcu_read_lock();
  r_x = x;
#ifdef FORCE_FAILURE_READER
  rcu_read_unlock();
  rcu_read_lock();
#endif
  r_y = y;
  rcu_read_unlock();
  return NULL;
}

void *thread_update(void *arg)
{
  x = 1;
#ifndef FORCE_FAILURE_GP
  synchronize_rcu();
#endif
  y = 1;
  return NULL;
}

int main(int argc, char *argv[])
{
  pthread_t tr;

  if (pthread_create(&tr, NULL, thread_reader, NULL))
    abort();
  (void)thread_update(NULL);
  if (pthread_join(tr, NULL))
    abort();

  BUG_ON(r_y != 0 && r_x != 1);
  return 0;
}




This model has only 3,032 variables and 8,844 clauses, more than an order of magnitude smaller than for the Tiny RCU verification. Verification takes about half a second, which is almost two orders of magnitude faster than the 30-second verification time for Tiny RCU. In addition, the model successfully flags several injected errors. We have therefore succeeded in producing a simpler and faster model that approximates RCU and can handle multi-threaded litmus tests.
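As a rough sketch of what such a run looks like (the file name is hypothetical, and cbmc option spellings vary slightly between versions), the litmus test can be checked as-is and then again with one of the failure-injection macros defined:

# verify the litmus test together with the trivial RCU model
cbmc rcu-litmus.c

# inject the grace-period failure; cbmc should now report the
# violated assertion at the BUG_ON() in main()
cbmc -DFORCE_FAILURE_GP rcu-litmus.c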



A natural next step would be to move to litmus tests involving linked lists. Unfortunately, there appear to be problems with cbmc's handling of pointers in multithreaded situations. On the other hand, cbmc's multithreaded support is quite new, so hopefully there will be fixes for these problems in the near future. After fixes appear, I will give the linked-list litmus tests another try.



In the meantime, the full source code for these models may be found here.

Dockerising Puppet

Learn how to use Puppet to manage Docker containers. This post contains complementary technical details to the talk given on the 23rd of April at Puppet Camp in Sydney.

Manageacloud is a company that specialises in multi-cloud orchestration. Please contact us if you want to know more.

 

Summary

The goal is to manage the configuration of Docker containers using existing puppet modules and Puppet Enterprise. We will use the example of a Wordpress application and two different approaches:

  • Fat containers: treating the container as a virtual machine
  • Microservices: one process per container, as originally recommended by Docker

 

Docker Workflow

 

 

1 - Dockerfile

Dockerfile is the "source code" of the container image:

  • It uses imperative programming, which means we need to specify every command, tailored to the target distribution, to achieve the desired state.
  • It is very similar to bash; if you know bash, you know how to use a Dockerfile
  • In large and complex architectures, the goal of the Dockerfile is to hook a configuration management system like puppet to install the required software and configure the container.

For example, this is a Dockerfile that will create a container image with Apache2 installed in Ubuntu:

FROM ubuntu
MAINTAINER Ruben Rubio Rey <ruben@manageacloud.com>
RUN apt-get update
RUN apt-get install -y apache2

 

2 - Container Image

The container image is generated from the Dockerfile using docker build:

docker build -t <image_name> <directory_path_to_Dockerfile>

 

3 - Registry

An analogy for the Registry is that it works like a git repository: it allows you to push and pull container images, and container images can have different versions.

The Registry is the central point to distribute Docker containers. It does not matter if you use Kubernetes, CoreOS Fleet, Docker Swarm, Mesos or you are just orchestrating in a Docker host.

For example, if you are the DevOps person within your organization, you may decide that the developers (who are already developing under Linux) will use containers instead of virtual machines for the development environment. The DevOps person would be responsible for creating the Dockerfile, building the container image and pushing it to the registry. All developers within your organization can then pull the latest version of the development environment from the registry and use it.
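As a minimal sketch of that workflow (the registry address and image name below are made up for illustration), a developer would only need something like:

# pull the latest development environment published by the DevOps team
docker pull registry.example.com:5000/devenv

# start it and get a shell inside
docker run -it registry.example.com:5000/devenv /bin/bash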

 

4 - Development Environment

Docker containers can be used in a development environment. You can make developers more comfortable with the transition to containers by using the controversial "Fat Containers" approach.

 

5 - Production Environment

You can orchestrate Docker containers in production for two different purposes:

  • Docker Host: Using containers as a way to distribute the configuration. This post focuses on using containers in Docker Hosts.
  • Cluster Management: Mesos, Kubernetes, Docker Swarm and CoreOS Fleet are used to manage containerised applications in clustered environments. These aim to create a layer on top of the different available virtual machines, allowing you to manage all resources as one unified whole. These technologies are very likely to evolve significantly over the next 12 months.

 

Fat Containers vs Microservices

When you are creating containers, there are two main approaches:

  • Microservices: running one single process per container.
  • Fat containers: running many processes and services in a container. In fact, you are treating the container as a virtual machine.

The problem with the microservices approach is that Linux is not really designed for single-process systems. If one of the processes running in a container becomes detached from its parent, it is the responsibility of the init process to recycle its resources; if those resources are not recycled, the process becomes a zombie.

Some Linux applications are not designed for single process systems either:

  • Many Linux applications are designed to have a crontab daemon to run periodical tasks.
  • Many Linux applications write vital information directly to syslog. If the syslog daemon is not running, you might never notice those messages.

In order to run multiple processes in a container, you need an init process or something similar. There are base images with an init process built in, for example for Ubuntu and Debian.

What to use? My advice is to be pragmatic; no one size fits all. Your goal is to solve business problems without creating technical debt. If fat containers suit your business need better, use them; if microservices fit better, use those instead. Ideally, you should know how to use both and analyse each case to decide what is best for your company. There are no technical reasons to use one over the other.

 

 

Managing Docker Containers with Puppet

When we use Puppet (or any other configuration management system) to manage Docker containers, there are two sets of tasks: container creation and container orchestration.

 

Container Creation

  1. The Dockerfile installs the puppet client and invokes the puppet master to retrieve the container's configuration
  2. The new image is pushed to the registry

 

Container Orchestration

  1. The puppet agent on the Docker host invokes the puppet master to get the configuration
  2. The puppet agent identifies a set of containers that must be pulled from the Docker registry
  3. The puppet agent pulls, configures and starts the Docker containers on the Docker host

 

Puppet Master Configuration

For this configuration, we are assuming that Puppet Master is running in a private network, where all the clients are secure. This allows us to use the configuration setting autosign = true in the master's puppet.conf.
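As a minimal sketch, the relevant part of the master's puppet.conf would look something like this (acceptable only because the network is private and trusted, as stated above):

[master]
    # automatically sign certificate requests from any client
    autosign = true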

 

Docker Registry

The Docker registry is like a "git repository" for containers: you can push and pull container images, and images can have a version number. You can use a hosted provider for the Docker registry or you can install one yourself. For this example we will use the module garethr/docker from the Puppet Forge to create our docker-registry puppet manifest:

class docker-registry {

    include 'docker'

    docker::run { 'local-registry':

        # Name of the image in Docker Hub
        image           => 'registry',

        # We are mapping a port from the Docker host to the container.
        # If you don't do that you cannot access the services
        # available in the container.
        ports           => ['5000:5000'],

        # Configuration parameters required to run an insecure
        # version of a local registry
        env             => ['SETTINGS_FLAVOR=dev', 'STORAGE_PATH=/var/docker-registry/local-registry'],

        # Containers are stateless: if you modify the filesystem
        # you are creating a new container. If we want to push images,
        # we need a persistent layer somewhere. For this case we are
        # mapping a folder on the host to a folder in the container.
        volumes         => ['/var/docker-registry:/var/docker-registry'],
    }
}

Please note that this installs an insecure Docker registry for testing purposes only.

 

Fat Containers Approach

For this example, I am using a fat container, as I am targeting the development environment for the developers within my organization. Fat containers work very much like virtual machines, so the learning curve will be close to zero. If the developers are already using Linux, using containers will remove the overhead of the hypervisor and their computers will immediately be faster.

This fat container will contain the following services:

  • Provided by the base image:
    • init
    • syslog
    • crontab
    • ssh
  • Provided by Puppet:
    • mysql
    • apache2 (along with Wordpress codebase)

The following Dockerfile creates the Wordpress fat container:

FROM phusion/baseimage
MAINTAINER Ruben Rubio Rey "ruben.rubio@manageacloud.com"

# Activate AU mirrors
COPY files/sources.list.au /etc/apt/sources.list

# Install puppet client using Puppet Enterprise
RUN curl -k https://puppet.manageacloud.com.au:8140/packages/current/install.bash | bash

# Configure puppet client (just removed the last line for the "certname")
COPY files/puppet.conf /etc/puppetlabs/puppet/puppet.conf

# Apply puppet changes. Note certname: we are using "wordpress-image-"
# and three random characters.
#  - "wordpress-image-" allows Puppet Enterprise to identify
#    which classes must be applied
#  - The three random characters are used to avoid conflict
#    with the node certificates
RUN puppet agent --debug --verbose --no-daemonize --onetime --certname wordpress-image-`date +%s | sha256sum | head -c 3; echo `

# Enable SSH - as this is meant to be a development environment,
# SSH might be useful to the developer.
# This is needed for phusion/baseimage only.
RUN rm -f /etc/service/sshd/down

# Change root password - even if we use key authentication,
# knowing the root's password is useful for developers
RUN echo "root:mypassword" | chpasswd

# We enable the services that puppet is installing
COPY files/init /etc/my_init.d/10_init_services
RUN chmod +x /etc/my_init.d/10_init_services

When we build the Docker container, it will request its configuration from the Puppet Master using the certname "wordpress-image-XXX", where XXX is three random characters.

Puppet master returns the following manifest:

class wordpress-all-in-one {

  # Problems using the official mysql module from Puppet Forge:
  # if you try to install mysql using package {"mysql": ensure => installed }
  # it crashes, because it tries to do something with the init process
  # and this container does not have a fully featured init process.
  # "mysql-noinit" installs mysql without any init dependency.
  # Note that although we cannot use the mysql Puppet Forge module
  # to install the software, we can use its types to create the
  # database, create the user and grant permissions.
  include "mysql-noinit"

  # Fix unsatisfied requirements in the Wordpress class.
  # The hunner/wordpress module assumes that wget is installed
  # in the system; however, containers by default have minimal
  # software installed.
  package {"wget": ensure => latest}

  # hunner/wordpress, removing any task related to the database
  # (it will crash when checking if the mysql package is installed)
  class { 'wordpress':
    install_dir    => '/var/www/wordpress',
    db_user        => 'wp_user',
    db_password    => 'password',
    create_db      => false,
    create_db_user => false
  }

  # Ad-hoc apache configuration: installs apache and php, and adds
  # the virtual server wordpress.conf
  include "apache-wordpress"
}

Build the container image:

docker build -t puppet_wordpress_all_in_one /path/to/Dockerfile_folder/



Push the image to the registry

docker tag puppet_wordpress_all_in_one registry.manageacloud.com.au:5000/puppet_wordpress_all_in_one
docker push registry.manageacloud.com.au:5000/puppet_wordpress_all_in_one

Orchestrate the container

To orchestrate the fat container in a Docker host:

class container-wordpress-all-in-one {

    class { 'docker':
        extra_parameters => ['--insecure-registry registry.manageacloud.com.au:5000']
    }

    docker::run { 'wordpress-all-in-one':

        # The image is fetched from the Registry
        image => 'registry.manageacloud.com.au:5000/puppet_wordpress_all_in_one',

        # The fat container is mapping port 80 on the Docker host
        # to the container's port 80
        ports => ['80:80'],
    }
}

Microservices Approach

Now we are going to reuse as much of the existing code as possible, following the microservices approach. We will have two containers: a DB container running MySQL and a WEB container running Apache2.

 

1 - MySQL (DB) Microservice Container

As usual, we use a Dockerfile to build the Docker image. The Dockerfiles are very similar; I will highlight the changes.

# This time we are using the official Docker Ubuntu image (no init process)
FROM ubuntu
MAINTAINER Ruben Rubio Rey "ruben.rubio@manageacloud.com"

# Activate AU mirrors
COPY files/sources.list.au /etc/apt/sources.list

# This base image does not have curl installed
RUN apt-get update && apt-get install -y curl

# Install puppet client
RUN curl -k https://puppet.manageacloud.com.au:8140/packages/current/install.bash | bash

# Configure puppet client
COPY files/puppet.conf /etc/puppetlabs/puppet/puppet.conf

# Apply puppet changes. We change the certname
# so Puppet Master knows what configuration to retrieve.
RUN puppet agent --debug --verbose --no-daemonize --onetime --certname ms-mysql-image-`date +%s | sha256sum | head -c 3; echo `

# Expose MySQL to the Docker network: we are notifying Docker that this
# container has a service that other containers might need
EXPOSE 3306

The class returned by Puppet Master is wordpress-mysql-ms. You will notice that this class is exactly the same as the fat container's, but anything that is not related to the database is commented out.

class wordpress-mysql-ms {

    # Install MySQL
    include "mysql-noinit"

    # Unsatisfied requirements in wordpress class
    # package {"wget": ensure => latest}

    # Puppet Forge wordpress class, removing mysql
    # class { 'wordpress':
    #   install_dir => '/var/www/wordpress',
    #   db_user     => 'wp_user',
    #   db_password => 'password',
    # }

    # Apache configuration not needed
    # include "apache-wordpress"
}

Build the container

docker build -t puppet_ms_mysql .

Push the container to the registry

docker tag puppet_ms_mysql registry.manageacloud.com.au:5000/puppet_ms_mysql
sudo docker push registry.manageacloud.com.au:5000/puppet_ms_mysql

 

2 - Apache (WEB) Microservice Container

Once more, we use a Dockerfile to build the image. The file is exactly the same as the MySQL one, except for a few lines, which are highlighted below.

FROM ubuntu
MAINTAINER Ruben Rubio Rey "ruben.rubio@manageacloud.com"

# Activate AU mirrors
COPY files/sources.list.au /etc/apt/sources.list

# Install CURL
RUN apt-get update && apt-get install -y curl

# Install puppet client
RUN curl -k https://puppet.manageacloud.com.au:8140/packages/current/install.bash | bash

# Configure puppet client
COPY files/puppet.conf /etc/puppetlabs/puppet/puppet.conf

# Apply puppet changes
RUN puppet agent --debug --verbose --no-daemonize --onetime --certname ms-apache-image-`date +%s | sha256sum | head -c 3; echo `

# Apply patch to link containers. We have to tell Wordpress where the
# mysql service is running, using a system environment variable
# (explanation in the next section).
# If we were using Puppet for microservices we should update the
# Wordpress module to set this environment variable; in this case
# I am exposing the changes so it is easier to see what is changing.
RUN apt-get install patch -y
COPY files/wp-config.patch /var/www/wordpress/wp-config.patch
RUN cd /var/www/wordpress && patch wp-config.php < wp-config.patch

# We configure PHP to read system environment variables
COPY files/90-env.ini /etc/php5/apache2/conf.d/90-env.ini

The class returned by Puppet Master is wordpress-apache-ms. You will notice that it is very similar to wordpress-mysql-ms and to the class used by the fat container, wordpress-all-in-one. The difference is that everything related to mysql is commented out and everything related to wordpress and apache is executed.

class wordpress-apache-ms {

    # MySQL won't be installed here
    # include "mysql-noinit"

    # Unsatisfied requirements in wordpress class
    package {"wget": ensure => latest}

    # Puppet Forge wordpress class, removing mysql
    class { 'wordpress':
        install_dir    => '/var/www/wordpress',
        db_user        => 'wp_user',
        db_password    => 'password',
        create_db      => false,
        create_db_user => false
    }

    # Ad-hoc apache configuration
    include "apache-wordpress"
}

 

3 - Orchestrating Web and DB Microservice

The Puppet class that orchestrates both microservices is called container-wordpress-ms:

class container-wordpress-ms {

    # Make sure that Docker is installed and that it can
    # get images from our insecure registry
    class { 'docker':
        extra_parameters => ['--insecure-registry registry.manageacloud.com.au:5000']
    }

    # Container DB will run MySQL
    docker::run { 'db':
        # The image is taken from the registry
        image    => 'registry.manageacloud.com.au:5000/puppet_ms_mysql',
        command  => '/usr/sbin/mysqld --bind-address=0.0.0.0',
        use_name => true
    }

    # Container WEB will run Apache
    docker::run { 'web':
        # The image is taken from the Registry
        image    => 'registry.manageacloud.com.au:5000/puppet_ms_apache',
        command  => '/usr/sbin/apache2ctl -D FOREGROUND',

        # We are mapping a port between the Docker host and the Apache container
        ports    => ['80:80'],

        # We link the WEB container to the DB container. This allows WEB to
        # access the services exposed by the DB container (in this case 3306)
        links    => ['db:db'],
        use_name => true,

        # We need the DB container up and running before running WEB
        depends  => ['db'],
    }
}

 

APPENDIX I: Linking containers

When we link containers in the microservices approach, we are performing the following tasks:

 

Starting "db" container:

This will start puppet_ms_mysql as a container named db. Please note that puppet_ms_mysql exposes port 3306, which notifies Docker that this container has a service that might be useful for other containers.

docker run --name db -d puppet_ms_mysql /usr/sbin/mysqld --bind-address=0.0.0.0

 

Starting "web" container

Now we want to start the container puppet_ms_apache, named web.

If we link the containers and execute the command env, the following environment variables are created in the web container:

docker run --name web -p 1800:80 --link db:db puppet_ms_apache env
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=8d48e28094e3
DB_PORT=tcp://172.17.0.2:3306
DB_PORT_3306_TCP=tcp://172.17.0.2:3306
DB_PORT_3306_TCP_ADDR=172.17.0.2
DB_PORT_3306_TCP_PORT=3306
DB_PORT_3306_TCP_PROTO=tcp
DB_NAME=/web/db
HOME=/root

These variables point out where the mysql database is running: the application should use the environment variable DB_PORT_3306_TCP_ADDR to connect to the database (see the sketch after the list below).

  • DB is the name of the container we are linking to
  • 3306 is the port exposed in the Dockerfile of the db container
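As a minimal sketch of what this means in practice (assuming a mysql client were installed in the web container, which it is not by default, and reusing the wp_user credentials from the wordpress class above), a process inside the web container could reach the database with:

mysql -h "$DB_PORT_3306_TCP_ADDR" -P "$DB_PORT_3306_TCP_PORT" -u wp_user -ppassword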

 

APPENDIX II: Docker Compose

When working with microservices, you want to avoid long commands. Docker Compose makes the management of long Docker commands a lot easier. For example, this is how the Microservices approach would look with Docker Compose:

file docker-compose.yml

web:
  image: puppet_ms_apache
  command: /usr/sbin/apache2ctl -D FOREGROUND
  links:
   - db:db
  ports:
   - "80:80"

db:
  image: puppet_ms_mysql
  command: /usr/sbin/mysqld --bind-address=0.0.0.0

 

and you can start both containers with the command docker-compose up

April 20, 2015

VMware Ships rkt and Supports App Container Spec

Today VMware shipped rkt, the application container runtime, and made it available to VMware customers in Project Photon. VMware also announced their support of the App Container spec, of which rkt is the first implementation.

“VMware is happy to provide rkt to offer our customers application container choice. rkt is the first implementation of the App Container spec (appc), and we look forward to contributing to the appc community to advance security and portability between platforms.”

— Kit Colbert, vice president and CTO, Cloud-Native Apps, VMware

We are thrilled to welcome VMware into the appc and rkt communities. The appc specification was created to establish an industry standard for how applications should be deployed in containers, with a focus on portability, composability, and security. rkt is a project originated by CoreOS to provide a production-ready Linux implementation of the specification.

VMware's extensive experience with running applications at scale in enterprise environments will be incredibly valuable as we work together with the community towards a 1.0 release of the appc specification and the rkt project.

Join us on our mission to create a secure, composable and standards-based container runtime. We welcome your involvement and contributions to rkt and appc.

April 16, 2015

etcd 2.0 in CoreOS Alpha Image

Today we are pleased to announce that the first CoreOS image to include an etcd v2.0 release is now available in the CoreOS alpha channel. etcd v2.0 marks a milestone in the evolution of etcd and includes many new features and improvements over etcd 0.4, including:

  • Reconfiguration protocol improvements: guards against accidental misconfiguration
  • New raft implementation: provides improved cluster stability
  • On-disk safety improvements: utilizes CRC checksums and append-only log behavior

etcd is an open source, distributed, consistent key-value store. It is a core component of CoreOS software that facilitates safe automatic updates, coordinates work scheduled to hosts, and sets up overlay networking for containers. Check out the etcd v2.0 announcement for more details on etcd and the new features.

We’ve been using etcd v2.0 in production behind discovery.etcd.io and quay.io for a few months now and it has proven to be stable in these use cases. All existing applications that use the etcd API should work against this new version of etcd. We have tested etcd v2.0 with applications like fleet, locksmith and flannel. The user facing API to etcd should provide the same features it had in the past; if you find issues please report them on GitHub.

Setup Using cloud-init

If you want to dive right in and try out bootstrapping a new cluster, the cloud-init docs have full details on all of the parameters. To support the new features of etcd v2.0, such as multiple listen addresses and proxy modes, a new cloud-init section named etcd2 is used. With a few lines of configuration and a new discovery token, you can take etcd v2.0 for a spin on your cluster.
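As a minimal sketch (the discovery token is a placeholder you generate yourself, and the key names follow the CoreOS cloud-init documentation of the time), a cloud-config for a new cluster member looks roughly like this:

#cloud-config
coreos:
  etcd2:
    # generate a fresh token at https://discovery.etcd.io/new
    discovery: https://discovery.etcd.io/<token>
    advertise-client-urls: http://$private_ipv4:2379
    initial-advertise-peer-urls: http://$private_ipv4:2380
    listen-client-urls: http://0.0.0.0:2379
    listen-peer-urls: http://$private_ipv4:2380
  units:
    - name: etcd2.service
      command: start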

IANA Ports

With the release of etcd2, we’ve taken the opportunity to begin the transition to our IANA-assigned port numbers: 2379 and 2380. For backward compatibility, etcd2 is configured to listen on the new port numbers as well as the old ones (4001 and 7001) by default, but this can always be further restricted as desired.

Migration and Changes

Existing clusters running etcd 0.4 will not automatically migrate to etcd v2.0. As there are semantic changes in how etcd clusters are managed between the two versions, we have decided to include both. There are documented methods to migrate to etcd v2.0 and you may do this at your own pace. We encourage users to use etcd v2.0 for all new clusters to take advantage of the large number of stability and performance improvements over the older series.

In this process, we have had to break backward compatibility in two cases in order to support this change:

  1. Starting fleet.service without explicitly starting etcd.service or etcd2.service will no longer work. If you are using fleet and need a local etcd endpoint, you will need to also start etcd.service or etcd2.service.

  2. Starting flannel.service without explicitly starting etcd.service or etcd2.service will no longer work. If you are using flannel and need a local etcd endpoint, you will need to also start etcd.service or etcd2.service.

We have discouraged the use of this implicit dependency in our documentation, but you should check whether you will be affected: make sure that etcd.service or etcd2.service is enabled or started in your cloud-config.
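Concretely, that means declaring the units explicitly in your cloud-config rather than relying on fleet or flannel to pull etcd in implicitly; a minimal sketch:

#cloud-config
coreos:
  units:
    - name: etcd2.service
      command: start
    - name: fleet.service
      command: start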

Looking Forward

As we look forward to etcd v2.1.0 and beyond, there are a number of exciting things shaping up inside of etcd. In the near future, new features such as the authorization and authentication API will make it safer to operate multiple applications on a single cluster. The team has also been operating ongoing test environments that introduce regular partitions and crashes, and has been making practical benchmarks available. In the last few days there has also been an active discussion on how to evolve the etcd APIs to better support the applications using etcd for coordination and scheduling today.

We welcome your involvement in the development of etcd - via the etcd-dev discussion mailing list, GitHub issues, or contributing directly to the project.

April 14, 2015

CoreOS on ARM64

This is a guest post from CoreOS contributor, Geoff Levand, Linux Architect, Huawei America Software Lab. He has started work on an ARM64 port of CoreOS. Here is the current state of the project, followed by how you can help.

Recent patches that I've contributed to CoreOS have added basic support for a new target board named arm64-usr. There is currently a single generic ARM64 little endian Linux profile. This profile should work with any ARM64 platform currently supported by the mainline Linux kernel, such as the ARM V8 Foundation Model, the ARM FVP_VE Fast Model, the ARM FVP_BASE Fast Model, and recent qemu-system-aarch64. I hope to add other profiles to support an ARM64 big endian build, and also to get the eight-core HiSilicon 6220 based HiKey developer board supported.

ARM64 porting work is still in progress, so please consider what is done so far as experimental. Some initial work I did along with Michael Marineau of CoreOS was to clean up parts of the CoreOS build system to simplify the way architectures are defined, and also to make the generic build infrastructure completely architecture agnostic. The resulting system should make it quite straightforward to add additional architecture support to CoreOS.

The ARM64 architecture is relatively new, so many upstream software packages have either only recently been updated to support ARM64, or have not yet been. Much of my CoreOS porting work so far has been going through the packages which don't build and figuring out how to get them to build. Sometimes a package can be updated to the latest upstream, sometimes a package keyword can be set, sometimes a modification to the ebuild in coreos-overlay will work, and other times a combination of these is needed. This process is still ongoing, and some difficult packages still lie ahead. The resulting arm64-usr build is experimental, and all the work to bring it up will need testing and review in the future.

There is still a lot of work to be done. Many more packages need to be brought up, and as I mentioned, this involves working at a low level with the package ebuild files and the CoreOS build system. At another level, all the CoreOS features will need to be exercised and verified as needed to bring up the stability and confidence of the port. There are going to be multi-arch clusters, so ARM64 and x86_64 nodes are going to need to work together -- it sounds pretty cool. Someone will need to get in there and make that happen. If you have any interest in the ARM64 port, I encourage you to get involved and help out.

For general info about the port you can look at my Github site. For those who would like to investigate more, or even help with the effort, see my CoreOS ARM64 HOWTO document.

Continue the discussion with Geoff at CoreOS Fest and on freenode in #coreos as geoff-

April 13, 2015

Counting Down to CoreOS Fest on May 4 and 5

As we count down to the inaugural CoreOS Fest in just three weeks, we are thrilled to announce additional speakers and the agenda! CoreOS Fest will be May 4-5 at The Village at 969 Market Street in San Francisco and we hope you will join us.

CoreOS Fest is a two-day event about the tools and best practices used to build modern infrastructure stacks. CoreOS Fest connects people from all levels of the community with future-thinking industry veterans to learn how to build distributed systems that support application containers. This May’s festival is brought to you by our premier sponsor Intel, and additional sponsors Sysdig, Chef, Mesosphere, Metaswitch Networks and Giant Swarm.

CoreOS Fest will include speakers from Google, Intel, Salesforce Data.com, HP, and more, including:

  • Brendan Burns, software engineer at Google and founder of Kubernetes, will provide a technical overview of Kubernetes

  • Diego Ongaro, creator of Raft, will discuss the Raft Consensus Algorithm

  • Lennart Poettering, creator of systemd, will talk about systemd at the Core of the OS

  • Nicholas Weaver, director of SDI-X at Intel, will demonstrate how we can optimize container architectures for the next level of scale

  • Prakash Rudraraju, manager of technical operations at Salesforce Data.com, will join Brian Harrington, principal architect at CoreOS, for a fireside chat on how Salesforce Data.com is thinking about distributed systems and application containers

  • Yazz Atlas, HPCS principal engineer with Hewlett-Packard Advanced Technology Group, will give a presentation on automated MySQL Cluster Failover using Galera Cluster on CoreOS Linux

  • Loris Degioanni, CEO and founder of Sysdig and co-creator of Wireshark, will present the dark art of container monitoring

  • Gabriel Monroy, CTO at OpDemand/Deis, will discuss lessons learned from building platforms on top of CoreOS

  • Spencer Kimball, founder of Cockroach Labs, will talk about CockroachDB

  • Chris Winslett, product manager at Compose.io, will present etcd based Postgres SQL HA Cluster

  • Timo Derstappen, co-founder of Giant Swarm, will present Containers on the Autobahn

More speakers will be added at https://coreos.com/fest/.

As a part of today's schedule announcement, we are offering 10 percent off the regular ticket price until tomorrow, April 14, at 10 a.m. PT. Use this link to reserve your 10 percent off ticket. Tickets are selling fast so get them before we sell out!

Once again, CoreOS Fest thanks its top level sponsor Intel and additional sponsors, including Sysdig, Chef, Mesosphere, Metaswitch Networks and Giant Swarm. If you’re interested in participating at CoreOS Fest as a sponsor, contact fest@coreos.com.

For more CoreOS Fest news, follow along @coreoslinux or #CoreOSFest

April 08, 2015

Upcoming CoreOS Events in April

Supplied with fresh CoreOS t-shirts and half our weight in airport Cinnabons, we’ve made sure that you’ll be seeing a lot of us this April.


Wednesday, April 8, 2015 at 10:15 a.m. EDT - Philadelphia, PA

Don’t miss Kelsey Hightower (@kelseyhightower), developer advocate and toolsmith at CoreOS, kick off our April events by speaking at ETE Conference. He’ll be discussing managing containers at scale with CoreOS and Kubernetes.


Thursday, April 16, 2015 at 7:00 p.m. CET - Amsterdam, Netherlands

Kelsey Hightower will be giving an introduction to fleet, CoreOS and building large reliable systems at the Docker Randstad Meetup.


Thursday, April 16, 2015 at 6:00 p.m. PDT - San Francisco, CA

Brian Harrington will be giving an overview of CoreOS at CloudCamp. This is an unconference dedicated to all things containers.


Friday, April 17, 2015 - San Francisco, CA

Joined by a few of our very own, CoreOS CTO Brandon Philips (@BrandonPhilips) will be speaking at Container Camp. This event focuses on the latest developments in software virtualization. Get your tickets here.


Tuesday April 21 - Saturday, April 25, 2015 - Berlin, Germany

This year we’ll be attending Open Source Data Center Conference (OSDC) where Kelsey Hightower will be talking on building distributed systems with CoreOS.


Wednesday, April 22 at 6:30p.m. CET - Berlin, Germany

If you’re in Berlin, be sure to check out Kelsey Hightower talking about managing containers at scale with CoreOS and Kubernetes.


In case you missed it

In case you missed it, check out Chris Winslett from Compose.io talking about an etcd-based PostgreSQL HA Cluster.

CoreOS Fest

Don’t forget that CoreOS Fest is happening the following month on May 4 and 5! We’ve released a tentative schedule and our first round of speakers. Keep checking back for more updates as the event gets closer.

April 07, 2015

Sahana Participates for GCI 2014

The Sahana Software Foundation has actively taken part in the Google Code-In programme since its inception in 2010, and 2014's programme was no exception, as Sahana was once again among the 12 open source organizations selected to mentor students for Code-In. [Read the Rest...]

April 06, 2015

Announcing Tectonic: The Commercial Kubernetes Platform

CoreOS Tech Stack + Kubernetes

Our technology is often characterized as “Google’s infrastructure for everyone else.” Today we are excited to make this idea a reality by announcing Tectonic, a commercial Kubernetes platform. Tectonic provides the combined power of the CoreOS portfolio and the Kubernetes project to any cloud or on-premise environment.

Why we are building Tectonic

Our users want to securely run containers at scale in a distributed environment. We help companies do this by building open source tools which allow teams to create this type of infrastructure. With Tectonic, we now have an option for companies that want a preassembled and enterprise-ready distribution of these tools, allowing them to quickly see the benefits of modern container infrastructure.

What is Tectonic?

Tectonic is a platform combining Kubernetes and the CoreOS stack. Tectonic pre-packages all of the components required to build Google-style infrastructure and adds additional commercial features, such as a management console for workflows and dashboards, an integrated registry to build and share Linux containers, and additional tools to automate deployment and customize rolling updates.

Tectonic is available today to a select number of early customers. Head over to tectonic.com to sign up for the waitlist if your company is interested in participating.

What is Kubernetes?

Kubernetes is an open source project introduced by Google to help organizations run their infrastructure in a similar manner to the internal infrastructure that runs Google Search, Gmail, and other Google services. The concepts and workflows in Kubernetes are designed to help engineers focus on their application instead of infrastructure and build for high availability of services. With the Kubernetes APIs, users can manage application infrastructure - such as load balancing, service discovery, and rollout of new versions - in a way that is consistent and fault-tolerant.

Tectonic and CoreOS

Tectonic is a commercial product, and with this release, we have decided to launch our commercial products under a new brand, separate from the CoreOS name. We want our open source components - like etcd, rkt, flannel, and CoreOS Linux - to always be freely available for everyone under their respective open source licenses. We think open source development works best when it is community-supported infrastructure that we all share and build with few direct commercial motives. To that end, we want to keep CoreOS focused on building completely open source components.

To get access to an early release of Tectonic or to learn more, visit tectonic.com. To contribute and learn more about our open source projects visit coreos.com.

Google Ventures Funding

In addition to introducing Tectonic, today we are announcing an investment in CoreOS, Inc. led by Google Ventures. It is great to have the support and backing of Google Ventures as we bring the Kubernetes platform to market. The investment will help us accelerate our efforts to secure the backend of the Internet and deliver Google-like infrastructure to everyone else.

FAQ

Q: What does this change about CoreOS Linux and other open source projects like rkt, etcd, fleet, flannel, etc?

A: Nothing: development will continue, and we want to see all of the open source projects continue to thrive as independent components. CoreOS Linux will remain the same carefully maintained, open source, and container-focused OS it has always been. Tectonic uses many of these projects internally - including rkt, etcd, flannel, and fleet - and runs on top of the same CoreOS Linux operating system as any other application would.

Q: I am using Apache Mesos, Deis, or another application on top of CoreOS Linux: does anything change for me?

A: No, this announcement doesn't change anything about the CoreOS Linux project or software. Tectonic is simply another container-delivered application that runs on top of CoreOS Linux.

Q: What does this change for existing Enterprise Registry, Managed Linux, or Quay.io customers?

A: Everything will remain the same for existing customers. All of these components are utilized in the Tectonic stack and we continue to offer support, fix bugs and add features to these products.


Follow @TectonicStack on Twitter

Go to Tectonic.com to join an early release or to stay up to date on Tectonic news

Visit us in person at CoreOS Fest in San Francisco May 4-5, to learn more about CoreOS, Tectonic and all things distributed systems

April 01, 2015

Announcing rkt v0.5, featuring pods, overlayfs, and more

rkt is a new container runtime for applications, intended to meet the most demanding production requirements of security, efficiency and composability. rkt is also an implementation of the emerging Application Container (appc) specification, an open specification defining how applications can be run in containers. Today we are announcing the next major release of rkt, v0.5, with a number of new features that bring us closer to these goals, and want to give an update on the upcoming roadmap for the rkt project.

appc v0.5 - introducing pods

This release of rkt updates to the latest version of the appc spec, which introduces pods. Pods encapsulate a group of Application Container Images and describe their runtime environment, serving as a first-class unit for application container execution.

Pods are a concept recently popularised by Google's Kubernetes project. The idea emerged from the recognition of a powerful, pervasive pattern in deploying applications in containers, particularly at scale. The key insight is that, while one of the main value propositions of containers is for applications to run in isolated and self-contained environments, it is often useful to co-locate certain "helper" applications within a container. These applications have an intimate knowledge of each other - they are designed and developed to work co-operatively - and hence can share the container environment without conflict, yet still be isolated from interfering with other application containers on the same system.

A classic example of a pod is service discovery using the sidekick model, wherein the main application process serves traffic, and the sidekick process uses its knowledge of the pod environment to register the application in the discovery service. The pod links together the lifecycle of the two processes and ensures they can be jointly deployed and constrained in the cluster.

Another simple example is a database co-located with a backup worker. In this case, the backup worker could be isolated from interfering with the database's work - through memory, I/O and CPU limits applied to the process - but when the database process is shut down the backup process will terminate too. By making the backup worker an independent application container, and making pods the unit of deployment, we can reuse the worker for backing up data from a variety of applications: SQL databases, file stores or simple log files.

This is the power that pods provide: they encapsulate a self-contained, deployable unit that still provides granularity (for example, per-process isolators) and facilitates advanced use cases. Bringing pods to rkt enables it to natively model a huge variety of application use cases, and integrate tightly with cluster-level orchestration systems like Kubernetes.

For more information on pods, including the technical definition, check out the appc spec or the Kubernetes documentation.

overlayfs support

On modern Linux systems, rkt now uses overlayfs by default when running application containers. This provides immense benefits to performance and efficiency: start times for large containers will be much faster, and multiple pods using the same images will consume less disk space and can share page cache entries.

If overlayfs is not supported on the host operating system, rkt gracefully degrades back to the previous behaviour of extracting each image at runtime - this behaviour can also be triggered with the new --no-overlay flag to rkt run.
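For example (the image name is illustrative), the two paths can be compared directly:

# default behaviour: use overlayfs when the host kernel supports it
rkt run myapp.aci

# fall back to extracting the image at runtime
rkt run --no-overlay myapp.aci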

Another improvement behind the scenes is the introduction of a tree cache for rkt's local image storage. When storing ACIs in its local database (for example, after pulling them from a remote repository using rkt fetch), rkt will now store the expanded root filesystem of the image on disk. This means that when pods that reference this image are subsequently started (via rkt run), the pod filesystem can be created almost instantaneously in the case of overlayfs - or, without overlayfs, by using a simple copy instead of needing to expand the image again from its compressed format.

To facilitate simultaneous use of the tree store by multiple rkt invocations, file-based locking has been added to ensure images that are in use cannot be removed. Future versions of rkt will expose more advanced capabilities to manage images in the store.

stage1 from source

When executing application containers, rkt uses a modular approach (described in the architecture documentation) to support swappable, alternative execution environments. The default stage1 that we develop with rkt itself is based on systemd, but alternative implementations can leverage different technologies like KVM-based virtual machines to execute applications.

In earlier versions of rkt, the pre-bundled stage1 was assembled from a copy of the CoreOS Linux distribution image. We have been working hard to decouple this process to make it easier to package rkt for different operating systems and in different build environments. In rkt 0.5, the default stage1 is now constructed from source code, and over the next few releases we will make it easier to build alternative stage1 images by documenting and stabilizing the ABI.

"Rocket", "rocket", "rkt"?

This release also sees us standardizing on a single name for all areas of the project - the command-line tool, filesystem names and Unix groups, and the title of the project itself. Instead of "rocket", "Rocket", or "rock't", we now simply use "rkt".

rkt logo

Looking forward

rkt is a young project and the last few months have seen rapid changes to the codebase. As we look towards rkt 0.6 and beyond, we will be focusing on making it possible to depend on rkt to roll-forward from version to version without breaking working setups. There are several areas that are needed to make this happen, including reaching the initial stable version (1.0) of the appc spec, implementing functional testing, stabilizing the on-disk formats, and implementing schema upgrades for the store. We realize that stability is vital for people considering using rkt in production environments, and this will be a priority in the next few releases. The goal is to make it possible for a user that was happily using rkt 0.6 to upgrade to rkt 0.7 without having to remove their downloaded ACIs or configuration files.

We welcome your involvement in the development of rkt - via the rkt-dev discussion mailing list, GitHub issues, or contributing directly to the project.

March 27, 2015

CoreOS Fest 2015 First Round of Speakers Announced

As you might already know, we’re launching our first ever CoreOS Fest this May 4th and 5th in San Francisco! We’ve been hard at work making sure that this event is two days filled with all things distributed, and all things awesome.

In addition to many CoreOS project leads taking the stage, we are excited to announce a sneak peek at some of our community speakers. Join us at CoreOS Fest and you’ll hear from some of the most influential people in distributed systems today: Brendan Burns, one of the founders of Kubernetes; Diego Ongaro, the creator of Raft; Gabriel Monroy, the creator of Deis; Spencer Kimball, CEO of Cockroach Labs; Loris Degioanni, CEO of Sysdig; and many more!

We are still accepting submissions for speakers through March 31st, so we encourage you to submit your talk in our Call for Papers portal.

While the schedule will be live in the coming weeks, here's a high level overview:

We’ll kick off day one at 9 AM PDT (with registration and breakfast beforehand) with a single track of speakers, followed by lunch, then afternoon panels and breakouts. You’ll have lots of opportunities to connect and talk with fellow attendees, especially at an evening reception on the first day. Day two will include breakfast, single-track talks, lunch, panels and more.

Confirmed Speakers

See more about our first round of speakers:


Brendan Burns
Software Engineer at Google and a founder of the Kubernetes project

Brendan works in the Google Cloud Platform, leading engineering efforts to make the Google Cloud Platform the best place to run containers. He also has managed several other cloud teams including the Managed VMs team, and Cloud DNS. Prior to Cloud, he was a lead engineer in Google’s web search infrastructure, building backends that powered social and personal search. Prior to working at Google, he was a professor at Union College in Schenectady, NY. He received a PhD in Computer Science from the University of Massachusetts Amherst, and a BA in Computer Science and Studio Art from Williams College.


Diego Ongaro
Creator of Raft

Diego recently completed his doctorate with John Ousterhout at Stanford. During his doctorate, he worked on RAMCloud (a 5-10 microsecond RTT key-value store), Raft, and LogCabin (a coordination service built with Raft). He’s lately been continuing development on LogCabin as an independent contractor.


Gabriel Monroy
CTO of OpDemand and creator of Deis

Gabriel Monroy is CTO at OpDemand and the creator of Deis, the leading CoreOS-based PaaS. As an early contributor to Docker and CoreOS, Gabriel has deep experience putting containers into production and frequently advises organizations on PaaS, container automation and distributed systems. Gabriel spoke recently at QConSF on cluster scheduling and deploying containers at scale.


Spencer Kimball

Spencer is CEO of Cockroach Labs. After helping to re-architect and re-implement Square's items catalog service, Spencer was convinced that the industry needed a more capable database software. He began work on the design and implementation of Cockroach as an open source project and moved to work on it full time at Square mid-2014. Spencer managed the acquisition of Viewfinder by Square as CEO and before that, shared the roles of co-CTO and co-founder. Previously, he worked at Google on systems and web application infrastructure, most recently helping to build Colossus, Google’s exascale distributed file system, and on Java infrastructure, including the open-sourced Google Servlet Engine.


Loris Degioanni
CEO of Sysdig

Loris is the creator and CEO of Sysdig, a popular open source troubleshooting tool for Linux environments. He is a pioneer in the field of network analysis through his work on WinPcap and Wireshark: open source tools with millions of users worldwide. Loris was previously a senior director of technology at Riverbed, and co-founder/CTO at CACE Technologies, the company behind Wireshark. Loris holds a PhD in computer engineering from Politecnico di Torino, Italy.


Excited? Stay tuned for more announcements and join us at CoreOS Fest 2015.

Buy your early bird ticket by March 31st: https://coreos.com/fest/

Submit a speaking abstract by March 31st: CFP Portal

Become a sponsor, email us for more details.

March 20, 2015

What makes a cluster a cluster?

“What makes a cluster a cluster?” - Ask that question of 10 different engineers and you’ll get 10 different answers. Some look at it from a hardware perspective, some see it as a particular set of cloud technologies, and some say it’s the protocols exchanging information on the network.

With this ever-growing field of distributed systems technologies, it is helpful to compare the goals, roles and differences of some of these new projects based on their functionality. In this post we propose a conceptual description of the cluster at large, while showing some examples of emerging distributed systems technologies.

Layers of abstraction

The tech community has long agreed on what a network looks like. We’ve largely come to agree, in principle, on the OSI (Open Systems Interconnection) model (and in practice, on its close cousin, the TCP/IP model).

A key aspect of this model is the separation of concerns, with well-defined responsibilities and dependence between components: every layer depends on the layer below it and provides useful network functionality (connection, retry, packetization) to the layer above it. At the top, finally, sit web sessions and applications of all sorts, with the underlying communication abstracted away for them.

So, as an exercise to try to answer “What makes a cluster a cluster?” let’s apply the same sort of thinking to layers of abstraction in terms of execution of code on a group of machines, instead of communication between these machines.

Here’s a snapshot of the OSI model, applied to containers and clustering:

OSI Applied to Clustering

Let’s take a look from the bottom up.

Level 1, Hardware

The hardware layer is where it all begins. In a modern environment, this may mean physical (bare metal) or virtualized hardware – abstraction knows no bounds – but for our purposes, we define hardware as the CPU, RAM, disk and network equipment that is rented or bought in discrete units.

Examples: bare metal, virtual machines, cloud

Level 2, OS/Machine ABI

The OS layer is where we define how software executes on the hardware: the OS gives us the Application Binary Interface (ABI) by which we agree on a common language that our userland applications speak to the OS (system calls, device drivers, and so on). We also set up a network stack so that these machines can communicate amongst each other. This layer therefore provides our lowest level complete execution environment for applications.

Now, traditionally, we stop here, and run our final application on top of this as a third pseudo-layer of the OS and various user-space packages. We provision individual machines with slightly different software stacks (a database server, an app server) and there’s our server rack.

Over the lifetime of servers and software, however, the permutations and histories of individual machine configurations start to become unwieldy. As an industry, we are learning that managing this complexity becomes costly or infeasible over time, even at moderate scale (e.g. 3+ machines).

This is often where people start to talk about containers, as containers treat the entire OS userland as one hermetic application package that can be managed as an independent unit. Because of this abstraction, we can conceptually shift containers up the stack, as long as they’re above layer 2. We’ll revisit containers in layer 6.

Examples: kernel + {systemd, cgroups/namespaces, jails, zones}

Level 3, Cluster Consensus

To begin to mitigate the complexity of managing individual servers, we need to start thinking about machines in some greater, collective sense: this is our first notion of a cluster. We want to write software that scales across these individual servers and shares work effortlessly.

However, as we add more servers to the picture, we now introduce many more points of failure: networks partition, machines crash and disks fail. How can we build systems in the face of greater uncertainty? What we’d like is some way of creating a uniform set of data and data primitives, as needed by distributed systems. Much like in multiprocessor programming, we need the equivalent of locks, message passing, shared memory and atomicity across this group of machines.

This is an interesting and vibrant field of algorithmic research: a first stop for the curious reader should be the works of Leslie Lamport, particularly his earlier writing on ordering and reliability of distributed systems. His later work describes Paxos, the preeminent consensus protocol; the other major protocol, as provided by many projects in this category, is Raft.

Why is this called consensus? The machines need to ‘agree’ on the same history and order of events in order to make the guarantees we’d like for the primitives described. Locks cannot be taken twice, for example, even if some subset of messages disappears or arrives out of order, or member machines crash for unknown reasons.

These algorithms build data structures to form a coherent, consistent, and fault-tolerant whole.

Examples: etcd, ZooKeeper, consul
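As a rough sketch of what these primitives look like in practice, here is how one might fake a cluster-wide lock with etcd's v2 command line client. The key name and TTL are made up for the example, and the exact etcdctl flags may differ between releases:

$ # Atomically create a key to act as a crude lock; 'mk' fails if the key
$ # already exists, so only one machine in the cluster can hold it at a time.
$ etcdctl mk /locks/deploy worker-1 --ttl 60

$ # Any other attempt fails with a "key already exists" error until the TTL
$ # expires or the holder deletes the key, giving cluster-wide mutual exclusion.
$ etcdctl mk /locks/deploy worker-2 --ttl 60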

Level 4, Cluster Resources

With this perspective of a unified cluster, we can now talk about cluster resources. Having abstracted the primitives of individual machines, we use this higher level view to create and interact with the complete set of resources that we have at our disposal. Thus we can consider in aggregate the CPUs, RAM, disk and networking as available to any process in the cluster, as provided by the physical layers underneath.

Viewing the cluster as one large machine, all devices (CPU, RAM, disk, networking) become abstract. Containers already take advantage of this: they depend on these resources being abstracted on their behalf (network bridges, for example) so that they can use them higher in the stack while running on any of the underlying hardware.

In some sense, this layer is the equivalent of the hardware layer for this still-forming notion of the cluster. It may not be as celebrated as the layers above it, but this is where some important innovation takes place. Showing a cool auto-scaling webapp demo is nice, but it depends on unglamorous work like carving up the cluster IP space or deciding where a block device gets attached to a host.

Examples: flannel, remote block storage, weave
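To make this concrete, here is a hedged sketch of how flannel exposes a per-host slice of the cluster network for other components to consume. The file path and variable names follow flannel's documented convention at the time; treat them, and the docker invocation, as illustrative rather than authoritative:

$ # flannel carves a per-host subnet out of the cluster-wide network and
$ # records it in an environment file:
$ cat /run/flannel/subnet.env

$ # The Docker daemon can then be pointed at that subnet so containers on
$ # every host receive non-overlapping, routable IPs:
$ source /run/flannel/subnet.env
$ docker -d --bip=${FLANNEL_SUBNET} --mtu=${FLANNEL_MTU}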

Level 5, Cluster Orchestration and Scheduling

Cluster orchestration, then, starts to look a lot like an OS kernel atop these cluster-level resources and the tools given by consistency – symmetry with the layers below again. It’s the purview of the orchestration platform to divide and share cluster resources, schedule applications to run, manage permissions, set up interfaces into and out of the cluster, and at the end of the day, find an ABI-compatible environment for the userland. With increased scale comes new challenges: from finding the right machines to providing the best experience to users of the cluster.

Any software that will run on the cluster must ultimately execute on a physical CPU on a particular server. How the application code gets there and what abstractions it sees is controlled by the orchestration layer. This is similar to how WiFi simulates a copper wire to existing network stacks, with a controllable abstraction through access points, signal strength, meshes, encryption and more.

Examples: fleet, Mesos, Kubernetes
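As a small illustration of scheduling at this layer, here is roughly what handing a unit of work to fleet looks like. The service name and container image are invented for the example, and the exact unit options may vary between fleet versions:

$ # A minimal systemd unit that the cluster, rather than one machine, will run:
$ cat hello.service
[Unit]
Description=Hello World

[Service]
ExecStart=/usr/bin/docker run --rm busybox /bin/sh -c "while true; do echo hello; sleep 1; done"

$ # Hand the unit to the cluster; fleet picks a machine and schedules it there.
$ fleetctl start hello.service
$ fleetctl list-units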

Level 6, Containers

This brings us back to containers, in which, as described earlier, the entire userland is bundled together and treated as a single application unit.

If you’ve followed the whole stack up to this point, you’ll see why containers sit at level 6, instead of at level 2 or 3. It’s because the layers of abstraction below this point all depend on each other to build up to the point where a single-serving userland can safely abstract whether it’s running as one process on a local machine or as something scheduled on the cluster as a whole.

Containers are actually simple that way; they depend on everything else to provide the appropriate execution environment. They carry userland data and expect specific OS details to be presented to them.

Examples: Rocket, Docker, systemd-nspawn

Level 7, Application

Containers are currently getting a lot of attention in the industry because they separate the OS and software dependencies from the hardware. By abstracting these details, we can create consistent execution environments across a fleet of machines and let the traditional POSIX userland continue to work, fairly seamlessly, no matter where you take it. If the intention is to share containers, then choice is important, as is agreeing upon a sharable standard. Containers are exciting because they start us down the road of a lot of open source work in the realm of true distributed systems, backwards-compatible with the code we already write – our Application.

Closing Thoughts

For any of the layers of the cluster, there are (and will continue to be) multiple implementations. Some will combine layers, some will break them into sub-pieces – but this was true of networking in the past as well (do you remember IPX? Or AppleTalk?).

As we continue to work deeply on the internals of every layer, we also sometimes want to take a step back to look at the overall picture and consider the greater audience of people who are interested and starting to work on clusters of their own. We want to introduce this concept as a guideline, with a symmetric way of thinking about a cluster and its components. We’d love your thoughts on what defines a cluster as more than a mass of hardware.

March 13, 2015

Announcing rkt and App Container 0.4.1

Today we are announcing rkt v0.4.1. rkt is a new app container runtime and implementation of the App Container (appc) spec. This milestone release includes new features like private networking, an enhanced container lifecycle, and unprivileged image fetching, all of which get us closer to our goals of a production-ready container runtime that is composable, secure, and fast.

Private Networking

This release includes our first iteration of the rkt networking subsystem. As an example, let's run etcd in a private network:

# Run an etcd container in a private network
$ rkt run --private-net coreos.com/etcd:v2.0.4

By using the --private-net flag, the etcd container will run with its own network stack decoupled from the host. This includes a private lo loopback device and an eth0 device with an IP in the 172.16.28.0/24 address range. By default, rkt creates a veth pair, with one end becoming eth0 in the container and the other placed on the host. rkt will also set up an IP masquerade rule (NAT) to allow the container to speak to the outside world.

This can be demonstrated by being able to reach etcd on its version endpoint from the host:

$ curl 172.16.28.9:2379/version
{"releaseVersion":"2.0.4","internalVersion":"2"}

The networking configuration in rkt is designed to be highly pluggable to facilitate a variety of networking topologies and infrastructures. In this release, we have included plugins for veth, bridge, and macvlan, and more are under active development. See the rkt network docs for details.

If you are interested in building new network plugins, please take a look at the current specification and get involved by reaching out on GitHub or the mailing list. We would also like to extend a thank you to everyone who has spent time giving valuable feedback on the spec so far.

Unprivileged Fetches

It is good practice to download files over the Internet only as unprivileged users. With this release of rkt, it is possible to set up a rkt Unix group, and give users in that group the ability to download and verify container images. For example, let's give the core user permission to use rkt to retrieve images and verify their signature:

$ sudo groupadd rkt
$ sudo usermod -a -G rkt core
$ sudo rkt install
$ rkt fetch coreos.com/etcd:v2.0.5
rkt: searching for app image coreos.com/etcd:v2.0.5
rkt: fetching image from https://github.com/coreos/etcd/releases/download/v2.0.5/etcd-v2.0.5-linux-amd64.aci
Downloading ACI: [==========                                   ] 897 KB/3.76 MB
Downloading signature from https://github.com/coreos/etcd/releases/download/v2.0.5/etcd-v2.0.5-linux-amd64.aci.asc
rkt: signature verified:                                       ] 0 B/819 B
  CoreOS ACI Builder <release@coreos.com>
sha512-295a78d35f7ac5cc919e349837afca6d

The new rkt install subcommand is a simple helper to quickly set up all of the rkt directory permissions. These steps could easily be scripted outside of rkt for a more complex setup or a custom group name; for example, distributions that package rkt in their native formats would configure directory permissions at the time the package is installed.

Note that the image we’ve fetched will still need to be run with sudo, as Linux doesn't yet make it possible to do many of the operations necessary to start a container without root privileges. But at this stage, you can trust that the image comes from an author you have already trusted via rkt trust.
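For completeness, the trust step referenced above looks roughly like this; the prefix is taken from the fetch example, and the exact flag spelling may differ between rkt releases:

$ # Trust the signing key for images under a given prefix before fetching them:
$ sudo rkt trust --prefix coreos.com/etcd
$ rkt fetch coreos.com/etcd:v2.0.5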

Other Features

rkt prepare is a new command that can be used to set up a container without immediately running it. This gives users the ability to allocate a container ID and do filesystem setup before launching any processes. In this way, a container can be prepared ahead of time, so that when rkt run-prepared is subsequently invoked, the process startup happens immediately with few additional steps. Being able to pre-allocate a unique container ID also facilitates better integration with higher-level orchestration systems.
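A minimal sketch of how the two commands are meant to compose, assuming that rkt prepare prints the allocated container ID on standard output so it can be captured:

$ # Do the image and filesystem setup now, and remember the container ID:
$ uuid=$(sudo rkt prepare coreos.com/etcd:v2.0.4)

$ # Later, start the already-prepared container with minimal additional work:
$ sudo rkt run-prepared $uuid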

rkt run can now append additional command line flags and environment variables for all apps, as well as optionally have containers inherit the environment from the parent process. For full details see the command line documentation.

The image store now uses a ql database to track metadata – URLs, labels, and so on – about the images held in rkt's local store. Note that if you are upgrading from a previous rkt release, you may need to remove /var/lib/rkt. We understand people are already beginning to rely on rkt, and over the next few releases we will focus heavily on introducing stable APIs. But until we are closer to a 1.0 release, expect regular changes.

For more details about this 0.4.1 release and pre-compiled standalone rkt Linux binaries see the release page.

Updates to App Container spec

Finally, this change updates rkt to the latest version of the appc spec, v0.4.1. Recent changes to the spec include reworked isolators, new OS-specific requirements, and greater explicitness around image signing and encryption. You can refer to a list of some major changes and additions here.

Join us on the mission to create a secure, composable and standards based container runtime, and get involved in hacking on rkt or App Container here:

rkt:

Help Wanted, Mailing list

App Container:

Help Wanted, Mailing list

March 12, 2015

rkt Now Available in CoreOS Alpha Channel

Our CoreOS Alpha channel is designed to strike a balance between offering early access to new versions of software and serving as the release candidate for the Beta and Stable channels. Due to its release-candidate nature, we must be conservative in upgrading critical system components (e.g. systemd and etcd), but in order to get new technologies (like fleet and flannel) into the hands of users for testing we must occasionally include pre-production versions of these components in Alpha.

Today, we are adding rkt, a container runtime built on top of the App Container spec, to make it easier for users to try it and give us feedback.

rkt will join systemd-nspawn and Docker as container runtimes that are available to CoreOS users. Keep in mind that rkt is still pre-1.0 and that you should not rely on flags or the data in /var/lib/rkt to work between versions. Specifically, next week v0.4.1 will land in Alpha which is incompatible with images and containers created by previous versions of rkt. Besides the addition of /usr/bin/rkt to the image, nothing major has changed and no additional daemons will run by default.

Release Cadence

We have adopted a regular weekly schedule for Alpha releases, rolling out a new version every Thursday. Every other week we release a Beta, taking the best of the previous two Alpha versions and promoting it bit-for-bit. Similarly, once every four weeks we promote the best of the previous two Beta releases to Stable.

Give it a spin

If you want to spin up a CoreOS Alpha machine and get started, check out the documentation for v0.3.2. We look forward to having you involved in rkt development via the rkt-dev discussion mailing list, GitHub issues, or contributing directly to the project. We have made great progress so far, but there is still much to build!

Confessions of a Recovering Proprietary Programmer, Part XV

So the Linux kernel now has a Documentation/CodeOfConflict file. As one of the people who provided an Acked-by for this file, I thought I should set down what went through my mind while reading it. Taking it one piece at a time:



The Linux kernel development effort is a very personal process compared to “traditional” ways of developing software. Your code and ideas behind it will be carefully reviewed, often resulting in critique and criticism. The review will almost always require improvements to the code before it can be included in the kernel. Know that this happens because everyone involved wants to see the best possible solution for the overall success of Linux. This development process has been proven to create the most robust operating system kernel ever, and we do not want to do anything to cause the quality of submission and eventual result to ever decrease.



In a perfect world, this would go without saying, give or take the “most robust” chest-beating. But I am probably not the only person to have noticed that the world is not always perfect. Sadly, it is probably necessary to remind some people that “job one” for the Linux kernel community is the health and well-being of the Linux kernel itself, and not their own pet project, whatever that might be.



On the other hand, I was also heartened by what does not appear in the above paragraph. There is no assertion that the Linux kernel community's processes are perfect, which is all to the good, because delusions of perfection all too often prevent progress in mature projects. In fact, in this imperfect world, there is nothing so good that it cannot be made better. On the other hand, there also is nothing so bad that it cannot be made worse, so random wholesale changes should be tested somewhere before being applied globally to a project as important as the Linux kernel. I was therefore quite happy to read the last part of this paragraph: “we do not want to do anything to cause the quality of submission and eventual result to ever decrease.”



If however, anyone feels personally abused, threatened, or otherwise uncomfortable due to this process, that is not acceptable.



That sentence is of course critically important, but must be interpreted carefully. For example, it is all too possible that someone might feel abused, threatened, and uncomfortable by the mere fact of a patch being rejected, even if that rejection was both civil and absolutely necessary for the continued robust operation of the Linux kernel. Or someone might claim to feel that way, if they felt that doing so would get their patch accepted. (If this sounds impossible to you, be thankful, but also please understand that the range of human behavior is extremely wide.) In addition, I certainly feel uncomfortable when someone points out a stupid mistake in one of my patches, but that discomfort is my problem, and furthermore encourages me to improve, which is a good thing. For but one example, this discomfort is exactly what motivated me to write the rcutorture test suite. Therefore, although I hope that we all know what is intended by the words “abused”, “threatened”, and “uncomfortable” in that sentence, the fact is that it will never be possible to fully codify the difference between constructive and destructive behavior.



Therefore, the resolution process is quite important:



If so, please contact the Linux Foundation's Technical Advisory Board at <tab@lists.linux-foundation.org>, or the individual members, and they will work to resolve the issue to the best of their ability. For more information on who is on the Technical Advisory Board and what their role is, please see:



http://www.linuxfoundation.org/programs/advisory-councils/tab



There can be no perfect resolution process, but this one seems to be squarely in the “good enough” category. The timeframes are long enough that people will not be rewarded by complaining to the LF TAB instead of fixing their patches. The composition of the LF TAB, although not perfect, is diverse, consisting of both men and women from multiple countries. The LF TAB appears to be able to manage the inevitable differences of opinion, based on the fact that not all members provided their Acked-by for this Code of Conflict. And finally, the LF TAB is an elected body that has oversight via the LF, so there are feedback mechanisms. Again, this is not perfect, but it is good enough that I am willing to overlook my concerns about the first sentence in the paragraph.



On to the final paragraph:



As a reviewer of code, please strive to keep things civil and focused on the technical issues involved. We are all humans, and frustrations can be high on both sides of the process. Try to keep in mind the immortal words of Bill and Ted, “Be excellent to each other.”



And once again, in a perfect world it would not be necessary to say this. Sadly, we are human beings rather than angels, and so it does appear to be necessary. Then again, if we were all angels, this would be a very boring world.



Or at least that is what I keep telling myself!

March 11, 2015

The First CoreOS Fest

CoreOS Fest 2015

Get ready, CoreOS Fest, our celebration of everything distributed, is right around the corner! Our first CoreOS Fest is happening May 4 and 5, 2015 in San Francisco. You’ll learn more about application containers, container orchestration, clustering, devops security, new Linux, Go and more.

Join us for this two-day event as we talk about the newest in distributed systems technologies and together talk about securing the Internet. Be part of discussions shaping modern infrastructure stacks, hear from peers on how they are using these technologies today and get inspired to learn new ways to speed up your application development process.

Take a journey with us (in space and time) and help contribute to the next generation of infrastructure. The early bird tickets are available until March 31st and are only $199, so snatch one up now before they are gone. After March 31st, tickets will be available for $349. See you in May.

Submit an Abstract

Grab An Early Bird Ticket

If you are interested in sponsoring the event, reach out to fest@coreos.com and we would be happy to send you the prospectus.

March 10, 2015

Py3progress updated

Another year down!

I've updated the py3progress site with the whole of 2014, and what we have so far in 2015. As in previous years, I'll post a review of the last year later on.

March 09, 2015

Verification Challenge 4: Tiny RCU

The first and second verification challenges were directed to people working on verification tools, and the third challenge was directed at developers. Perhaps you are thinking that it is high time that I stop picking on others and instead direct a challenge at myself. If so, this is the challenge you were looking for!



The challenge is to take the v3.19 Linux kernel code implementing Tiny RCU, unmodified, and use some formal-verification tool to prove that its grace periods are correctly implemented.



This requires a tool that can handle multiple threads. Yes, Tiny RCU runs only on a single CPU, but the proof will require at least two threads. The basic idea is to have one thread update a variable, wait for a grace period, then update a second variable, while another thread accesses both variables within an RCU read-side critical section, and a third parent thread verifies that this critical section did not span a grace period, like this:



int x;
int y;
int r1;
int r2;

void rcu_reader(void)
{
  rcu_read_lock();
  r1 = x;
  r2 = y;
  rcu_read_unlock();
}

void *thread_update(void *arg)
{
  x = 1;
  synchronize_rcu();
  y = 1;
}

. . .

assert(r2 == 0 || r1 == 1);




Of course, rcu_reader()'s RCU read-side critical section is not allowed to span thread_update()'s grace period, which is provided by synchronize_rcu(). Therefore, rcu_reader() must execute entirely before the end of the grace period (in which case r2 must be zero, keeping in mind C's default initialization to zero), or it must execute entirely after the beginning of the grace period (in which case r1 must be one).



There are a few technical problems to solve:





  1. The Tiny RCU code #includes numerous “interesting” files. I supplied empty files as needed and used “-I .” to focus the C preprocessor's attention on the current directory.

  2. Tiny RCU uses a number of equally interesting Linux-kernel primitives. I stubbed most of these out in fake.h, but copied a number of definitions from the Linux kernel, including IS_ENABLED, barrier(), and bool.

  3. Tiny RCU runs on a single CPU, so the two threads shown above must act as if this was the case. I used pthread_mutex_lock() to provide the needed mutual exclusion, keeping in mind that Tiny RCU is available only with CONFIG_PREEMPT=n. The thread that holds the lock is running on the sole CPU.

  4. The synchronize_rcu() function can block. I modeled this by having it drop the lock and then re-acquire it.

  5. The dyntick-idle subsystem assumes that the boot CPU is born non-idle, but in this case the system starts out idle. After a surprisingly long period of confusion, I handled this by having main() invoke rcu_idle_enter() before spawning the two threads. The confusion eventually proved beneficial, but more on that later.





The first step is to get the code to build and run normally. You can omit this step if you want, but given that compilers usually generate better diagnostics than do the formal-verification tools, it is best to make full use of the compilers.
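A plausible native build, assuming the fake.c scaffolding and the -I . / -DRUN options used with cbmc below, would look something like this (the output name is arbitrary, and -lpthread covers the pthread_mutex_lock() calls mentioned above):

$ gcc -I . -DRUN -o tiny-rcu-test fake.c -lpthread
$ ./tiny-rcu-test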



I first tried goto-cc, goto-instrument, and satabs [Slide 44 of PDF] and impara [Slide 52 of PDF], but both tools objected strenuously to my code. My copies of these two tools are a bit dated, so it is possible that these problems have since been fixed. However, I decided to download version 5 of cbmc, which is said to have gained multithreading support.



After converting my code to a logic expression with no fewer than 109,811 variables and 457,344 clauses, cbmc -I . -DRUN fake.c took a bit more than ten seconds to announce VERIFICATION SUCCESSFUL. But should I trust it? After all, I might have a bug in my scaffolding or there might be a bug in cbmc.



The usual way to check for this is to inject a bug and see if cbmc catches it. I chose to break up the RCU read-side critical section as follows:



void rcu_reader(void)
{
  rcu_read_lock();
  r1 = x;
  rcu_read_unlock();
  cond_resched();
  rcu_read_lock();
  r2 = y;
  rcu_read_unlock();
}




Why not remove thread_update()'s call to synchronize_rcu()? Take a look at Tiny RCU's implementation of synchronize_rcu() to see why not!



With this change enabled via #ifdef statements, “cbmc -I . -DRUN -DFORCE_FAILURE fake.c” took almost 20 seconds to find a counter-example in a logic expression with 185,627 variables and 815,691 clauses. Needless to say, I am glad that I didn't have to manipulate this logic expression by hand!



Because cbmc catches an injected bug and verifies the original code, we have some reason to hope that the VERIFICATION SUCCESSFUL was in fact legitimate. As far as I know, this is the first mechanical proof of the grace-period property of a Linux-kernel RCU implementation, though admittedly of a rather trivial implementation. On the other hand, a mechanical proof of some properties of the dyntick-idle counters came along for the ride, courtesy of the WARN_ON_ONCE() statements in the Linux-kernel source code. (Previously, researchers at Oxford mechanically validated the relationship between rcu_dereference() and rcu_assign_pointer(), taking the whole of Tree RCU as input, and researchers at MPI-SWS formally validated userspace RCU's grace-period guarantee—manually.)



As noted earlier, I had confused myself into thinking that cbmc did not handle pthread_mutex_lock(). I verified that cbmc handles the gcc atomic builtins, but it turns out to be impractical to build a lock for cbmc's use from atomics. The problem stems from the “b” for “bounded” in “cbmc”, which means cbmc cannot analyze the unbounded spin loops used in locking primitives.



However, cbmc does do the equivalent of a full state-space search, which means it will automatically model all possible combinations of lock-acquisition delays even in the absence of a spin loop. This suggests something like the following:



if (__sync_fetch_and_add(&cpu_lock, 1))
  exit();




The idea is to exclude from consideration any executions where the lock cannot be immediately acquired, again relying on the fact that cbmc automatically models all possible combinations of delays that the spin loop might have otherwise produced, but without the need for an actual spin loop. This actually works, but my mis-modeling of dynticks fooled me into thinking that it did not. I therefore made lock-acquisition failure set a global variable and added this global variable to all assertions. When this failed, I had sufficient motivation to think, which caused me to find my dynticks mistake. Fixing this mistake fixed all three versions (locking, exit(), and flag).



The exit() and flag approaches result in exactly the same number of variables and clauses, which turns out to be quite a bit fewer than the locking approach:



                               exit()/flag                                   locking
Verification                   69,050 variables, 287,548 clauses (output)    109,811 variables, 457,344 clauses (output)
Verification Forced Failure    113,947 variables, 501,366 clauses (output)   185,627 variables, 815,691 clauses (output)




So locking increases the size of the logic expressions by quite a bit, but interestingly enough does not have much effect on verification time. Nevertheless, these three approaches show a few of the tricks that can be used to accomplish real work using formal verification.



The GPL-licensed source for the Tiny RCU validation may be found here. C-preprocessor macros select the various options, with -DRUN being necessary for both real runs and cbmc verification (as opposed to goto-cc or impara verification), -DCBMC forcing the atomic-and-flag substitute for locking, and -DFORCE_FAILURE forcing the failure case. For example, to run the failure case using the atomic-and-flag approach, use:



cbmc -I . -DRUN -DCBMC -DFORCE_FAILURE fake.c




Possible next steps include verifying dynticks and interrupts, dynticks and NMIs, and of course use of call_rcu() in place of synchronize_rcu(). If you try these out, please let me know how it goes!

CoreOS on VMware vSphere and VMware vCloud Air

At CoreOS, we want to make the world successful with containers on all computing platforms. Today, we are taking one step closer to that goal by announcing, with VMware, that CoreOS is fully supported and integrated with both VMware vSphere 5.5 and VMware vCloud Air. Enterprises that have been evaluating containers but needed a fully supported environment to begin can now get started.

We’ve worked closely with VMware in enabling CoreOS to run on vSphere 5.5 (see the technical preview of CoreOS on vSphere 5.5). This collaboration extends the security, consistency, and reliability advantages of CoreOS to users of vSphere. Developers can focus on their applications while operations teams get the control they need. We encourage you to read more from VMware here:

CoreOS Now Supported on VMware vSphere 5.5 and VMware vCloud Air.

As a sysadmin you’ve gotta be thinking, what does this mean for me?

Many people have been running CoreOS on VMware for a while now, but something was missing: performance and full integration with the VMware management APIs. Today that all changes. CoreOS is now shipping open-vm-tools, the open source implementation of VMware Tools, which enables better performance and makes it possible to manage CoreOS VMs running in all VMware environments.

Let's take a quick moment to explore some of the things that are now possible.

Taking CoreOS for a spin with VMware Fusion

The following tutorial will walk you through downloading an official CoreOS VMware image and configuring it using a cloud config drive. Once configured, a CoreOS instance will be launched and managed using the vmrun command line tool that ships with VMware Fusion.

To make the following commands easier to run set the following vmrun alias in your shell:

alias vmrun='/Applications/VMware\ Fusion.app/Contents/Library/vmrun'

Download a CoreOS VMware Image

First things first, download a CoreOS VMware image and save it to your local machine:

$ mkdir coreos-vmware
$ cd coreos-vmware
$ wget http://alpha.release.core-os.net/amd64-usr/current/coreos_production_vmware.vmx
$ wget http://alpha.release.core-os.net/amd64-usr/current/coreos_production_vmware_image.vmdk.bz2

Decompress the VMware disk image:

$ bzip2 -d coreos_production_vmware_image.vmdk.bz2

Configuring a CoreOS VM with a config-drive

By default CoreOS VMware images do not have any users configured, which means you won’t be able to log in to your VM after it boots. Also, many of the vmrun guest OS commands require a valid CoreOS username and password.

A config-drive is the best way to configure a CoreOS instance running on VMware. Before you can create a config-drive, you’ll need some user data. For this tutorial you will use a CoreOS cloud-config file as user data to configure users and set the hostname.

Generate the password hash for the core and root users

Before creating the cloud-config file, generate a password hash for the core and root users:

$ openssl passwd -1
Password:
Verifying - Password:
$1$LEfVXsiG$lhcyOrkJq02jWnEhF93IR/

Enter vmware at both password prompts.

Create a cloud config file

Now we are ready to create a cloud-config file:

edit cloud-config.yaml

#cloud-config

hostname: vmware-guest
users:
  - name: core
    passwd: $1$LEfVXsiG$lhcyOrkJq02jWnEhF93IR/
    groups:
      - sudo
      - docker
  - name: root
    passwd: $1$LEfVXsiG$lhcyOrkJq02jWnEhF93IR/

Create a config-drive

With your cloud-config file in place you can use it to create a config drive. The easiest way to create a config-drive is to generate an ISO using a cloud-config file and attach it to a VM.

$ mkdir -p /tmp/new-drive/openstack/latest
$ cp cloud-config.yaml /tmp/new-drive/openstack/latest/user_data
$ hdiutil makehybrid -iso -joliet -joliet-volume-name "config-2" -o ~/cloudconfig.iso /tmp/new-drive
$ rm -r /tmp/new-drive

At this point you should have a config-drive named cloudconfig.iso in your home directory.
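If you are building the config-drive on a Linux box instead of OS X, a roughly equivalent ISO can be produced with mkisofs (or genisoimage); the important detail is the config-2 volume label, which is what config-drive readers look for:

$ mkisofs -R -V config-2 -o ~/cloudconfig.iso /tmp/new-drive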

Attaching a config-drive to a VM

Before booting the CoreOS VM the config-drive must be attached to the VM. Do this by appending the following lines to the coreos_production_vmware.vmx config file:

ide0:0.present = "TRUE"
ide0:0.autodetect = "TRUE"
ide0:0.deviceType = "cdrom-image"
ide0:0.fileName = "/Users/kelseyhightower/cloudconfig.iso"

At this point you are ready to launch the CoreOS VM:

vmrun start coreos_production_vmware.vmx

CoreOS on VMware

Running commands

With the CoreOS VM up and running, use the vmrun command line tool to interact with it. Let's start by checking the status of vmware-tools in the VM:

$ vmrun checkToolsState coreos_production_vmware.vmx

Grab the VM’s IP address with the getGuestIPAddress command:

$ vmrun getGuestIPAddress coreos_production_vmware.vmx

Full VMware integration also means you can now run guest OS commands. For example you can list the running processes using the listProcessesInGuest command:

$ vmrun -gu core -gp vmware listProcessesInGuest coreos_production_vmware.vmx
Process list: 63
pid=1, owner=root, cmd=/usr/lib/systemd/systemd --switched-root --system --deserialize 21
pid=2, owner=root, cmd=kthreadd
pid=3, owner=root, cmd=ksoftirqd/0
pid=4, owner=root, cmd=kworker/0:0
pid=5, owner=root, cmd=kworker/0:0H
pid=6, owner=root, cmd=kworker/u2:0
...

Finally you can now run arbitrary commands and scripts using VMware management tools. For example, use the runProgramInGuest command to initiate a graceful shutdown:

$ vmrun -gu root -gp vmware runProgramInGuest coreos_production_vmware.vmx /usr/sbin/shutdown now

CoreOS on VMware

We have only scratched the surface of what you can do with the new VMware-powered CoreOS images. Check out the “Using vmrun to Control Virtual Machines” e-book for more details.

CoreOS and VMware going forward

We look forward to continuing on the journey to secure the backend of the Internet by working on all types of platforms in the cloud or behind the firewall. We are continuing to work with VMware so that CoreOS is also supported on the recently announced vSphere 6. If you have any questions in the meantime, you can find us on IRC as you get started. Feedback can also be provided at the VMware / CoreOS community forum.

March Update

It’s been a busy start to the year with lots going on in the Sahana project. There have been some great voluntary contributions over the past months. Tom Baker has been making some great progress continuing his work developing a Sahana [Read the Rest...]

March 08, 2015

Technocracy: a short look at the impact of technology on modern political and power structures

Below is an essay I wrote for some study that I thought might be fun to share. If you like this, please see the other blog posts tagged as Gov 2.0. Please note, this is a personal essay and not representative of anyone else :)

In recent centuries we have seen a dramatic change in the world brought about by the rise of and proliferation of modern democracies. This shift in governance structures gives the common individual a specific role in the power structure, and differs sharply from more traditional top down power structures. This change has instilled in many of the world’s population some common assumptions about the roles, responsibilities and rights of citizens and their governing bodies. Though there will always exist a natural tension between those in power and those governed, modern governments are generally expected to be a benevolent and accountable mechanism that balances this tension for the good of the society as a whole.

In recent decades the Internet has rapidly further evolved the expectations and individual capacity of people around the globe through, for the first time in history, the mass distribution of the traditional bastions of power. With a third of the world online and countries starting to enshrine access to the Internet as a human right, individuals have more power than ever before to influence and shape their lives and the lives of people around them. It is easier than ever for people to congregate, albeit virtually, according to common interests and goals, regardless of their location, beliefs, language, culture or other age-old barriers to collaboration. This is having a direct and dramatic impact on governments and traditional power structures everywhere, and is both extending and challenging the principles and foundations of democracy.

This short paper outlines how the Internet has empowered individuals in an unprecedented and prolific way, and how this has changed and continues to change the balance of power in societies around the world, including how governments and democracies work.

Democracy and equality

The concept of an individual having any implicit rights or equality isn’t new, let alone the idea that an individual in a society should have some say over the ruling of the society. Indeed the idea of democracy itself has been around since the ancient Greeks in 500 BCE. The basis for modern democracies lies with the Parliament of England in the 11th century at a time when the laws of the Crown largely relied upon the support of the clergy and nobility, and the Great Council was formed for consultation and to gain consent from power brokers. In subsequent centuries, great concerns about leadership and taxes effectively led to a strongly increased role in administrative power and oversight by the parliament rather than the Crown.

The practical basis for modern government structures with elected officials had emerged by the 17th century. This idea was already established in England, but also took root in the United States. This was closely followed by multiple suffrage movements in the 19th and 20th centuries which expanded the right to participate in modern democracies from (typically) adult white property owners to almost all adults in those societies.

It is quite astounding to consider the dramatic change from very hierarchical, largely unaccountable and highly centralised power systems to democratic ones in which those in powers are expected to be held to account. This shift from top down power, to distributed, representative and accountable power is an important step to understand modern expectations.

Democracy itself is sustainable only when the key principle of equality is deeply ingrained in the population at large. This principle has been largely infused into Western culture and democracies, independent of religion, including in largely secular and multicultural democracies such as Australia. This is important because an assumption of equality underpins stability in a system that puts into the hands of its citizens the ability to make a decision. If one component of the society feels another doesn’t have an equal right to a vote, then outcomes other than their own are not accepted as legitimate. This has been an ongoing challenge in some parts of the world more than others.

In many ways there is a huge gap between the fearful sentiments of Thomas Hobbes, who preferred a complete and powerful authority to keep the supposed ‘brutish nature’ of mankind at bay, and the aspirations of John Locke who felt that even governments should be held to account and the role of the government was to secure the natural rights of the individual to life, liberty and property. Yet both of these men and indeed, many political theorists over many years, have started from a premise that all men are equal – either equally capable of taking from and harming others, or equal with regards to their individual rights.

Arguably, the Western notion of individual rights is rooted in religion. The Christian idea that all men are created equal under a deity presents an interesting contrast to traditional power structures that assume one person, family or group has more rights than the rest, although ironically various churches have not treated all people equally either. Christianity has deeply influenced many political thinkers and the forming of modern democracies, many of which look very similar to the mixed regime system described by Saint Thomas Aquinas in his Summa Theologiae essays:

Some, indeed, say that the best constitution is a combination of all existing forms, and they praise the Lacedemonian because it is made up of oligarchy, monarchy, and democracy, the king forming the monarchy, and the council of elders the oligarchy, while the democratic element is represented by the Ephors: for the Ephors are selected from the people.

The assumption of equality has been enshrined in key influential documents including the United States Declaration of Independence, 1776:

We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness.

More recently in the 20th Century, the Universal Declaration of Human Rights goes even further to define and enshrine equality and rights, marking them as important for the entire society:

Whereas recognition of the inherent dignity and of the equal and inalienable rights of all members of the human family is the foundation of freedom, justice and peace in the world… – 1st sentence of the Preamble to the Universal Declaration of Human Rights

All human beings are born free and equal in dignity and rights. – Article 1 of the United Nations Universal Declaration of Human Rights (UDHR)

The evolution of the concepts of equality and “rights” is important to understand as they provide the basis for how the Internet is having such a disruptive impact on traditional power structures, whilst also being a natural extension of an evolution in human thinking that has been hundreds of years in the making.

Great expectations

Although only a third of the world is online, in many countries this means the vast bulk of the population. In Australia over 88% of households are online as of 2012. Constant online access starts to drive a series of new expectations and behaviours in a community, especially one where equality has already been so deeply ingrained as a basic principle.

Over time a series of Internet-based instincts and perspectives have become mainstream, arguably driven by the very nature of the technology and the tools that we use online. For example, the Internet was developed to “route around damage”, which means the technology can withstand technical interruption by other hardware or software means. Where damage is interpreted in a social sense, such as censorship or locking away access to knowledge, individuals instinctively seek and develop a workaround, and you see something quite profound. A society has emerged that doesn’t blindly accept the limitations put upon it. This is quite a challenge for traditional power structures.

The Internet has become both an extension and an enabler of equality and power by massively distributing both to ordinary people around the world. How has power and equality been distributed? When you consider what constitutes power, four elements come to mind: publishing, communications, monitoring and enforcement.

Publishing – in times gone past the ideas that spread beyond a small geographical area either traveled word of mouth via trade routes, or made it into a book. Only the wealthy could afford to print and distribute the written word, so publishing and dissemination of information was a power limited to a small number of people. Today the spreading of ideas is extremely easy, cheap and can be done anonymously. Anyone can start a blog, use social media, and the proliferation of information creation and dissemination is unprecedented. How does this change society? Firstly there is an assumption that an individual can tell their story to a global audience, which means an official story is easily challenged not only by the intended audience, but by the people about whom the story is written. Individuals online expect both to have their say, and to find multiple perspectives that they can weigh up, and determine for themselves what is most credible. This presents significant challenges to traditional powers such as governments in establishing an authoritative voice unless they can establish trust with the citizens they serve.

Communications – individuals have always had some method to communicate with individuals in other communities and countries, but up until recent decades these methods have been quite expensive, slow and oftentimes controlled. This has meant that historically, people have tended to form social and professional relationships with those close by, largely out of convenience. The Internet has made it easy to communicate, collaborate and coordinate with individuals and groups all around the world, in real time. This has made massive and global civil responses and movements possible, which has challenged traditional and geographically defined powers substantially. It has also presented a significant challenge for governments trying to predict and control information flows and relationships within their society, and for how to support the best interests of citizens, given that what is good for a geographically defined nation state doesn’t always align with what is good for an online and trans-nationally focused citizenry.

Monitoring – traditional power structures have always had ways to monitor the masses. Monitoring helps maintain the rule of law by assisting in the enforcement of laws, and is often upheld through self-reporting, as those affected by broken laws will report issues to hold offenders to account. In just the last 50 years, modern technologies like CCTV have made monitoring of the people a trivial task, where video cameras can record what is happening 24 hours a day. Foucault spoke of the panopticon gaol design as a metaphor for a modern surveillance state, where everyone is constantly watched on camera. The panopticon was a gaol design wherein detainees could not tell if they were being observed by gaolers or not, enabling, in principle, fewer gaolers to control a large number of prisoners. In the same way prisoners would theoretically behave better under observation, Foucault was concerned that omnipresent surveillance would lead to all individuals being more conservative and limited in themselves if they knew they could be watched at any time. The Internet has turned this model on its head. Although governments can more easily monitor citizens than ever before, individuals can also monitor each other and indeed, monitor governments for misbehaviour. This has led to individuals, governments, companies and other entities all being held to account publicly, sometimes violently or unfairly so.

Enforcement – enforcement of laws is a key role of a power structure, ensuring the rules of a society are maintained for the benefit of stability and prosperity. Enforcement can take many forms, including physical (gaol, punishment) or psychological (pressure, public humiliation). Power structures have many ways of enforcing the rules of a society on individuals, but the Internet gives individuals substantial enforcement tools of their own. Power used to belong to whoever had the biggest sword, or gun, or police force. Now that major powers and indeed, economies, rely so heavily upon the Internet, there is power in the ability to disrupt communications. In taking down a government or corporate website or online service, an individual or small group of individuals can have an impact far greater than in the past on the power structures of their society, and can do so anonymously. This becomes quite profound as citizen groups can emerge with their own philosophical premise and the tools to monitor and enforce their perspective.

Property – property has always been a strong basis of law and order and still plays an important part in democracy, though perspectives towards property are arguably starting to shift. Copyright was invented to protect the “intellectual property” of a person against copying at a time when copying was quite a physical business, and when the mode of distributing information was very expensive. Now, digital information is so easy to copy that it has created a change in expectations and a struggle for traditional models of intellectual property. New models of copyright have emerged that explicitly support copying (copyleft) and some have been successful, such as with the Open Source software industry or with remix music culture. 3D printing will change the game again as we will see in the near future the massive distribution of the ability to copy physical goods, not just virtual ones. This is already creating havoc with those who seek to protect traditional approaches to property but it also presents an extraordinary opportunity for mankind to have greater distribution of physical wealth, not just virtual wealth. Particularly if you consider the current use of 3D printing to create transplant organs, or the potential of 3D printing combined with some form of nano technology that could reassemble matter into food or other essential living items. That is starting to step into science fiction, but we should consider the broader potential of these new technologies before we decide to arbitrarily limit them based on traditional views of copyright, as we are already starting to see.

By massively distributing publishing, communications, monitoring and enforcement, and with the potential massive distribution of property on the horizon, technology and the Internet have created an ad hoc, self-determined and grassroots power base that challenges traditional power structures and governments.

With great power…

Individuals online find themselves more empowered and self-determined than ever before, regardless of the socio-political nature of their circumstances. They can share and seek information directly from other individuals, bypassing traditional gatekeepers of knowledge. They can coordinate with like-minded citizens both nationally and internationally and establish communities of interest that transcend geo-politics. They can monitor elected officials, bureaucrats, companies and other individuals, and even hold them all to account.

To leverage these opportunities fully requires a reasonable amount of technical literacy. As such, many technologists are on the front line, playing a special role in supporting, challenging and sometimes overthrowing modern power structures. As technical literacy permeates mainstream culture, more individuals are able to leverage these disrupters, but technologist activists are often the most effective at disrupting power through the use of technology and the Internet.

Of course, whilst the Internet is a threat to traditional centralised power structures, it also presents an unprecedented opportunity to leverage the skills, knowledge and efforts of an entire society in the running of government, for the benefit of all. Citizen engagement in democracy and government beyond the ballot box presents the ability to co-develop, or co-design the future of the society, including the services and rules that support stability and prosperity. Arguably, citizen buy-in and support is now an important part of the stability of a society and success of a policy.

Disrupting the status quo

The combination of improved capacity for self-determination by individuals along with the increasingly pervasive assumptions of equality and rights have led to many examples of traditional power structures being held to account, challenged, and in some cases, overthrown.

Governments are able to be held more strongly to account than ever before. The Open Australia Foundation is a small group of technologists in Australia who create tools to improve transparency and citizen engagement in the Australian democracy. They created Open Australia, a site that made the public parliamentary record more accessible to individuals through making it searchable, subscribable and easy to browse and comment on. They also have projects such as Planning Alerts which notifies citizens of planned development in their area, Election Leaflets where citizens upload political pamphlets for public record and accountability, and Right to Know, a site to assist the general public in pursuing information and public records from the government under Freedom of Information. These are all projects that monitor, engage and inform citizens about government.

Wikileaks is a website and organisation that provides a way for individuals to anonymously leak sensitive information, often classified government information. Key examples include video and documents from the Iraq and Afghanistan wars, material about the Guantanamo Bay detention camp, United States diplomatic cables and millions of emails from Syrian political and corporate figures. Some of the information revealed by Wikileaks has had quite dramatic consequences, with the media and citizens around the world responding to the information. Arguably, many of the Arab Spring uprisings throughout the Middle East from December 2010 were provoked by the release of the US diplomatic cables by Wikileaks, as it demonstrated very clearly the level of corruption in many countries. The Internet also played a vital part in many of these uprisings, some of which saw governments deposed, as social media tools such as Twitter and Facebook provided the mechanism for massive coordination of protests, but importantly also provided a way to get citizen coverage of the protests and police/army brutality, creating a global audience, commentary and pressure on the governments, and support for the protesters.

Citizen journalism is an interesting challenge to governments because the route to communicate with the general public has traditionally been through the media. The media has presented for many years a reasonably predictable mechanism for governments to communicate an official statement and shape public narrative. But the Internet has made it possible for any individual to publish online to a global audience, and this has resulted in a much more robust exchange of ideas and a less clear-cut public narrative about any particular issue, sometimes directly challenging official statements. A particularly interesting case of this was the Salam Pax blog during the 2003 Iraq invasion by the United States. Official news from the US would largely talk about the success of the campaign to overthrow Saddam Hussein. The Salam Pax blog provided the view of a 29-year-old educated Iraqi architect living in Baghdad and experiencing the invasion as a citizen, which contrasted quite significantly at times with official US Government reports. This type of contrast will continue to be a challenge to governments.

On the flip side, the Internet has also provided new ways for governments themselves to support and engage citizens. There has been the growth of a global open government movement, where governments themselves try to improve transparency, public engagement and services delivery using the Internet. Open data is a good example of this, with governments going above and beyond traditional freedom of information obligations to proactively release raw data online for public scrutiny. Digital services allow citizens to interact with their government online rather than having to physically attend a shopfront. Many governments around the world are making public commitments to improving the transparency, engagement and services for their citizens. We now also see more politicians and bureaucrats engaging directly with citizens online through the use of social media, blogs and sophisticated public consultations tools. Governments have become, in short, more engaged, more responsive and more accountable to more people than ever before.

Conclusion

Only in recent centuries have power structures emerged with a specific role for common individual citizens. The relationship between individuals and power structures has long been about the balance between what the power could enforce and what the population would accept. With the emergence of power structures that support and enshrine the principles of equality and human rights, individuals around the world have come to expect the capacity to determine their own future. The growth and proliferation of democracy has been a key shift in how individuals relate to power and governance structures.

New technologies and the Internet have gone on to massively distribute the traditionally centralised powers of publishing, communications, monitoring and enforcement (with property on the way). This distribution of power through the means of technology has seen democracy evolve into something of a technocracy, a system which has effectively tipped the balance of power from institutions to individuals.


March 05, 2015

Managing CoreOS Logs with Logentries

Today Logentries announced a CoreOS integration, so CoreOS users can get a deeper understanding of their CoreOS environments. The new integration enables CoreOS users to easily send logs using the Journal logging system, part of CoreOS’ Systemd process manager, directly into Logentries for real-time monitoring, alerting, and data visualization. This is the first CoreOS log management integration.

To learn more about centralizing logs from CoreOS clusters, read the post by Trevor Parsons, co-founder and chief scientist at Logentries. Or get started by following the documentation here.

March 03, 2015

Upcoming CoreOS Events in March

March brings a variety of events – including a keynote from Alex Polvi (@polvi), CEO of CoreOS, at Rackspace Solve. Read on for more details on the team’s whereabouts this month.

In case you missed it, Alex keynoted at The Linux Foundation Collab Summit last month. See the replay.


Tuesday, March 3, 2015 at 6 p.m. EST – Montreal, QC

The month kicks off with Jake Moshenko (@JacobMoshenko), product manager for the Quay.io container registry at CoreOS, at Big Data Montreal. Jake will discuss Rocket and how CoreOS and Quay.io fit into the development lifecycle.


Wednesday, March 4, 2015 at 1:30 p.m. PST – San Francisco, CA

Join us at Rackspace Solve to see Alex Polvi (@polvi), CEO of CoreOS, speak about Container Technology: Applications and Implications. Registration is free and there will be talks from Wikimedia, Walmart Labs, DigitalFilmTree, Tinder and more.


Tuesday, March 10, 2015 at 7 p.m. GMT – London, England

If you find yourself in London, be sure to stop by the CoreOS London meetup. They are currently confirming speakers for the event. If you are interested in speaking, be sure to submit on GitHub.


Monday, March 16, 2015 at 7 p.m. PDT – San Francisco, CA

Join the CoreOS San Francisco March Meetup at Imgur (@imgur). On the agenda: Rocket, appc spec and etcd. Chris Winslett from Compose.io (@composeio) will also explain how Compose.io uses etcd as its “repository of truth.”


Friday, March 27, 2015 at 6:45 p.m. CDT – Pflugerville, TX

Brian “Redbeard” Harrington (@brianredbeard) is an opening speaker at Container Days Austin. Container Days provides a forum for all interested in the technical, process and production ramifications of adopting container-style virtualization. Get your tickets here.


AirPair Writing Competition

On another note, this month CoreOS has joined the AirPair $100K writing competition. For more details, please see the contest site: https://www.airpair.com/100k-writing-competition.

If you have a CoreOS, etcd or Rocket implementation, tutorial or use case, now is a great time to share. How do you apply CoreOS for automatic server updates? Do you have any stories about using etcd to keep an application up when a server goes down? Any exciting ways you have applied Rocket, the first container runtime based on the Application Container specification?

If you are interested in writing about your experiences with CoreOS, etcd or Rocket, email press@coreos.com and we will give you support to make your post a success. More details about the competition are here.

March 02, 2015

Verification Challenge 3: cbmc

The first and second verification challenges were directed to people working on verification tools, but this one is instead directed at developers.



It turns out that there are a number of verification tools that have seen heavy use. For example, I have written several times about Promela and spin (here, here, and here), which I have used from time to time over the past 20 years. However, this tool requires that you translate your code to Promela, which is not conducive to use of Promela for regression tests.



For those of us working in the Linux kernel, it would be nice to have a verification tool that operated directly on C source code. And there are tools that do just that, for example, the C Bounded Model Checker (cbmc). This tool, which is included in a number of Linux distributions, converts a C-language input file into a (possibly quite large) logic expression. This expression is constructed so that if any combination of variables causes the logic expression to evaluate to true, then (and only then) one of the assertions can be triggered. This logic expression is then passed to a SAT solver, and if this SAT solver finds a solution, then there is a set of inputs that can trigger the assertion. The cbmc tool is also capable of checking for array-bounds errors and some classes of pointer misuse.
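
As a toy illustration of this workflow (a made-up file, call it absval.c, not taken from this article), an uninitialized local variable serves as a nondeterministic input, and cbmc checks whether any value of that input can trigger the assertion:

#include <assert.h>

int absval(int x)
{
  return x < 0 ? -x : x;
}

int main(void)
{
  int x;  /* uninitialized, so cbmc treats it as a nondeterministic input */

  /* Under two's-complement bit-vector semantics, -INT_MIN wraps back to
   * INT_MIN, so x == INT_MIN can violate this assertion, and cbmc should
   * report VERIFICATION FAILED along with a counterexample. */
  assert(absval(x) >= 0);
  return 0;
}

Constraining the input to exclude INT_MIN should flip the verdict to VERIFICATION SUCCESSFUL.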



Current versions of cbmc can handle some useful tasks. For example, suppose it was necessary to reverse the sense of the if condition in the following code fragment from Linux-kernel RCU:



if (rnp->exp_tasks != NULL ||
    (rnp->gp_tasks != NULL &&
     rnp->boost_tasks == NULL &&
     rnp->qsmask == 0 &&
     ULONG_CMP_GE(jiffies, rnp->boost_time))) {
  if (rnp->exp_tasks == NULL)
    rnp->boost_tasks = rnp->gp_tasks;
  /* raw_spin_unlock_irqrestore(&rnp->lock, flags); */
  t = rnp->boost_kthread_task;
  if (t)
    rcu_wake_cond(t, rnp->boost_kthread_status);
} else {
  rcu_initiate_boost_trace(rnp);
  /* raw_spin_unlock_irqrestore(&rnp->lock, flags); */
}




This is a simple application of De Morgan's law, but an error-prone one, particularly if carried out in a distracting environment. Of course, to test a validation tool, it is best to feed it buggy code to see if it detects those known bugs. And applying De Morgan's law in a distracting environment is an excellent way to create bugs, as you can see below:



if (rnp->exp_tasks == NULL &&
    (rnp->gp_tasks == NULL ||
     rnp->boost_tasks != NULL ||
     rnp->qsmask != 0 &&
     ULONG_CMP_LT(jiffies, rnp->boost_time))) {
  rcu_initiate_boost_trace(rnp);
  /* raw_spin_unlock_irqrestore(&rnp->lock, flags); */
} else {
  if (rnp->exp_tasks == NULL)
    rnp->boost_tasks = rnp->gp_tasks;
  /* raw_spin_unlock_irqrestore(&rnp->lock, flags); */
  t = rnp->boost_kthread_task;
  if (t)
    rcu_wake_cond(t, rnp->boost_kthread_status);
}




Of course, a full exhaustive test is infeasible, but structured testing would result in a manageable number of test cases. However, we can use cbmc to do the equivalent of a full exhaustive test, despite the fact that the number of combinations is on the order of two raised to the power 1,000. The approach is to create task_struct and rcu_node structures that contain only those fields that are used by this code fragment, but that also contain flags that indicate which functions were called and what their arguments were. This allows us to wrapper both the old and the new versions of the code fragment in their respective functions, and call them in sequence on different instances of identically initialized task_struct and rcu_node structures. These two calls are followed by an assertion that checks that the return value and the corresponding fields of the structures are identical.



This approach results in checkiftrans-1.c (raw C code here). Lines 5-8 show the abbreviated task_struct structure and lines 13-22 show the abbreviated rcu_node structure. Lines 10, 11, 24, and 25 show the instances. Lines 27-31 record a call to rcu_wake_cond() and lines 33-36 record a call to rcu_initiate_boost_trace().



Lines 38-49 initialize a task_struct/rcu_node structure pair. The rather unconventional use of the argv[] array works because cbmc assumes that this array contains random numbers. The old if statement is wrappered by do_old_if() on lines 51-71, while the new if statement is wrappered by do_new_if() on lines 73-93. The assertion is in check() on lines 95-107, and finally the main program is on lines 109-118.



Running cbmc checkiftrans-1.c gives this output, which prominently features VERIFICATION FAILED at the end of the file. On lines 4, 5, 12 and 13 of the file are complaints that neither ULONG_CMP_GE() nor ULONG_CMP_LT() is defined. Lacking definitions for these two functions, cbmc seems to treat them as random-number generators, which could of course cause the two versions of the if statement to yield different results. This is easily fixed by adding the required definitions:



#define ULONG_MAX         (~0UL)
#define ULONG_CMP_GE(a, b)  (ULONG_MAX / 2 >= (a) - (b))
#define ULONG_CMP_LT(a, b)  (ULONG_MAX / 2 < (a) - (b))




This results in checkiftrans-2.c (raw C code here). However, running cbmc checkiftrans-2.c gives this output, which still prominently features VERIFICATION FAILED at the end of the file. At least there are no longer any complaints about undefined functions!



It turns out that cbmc provides a counterexample in the form of a traceback. This traceback clearly shows that the two instances executed different code paths, and a closer examination of the two representations of the if statement shows that I forgot to convert one of the && operators to a ||: the “rnp->qsmask != 0 &&” on line 84 should instead be “rnp->qsmask != 0 ||”. Making this change results in checkiftrans-3.c (raw C code here). The inverted if statement is now as follows:



if (rnp->exp_tasks == NULL &&
    (rnp->gp_tasks == NULL ||
     rnp->boost_tasks != NULL ||
     rnp->qsmask != 0 ||
     ULONG_CMP_LT(jiffies, rnp->boost_time))) {
  rcu_initiate_boost_trace(rnp);
  /* raw_spin_unlock_irqrestore(&rnp->lock, flags); */
} else {
  if (rnp->exp_tasks == NULL)
    rnp->boost_tasks = rnp->gp_tasks;
  /* raw_spin_unlock_irqrestore(&rnp->lock, flags); */
  t = rnp->boost_kthread_task;
  if (t)
    rcu_wake_cond(t, rnp->boost_kthread_status);
}




This time, running cbmc checkiftrans-3.c produces this output, which prominently features VERIFICATION SUCCESSFUL at the end of the file. Furthermore, this verification consumed only about 100 milliseconds on my aging laptop. And, even better, because it refused to verify the buggy version, we have at least some reason to believe it!



Of course, one can argue that doing such work carefully and in a quiet environment would eliminate the need for such verification, and 30 years ago I might have emphatically agreed with this argument. I have since learned that ideal work environments are not always as feasible as we might like to think, especially if there are small children (to say nothing of adult-sized children) in the vicinity. Besides which, human beings do make mistakes, even when working in ideal circumstances, and if we are to have reliable software, we need some way of catching these mistakes.



The canonical pattern for using cbmc in this way is as follows:



retref = funcref(...);
retnew = funcnew(...);
assert(retref == retnew && ...);




The ... sequences represent any needed arguments to the calls and any needed comparisons of side effects within the assertion.
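
For example, here is a minimal self-contained instance of this pattern (a made-up example, not the RCU harness described above), comparing an original condition against a De Morgan-style rewrite over all possible inputs:

#include <assert.h>

/* Reference version of the condition. */
static int condref(int a, int b, int c)
{
  return a != 0 || (b != 0 && c == 0);
}

/* Rewritten version: the same condition after applying De Morgan's law. */
static int condnew(int a, int b, int c)
{
  return !(a == 0 && (b == 0 || c != 0));
}

int main(void)
{
  int a, b, c;  /* uninitialized, hence nondeterministic under cbmc */
  int retref, retnew;

  retref = condref(a, b, c);
  retnew = condnew(a, b, c);
  assert(retref == retnew);  /* checked for every possible a, b, and c */
  return 0;
}

Injecting the sort of bug shown earlier (for example, turning one of the || operators into &&) should cause cbmc to emit a counterexample trace instead of VERIFICATION SUCCESSFUL.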



Of course, there are limitations:





  1. The “b” in cbmc stands for “bounded.” In particular, cbmc handles neither infinite loops nor infinite recursion. The --unwind and --depth arguments to cbmc allow you to control how much looping and recursion is analyzed (a toy example follows this list). See the manual for more information.

  2. The SAT solvers used by cbmc have improved greatly over the past 25 years. In fact, where a 100-variable problem was at the edge of what could be handled in the 1990s, most ca-2015 solvers can handle more than a million variables. However, the NP-complete nature of SAT does occasionally make its presence known, for example, programs that reduce to a proof involving the pigeonhole principle are not handled well as of early 2015.

  3. Handling of concurrency is available in later versions of cbmc, but is not as mature as is the handling of single-threaded code.
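
To make the first limitation concrete, here is a toy example (again made up, not from the challenge) in which the assertion can be violated only after the loop has run to completion, so cbmc must unwind the loop far enough to see the failure:

#include <assert.h>

int main(void)
{
  int i;
  int sum = 0;

  for (i = 0; i < 10; i++)
    sum += i;

  /* Violated only once all ten iterations have executed, so an
   * insufficient unwinding bound can hide the failure. */
  assert(sum != 45);
  return 0;
}

With a sufficiently large --unwind value the violation is found; with too small a bound the loop is only partially explored, and the truncation goes unnoticed unless cbmc is also asked to flag insufficient unwinding.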





All that aside, everything has its limitations, and cbmc's ease of use is quite impressive. I expect to continue to use it from time to time, and strongly recommend that you give it a try!

February 24, 2015

The Day Is My Enemy

Looking forward to seeing the live performance at the Future Music Festival 2015 :-)

February 23, 2015

Sahana This Week

It’s a busy week for Sahana around the world! Fran Boon, the Technical Lead for the Sahana software project, is delivering a SahanaCamp training workshop for the Civil Society Disaster Platform, a coalition of disaster management organizations in Turkey. This workshop [Read the Rest...]

February 21, 2015

Confessions of a Recovering Proprietary Programmer, Part XIV

Although junk mail, puppies, and patches often are unwelcome, there are exceptions. For example, if someone has been wanting a particular breed of dog for some time, that person might be willing to accept a puppy, even if that means giving it shots, housebreaking it, teaching it the difference between furniture and food, doing bottlefeeding, watching over it day and night, and even putting up with some sleepless nights.



Similarly, if a patch fixes a difficult and elusive bug, the maintainer might be willing to apply the patch by hand, fix build errors and warnings, fix a few bugs in the patch itself, run a full set of tests, fix any style problems, and even accept the risk that the patch might have unexpected side effects, some of which might result in some sleepless nights. This in fact is one of the reasons for the common advice given to open-source newbies: start by fixing bugs.



Other good advice for new contributors can be found here:





  1. Greg Kroah-Hartman's HOWTO do Linux kernel development – take 2 (2005)

  2. Jonathan Corbet's How to Participate in the Linux Community (2008)

  3. Greg Kroah-Hartman's Write and Submit your first Linux kernel Patch (2010)

  4. My How to make a positive difference in a FOSS project (2012)

  5. Daniel Lezcano's What do we mean by working upstream: A long-term contributor’s view





This list is mostly about contributing to the Linux kernel, but most other projects have similar pages giving good new-contributor advice.

February 20, 2015

Kickstart new developers using Docker – Linux.conf.au 2015

One of the talks I gave at Linux.conf.au this year was a quick-start guide to using Docker.

The slides begin with building Apache from source on your local host, using their documentation, and then show how much simpler it is if, instead of documentation, the project provides a Dockerfile. I quickly gloss over making a slim production container from that large development container – see my other talk, which I’ll blog about a little later.

The second example uses a Dockerfile to create and execute a test environment, so everyone can replicate identical test results.

Finally, I end with a quick example of fig (Docker Compose) and running GUI applications in containers.

the Slides


February 19, 2015

Using Sahana to Support Volunteer Technical Communities

There are a lot of similarities between traditional disaster management organizations and volunteer technical communities such as Sahana’s – especially when you look at our operations from an information management perspective. We collaborate on projects with partner organizations, often breaking the [Read the Rest...]

February 13, 2015

App Container and Docker

A core principle of the App Container (appc) specification is that it is open: multiple implementations of the spec should exist and be developed independently. Even though the spec is young and pre-1.0, it has already seen a number of implementations.

With this in mind, over the last few weeks we have been working on ways to make appc interoperable with the Docker v1 Image format. As we discovered, the two formats are sufficiently compatible that Docker v1 Images can easily be run alongside appc images (ACIs). Today we want to describe two different demonstrations of this interoperability, and start a conversation about closer integration between the Docker and appc communities.

rkt Running Docker Images

rkt is an App Container implementation that fully implements the current state of the spec. This means it can download, verify and run App Container Images (ACIs). And now, along with ACI support, the latest release of rkt, v0.3.2, can download and run container images directly from the Docker Hub or any other Docker Registry:

$ rkt --insecure-skip-verify run docker://redis docker://tenstartups/redis-commander
rkt: fetching image from docker://redis
rkt: warning: signature verification has been disabled
Downloading layer: 511136ea3c5a64f264b78b5433614aec563103b4d4702f3ba7d4d2698e22c158
…
      _.-``    `.  `_.  ''-._           Redis 2.8.19 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._
 (    '      ,       .-`  | `,    )     Running in stand alone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 3
...
[3] 12 Feb 09:09:19.071 # Server started, Redis version 2.8.19
# redis will be  running on 127.0.0.1:6379 and redis-commander on 127.0.0.1:8081

Docker Running App Container Images

At the same time as adding Docker support to rkt, we have also opened a pull-request that enables Docker to run appc images (ACIs). This is a simple functional PR that includes many of the essential features of the image spec. Docker API operations such as image list, run image by appc image ID and more work as expected and integrate with the native Docker experience. As a simple example, downloading and running an etcd ACI works seamlessly with the addition of this patchset:

$ docker pull --format aci coreos.com/etcd:v2.0.0
$ docker run --format aci coreos.com/etcd
2015/02/12 11:21:05 no data-dir provided, using default data-dir ./default.etcd
2015/02/12 11:21:05 etcd: listening for peers on http://localhost:2380
2015/02/12 11:21:05 etcd: listening for peers on http://localhost:7001

For more details, check out the PR itself.

Docker and App Container: Looking forward

We think App Container represents the next logical iteration in what a container image format, runtime engine, and discovery protocol should look like. App Container is young but we want to continue to get wider community feedback and see the spec evolve into something that can work for a number of runtimes.

Before the appc spec reaches 1.0 (stable) status, we would like feedback from the Docker community on what might need to be modified in the spec in order for it to be supported natively in Docker. To gather feedback and start the discussion, we have put up a proposal to add appc support to Docker.

We are looking forward to getting additional feedback from the Docker community on this proposal. Working together, we can create a better appc spec for everyone to use, and over time, work towards a shared standard.

Join us on a mission to create a secure, composable, and standards-based container runtime. If you are interested in hacking on rkt or App Container we encourage you to get involved:

rkt

Help Wanted

Mailing list

App Container

Help Wanted

Mailing list

If you want more background on the appc spec, we encourage you to read our first blog post about the App Container spec and Rocket. Also read more in a recent Q&A with OpenSource.com.

February 11, 2015

Confessions of a Recovering Proprietary Programmer, Part XIII

True confession: I was once a serial junk mailer. Not mere email spam, but physical bulk-postage-rate flyers, advertising a series of non-technical conferences. It was of course for a good cause, and one of the most difficult things about that task was convincing that cause's leaders that this flyer was in fact junk mail. They firmly believed that anyone with even a scrap of compassion would of course read the flyer from beginning to end, feeling the full emotional impact of each and every lovingly crafted word. They reluctantly came around to my view, which was that we had at most 300 milliseconds to catch the recipient's attention, that being the amount of time that the recipient might glance at the flyer on its way into the trash. Or at least I think that they came around to my view. All I really know is that they stopped disputing the point.



But junk mail for worthy causes is not the only thing that can be less welcome than its sender might like.



For example, Jim Wasko noticed a sign at a daycare center that read: “If you are late picking up your child and have not called us in advance, we will give him/her an espresso and a puppy. Have a great day.”



Which goes to show that although puppies are cute and lovable, and although their mother no doubt went to a lot of trouble to bring them into this world, they are, just like junk mail, not universally welcome. And this should not be too surprising, given the questions that come to mind when contemplating a free puppy. Has it had its shots? Is it housebroken? Has it learned that furniture is not food? Has it been spayed/neutered? Is it able to eat normal dogfood, or does it still require bottlefeeding? Is it willing to entertain itself for long periods? And, last, but most definitely not least, is it willing to let you sleep through the night?



Nevertheless, people are often surprised and bitterly disappointed when their offers of free puppies are rejected.



Other people are just as surprised and disappointed when their offers of free patches are rejected. After all, they put a lot of work into their patches, and they might even get into trouble if the patch isn't eventually accepted.



But it turns out that patches are a lot like junk mail and puppies. They are greatly valued by those who produce them, but often viewed with great suspicion by the maintainers receiving them. You see, the thought of accepting a free patch also raises questions. Does the patch apply cleanly? Does it build without errors and warnings? Does it run at all? Does it pass regression tests? Has it been tested with the commonly used combination of configuration parameters? Does the patch have good code style? Is the patch maintainable? Does the patch provide a straightforward and robust solution to whatever problem it is trying to solve? In short, will this patch allow the maintainer to sleep through the night?



I am extremely fortunate in that most of the RCU patches that I receive are “good puppies.” However, not everyone is so lucky, and I occasionally hear from patch submitters whose patches were not well received. They often have a long list of reasons why their patches should have been accepted, including:



  1. I put a lot of work into that patch, so it should have been accepted! Unfortunately, hard work on your part does not guarantee a perception of value on the maintainer's part.

  2. The maintainer's job is to accept patches. Maybe not: your maintainer might well be an unpaid volunteer.

  3. But my maintainer is paid to maintain! True, but he is probably not being paid to do your job.

  4. I am not asking him to do my job, but rather his/her job, which is to accept patches! The maintainer's job is not to accept any and all patches, but instead to accept good patches that further the project's mission.

  5. I really don't like your attitude! I put a lot of work into making this be a very good patch! It should have been accepted! Really? Did you make sure it applied cleanly? Did you follow the project's coding conventions? Did you make sure that it passed regression tests? Did you test it on the full set of platforms supported by the project? Does it avoid problems discussed on the project's mailing list? Did you promptly update your patch based on any feedback you might have received? Is your code maintainable? Is your code aligned with the project's development directions? Do you have a good reputation with the community? Do you have a track record of supporting your submissions? In other words, will your patch allow the maintainer to sleep through the night?

  6. But I don't have time to do all that! Then the maintainer doesn't have time to accept your patch. And most especially doesn't have time to deal with all the problems that your patch is likely to cause.



As a recovering proprietary programmer, I can assure you that things work a bit differently in the open-source world, so some adjustment is required. But participation in an open-source project can be very rewarding and worthwhile!

February 06, 2015

Announcing rkt and App Container v0.3.1

Today we're announcing the next release of Rocket and the App Container (appc) spec, v0.3.1.

rkt Updates

This release of rkt includes new user-facing features and some important changes under the hood that make further progress towards our goals of security and composability.

First, the rkt CLI has a couple of new commands:

  • rkt trust can be used to easily add keys to the public keystore for ACI signatures (introduced in the previous release). This supports retrieving public keys directly from a URL or using discovery to locate public keys - a simple example of the latter is rkt trust --prefix coreos.com/etcd. See the commit for other examples.

  • rkt list is a simple tool to list the containers on the system. It leverages the same file-based locking as rkt status and rkt gc to ensure safety during concurrent invocations of rkt.

As mentioned, v0.3.1 includes two significant changes to how rkt is built internally.

  • Instead of embedding the (default) stage1 using go-bindata, rkt now consumes a stage1 in the form of an actual ACI, containing a rootfs and stage1 init/enter binaries, via the --stage1-image flag. This makes it much more straightforward to use an alternative stage1 image with rkt and facilitates packaging for other distributions like Fedora.

  • rkt now vendors a copy of appc/spec instead of depending on HEAD. This means that rkt can be built in a self-contained and reproducible way and that master will no longer break in response to changes to the spec. It also makes explicit the specific version of the spec against which a particular release of rkt is compiled.

As a consequence of these two changes, it is now possible to use the standard Go workflow to build the rkt CLI (e.g. go get github.com/coreos/rocket/rkt). Note however that this does not implicitly build a stage1, so that will still need to be done using the included ./build script, or some other way for those desiring to use a different stage1.

App Container Updates

This week saw a number of interesting projects emerge that implement the App Container Spec. Please note, all of these are very early and actively seeking more contributors.

Nose Cone, an independent App Container Runtime

Nose Cone is an appc runtime that is built on top of the libappc C++ library that was released a few weeks ago. This project is only a few days old but you can find it up on GitHub. It makes no use of rkt, but implements the App Container specification. It is great to see this level of experimentation around the appc spec: having multiple, alternative runtimes with different goals is an important part of building a robust specification.

Tools for building ACIs

A few tools have emerged since last week for building App Container Images. All of these are very early and could use your contributions to help get them production ready.

docker2aci

A Dockerfile and the "docker build" command are a very convenient way to build an image, and many people already have existing infrastructure and pipelines around Docker images. To take advantage of this, the docker2aci tool and library takes an existing Docker image and generates an equivalent ACI. This means the container can now be run in any implementation of the appc spec.

$ docker2aci quay.io/lafolle/redis
Downloading layer: 511136ea3c5a64f264b78b5433614aec563103b4d4702f3ba7d4d2698e22c158
...
Generated ACI(s):
lafolle-redis-latest.aci
$ rkt run lafolle-redis-latest.aci
[3] 04 Feb 03:56:31.186 # Server started, Redis version 2.8.8

goaci

While a Dockerfile is a very convenient way to build, it should not be the only way to create a container image. With the new experimental goaci tool, it is possible to build a minimal golang ACI without the need for any additional build environment. Example:

$ goaci github.com/coreos/etcd
Wrote etcd.aci
$ actool -debug validate etcd.aci
etcd.aci: valid app container image

Quay support

Finally, we have added experimental support for App Container Images to Quay.io, our hosted container registry. Test it out by pulling any public image using rkt:

$ rkt trust --prefix quay.io
Prefix: "quay.io"
Key: "https://quay.io/aci-signing-key"
GPG key fingerprint is: BFF3 13CD AA56 0B16 A898  7B8F 72AB F5F6 799D 33BC
    Quay.io ACI Converter (ACI conversion signing key) <support@quay.io>
Are you sure you want to trust this key (yes/no)? yes
$ rkt run quay.io/philips/golang-outyet
$ curl 127.0.0.1:8080

While these tools are very young, they are an important milestone towards our goals with appc. We are on a path to being able to create images with multiple, independent tools (from Docker conversion to native language tools), and have multiple ways to run them (with runtimes like rkt and Nose Cone). This is just the beginning, but a great early example of the power of open standards.

Join us on a mission to create a secure, composable, and standards-based container runtime. If you are interested in hacking on rkt or App Container we encourage you to get involved:

There is still much to do - onward!

February 05, 2015

Improving Coastal Resilience through Multi-agency Situational Awareness

Under a well-developed disaster management system, the Disaster Management Organization of a Country should be aware of and should map every significant emergency incident or risk in the country. Disseminating such information among multiple agencies with disparate systems can be [Read the Rest...]

February 03, 2015

Upcoming CoreOS Events in February

We’ve just come from FOSDEM ‘15 in Belgium and have an exciting rest of the month planned. We’ll be in Europe and the United States in February, and you can even catch Alex Polvi, CEO of CoreOS, keynoting at two events – TurboFest West (February 13) and Linux Collab Summit (February 18). Read more to see where we’ll be and meet us.

Also, thank you to all who hosted and attended our events last month. For questions or comments, contact us at press@coreos.com or tweet to us @CoreOSlinux.

Europe

See slides from the Config Management Camp 2015 talk by Kelsey Hightower (@kelseyhightower), developer advocate at CoreOS. He presented in Belgium on February 2 about Managing Containers at Scale with CoreOS and Kubernetes.


Tuesday, February 3 at 7 p.m. CET – Munich, Germany

Learn about CoreOS and Rocket at the Munich CoreOS meetup led by Brian Harrington/Redbeard (@brianredbeard), principal architect at CoreOS, and Jonathan Boulle (@baronboulle), senior engineer at CoreOS.


Tuesday, February 3 at 7 p.m. GMT – London, United Kingdom

See the first Kubernetes London meetup with Craig Box, solutions engineer for Google Cloud Platform, and Kelsey Hightower (@kelseyhightower), developer advocate at CoreOS. Attendees will be guided through the first steps with Kubernetes and Kelsey will discuss managing containers at scale with CoreOS and Kubernetes.


Thursday, February 5 at 7:00 p.m. CET – Frankfurt, Germany

Check out the DevOps Frankfurt meetup, where we will give a rundown on CoreOS and Rocket from Redbeard (@brianredbeard), principal architect at CoreOS, and Jonathan Boulle (@baronboulle), senior engineer at CoreOS.


Monday, February 9 at 7:00 p.m. CET – Berlin, Germany

Meet Jonathan Boulle (@baronboulle), senior engineer at CoreOS, at the CoreOS Berlin meetup to learn about Rocket and the App Container spec.

United States

Wednesday, February 4 at 6:00 p.m. – New York, New York

Come to our February CoreOS New York City meetup at Work-Bench, 110 Fifth Avenue on the 5th floor, where our team will discuss our new container runtime, Rocket, as well as new Quay.io features. In addition, Nathan Smith, head of engineering at Wink, www.wink.com, will walk us through how they are using CoreOS.


Monday, February 9 at 6:30 p.m. EST – New York, New York

The CTO School meetup will host an evening on Docker and the Linux container ecosystem. See Jake Moshenko (@JacobMoshenko), product manager at CoreOS, and Borja Burgos-Galindo, CEO & co-founder of Tutum, for an intro to containers and an overview on the ecosystem, followed by a presentation from Tom Leach and Travis Thieman of Gamechanger.


Friday, February 13 – San Francisco, California

See Alex Polvi, CEO of CoreOS, keynote at TurboFest West, a program of cloud and virtualization thought leadership discussions hosted by VMTurbo. Register for more details.


Tuesday, February 17 at 5:30 p.m. CST – Kansas City, Missouri

Redbeard (@brianredbeard), principal architect at CoreOS, will be kickin’ it with the Cloud KC meetup to go over CoreOS and Rocket. Thanks to C2FO for hosting this event.


Wednesday, February 18 at 10:00 a.m. PST – Santa Rosa, California

Alex Polvi, CEO of CoreOS, will present a keynote on Containers and the Changing Server Landscape at the Linux Collab Summit. See more about what Alex will discuss in a Q&A with Linux.com and tweet to us to meet at the event if you’ll be there.


Thursday, February 19 at 7:00 p.m. CST – Carrollton, Texas

Come to the Linux Containers & Virtualization meetup to meet Redbeard (@brianredbeard), principal architect at CoreOS, and learn about Rocket and the App Container spec.


February 19-February 22 – Los Angeles, California

Meet Jonathan Boulle (@baronboulle), senior engineer at CoreOS, at SCALE 13x, the SoCal Linux Expo. Jon will present a session on Rocket and the App Container spec on Saturday, February 21 at 3:00 p.m. PT in the Carmel room.


More events will be added, so check back for updates here and at our community page!

In case you missed it, watch a webinar with Kelsey Hightower, developer advocate at CoreOS, and Matt Williams, DevOps evangelist at Datadog, on Managing CoreOS Container Performance for Production Workloads.

February 01, 2015

Parallel Programming: January 2015 Update

This release of Is Parallel Programming Hard, And, If So, What Can You Do About It? features a new chapter on SMP real-time programming, an updated formal-verification chapter, removal of a couple of the less-relevant appendices, several new cartoons (along with some refurbishing of old cartoons), and other updates, including contributions from Bill Pemberton, Borislav Petkov, Chris Rorvick, Namhyung Kim, Patrick Marlier, and Zygmunt Bazyli Krynicki.



As always, git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/perfbook.git will be updated in real time.

January 29, 2015

Verification Challenge 2: RCU NO_HZ_FULL_SYSIDLE

I suppose that I might as well get the “But what about Verification Challenge 1?” question out of the way to start with. You will find Verification Challenge 1 here.



Now, Verification Challenge 1 was about locating a known bug. Verification Challenge 2 takes a different approach: The goal is instead to find a bug if there is one, or to prove that there is no bug. This challenge involves the Linux kernel's NO_HZ_FULL_SYSIDLE functionality, which is supposed to determine whether or not all non-housekeeping CPUs are idle. The normal Linux-kernel review process located an unexpected bug (which was allegedly fixed), so it seemed worthwhile to apply some formal verification. Unfortunately, all of the tools that I tried failed. Not simply failed to verify, but failed to run correctly at all—though I have heard a rumor that one of the tools was fixed, and thus advanced to the “failed to verify” state, where “failed to verify” apparently meant that the tool consumed all available CPU and memory without deigning to express an opinion as to the correctness of the code.



So I fell back to 20-year-old habits and converted my C code to a Promela model and used spin to do a full-state-space verification. After some back and forth, this model did claim verification, and correctly refused to verify bug-injected perturbations of the model. Mathieu Desnoyers created a separate Promela model that made more deft use of temporal logic, and this model also claimed verification and refused to verify bug-injected perturbations. So maybe I can trust them. Or maybe not.



Unfortunately, regardless of whether or not I can trust these models, they are not appropriate for regression testing. How large a change to the C code requires a corresponding change to the Promela models? And how can I be sure that this corresponding change was actually the correct change? And so on: These kinds of questions just keep coming.



It would therefore be nice to be able to validate straight from the C code. So if you have a favorite verification tool, why not see what it can make of NO_HZ_FULL_SYSIDLE? The relevant fragments of the C code, along with both Promela models, can be found here. See the README file for a description of the files, and you know where to find me for any questions that you might have.



If you do give it a try, please let me know how it goes!

January 28, 2015

etcd 2.0 Release - First Major Stable Release


Today etcd hit v2.0.0, our first major stable release. Since the release candidate in mid-December, the team has been hard at work stabilizing the release. You can find the new binaries on GitHub.

For a quick overview, etcd is an open source, distributed, consistent key-value store for shared configuration, service discovery, and scheduler coordination. By using etcd, applications can ensure that even in the face of individual servers failing, the application will continue to work. etcd is a core component of CoreOS software that facilitates safe automatic updates, coordinating work being scheduled to hosts, and setting up overlay networking for containers.

New Updates

The etcd team has been hard at work to improve the ease-of-use and stability of the project. Some of the highlights compared to the last official release, etcd 0.4.6, include:

  • Internal etcd protocol improvements to guard against accidental misconfiguration
  • etcdctl backup was added to make recovering from cluster failure easier
  • etcdctl member list/add/remove commands for easily managing a cluster
  • On-disk datastore safety improvements with CRC checksums and append-only behavior
  • An improved Raft consensus implementation already used in other projects like CockroachDB
  • More rigorous and faster-running tests of the underlying Raft implementation, covering all of the state machine cases explained in the original Raft white paper, in 1.5 seconds
  • Additional administrator focused documentation explaining common scenarios
  • Official IANA-assigned ports for etcd: TCP 2379/2380

The major goal has been to make etcd more usable and stable with all of these changes. Over the hundreds of pull requests merged to make this release, many other improvements and bug fixes have been made. Thank you to the 150 contributors who have helped etcd get where it is today and provided those bug fixes, pull requests and more.

Who uses etcd?

Many projects use etcd - Google’s Kubernetes, Pivotal’s Cloud Foundry, Mailgun and now Apache Mesos and Mesosphere DCOS too. In addition to these projects, there are more than 500 projects on GitHub using etcd. The feedback from these application developers continues to be an important part of the development cycle; thank you for being involved.

Direct quotes from people using etcd:

"We evaluated a number of persistent stores, yet etcd’s HTTP API and strong Go client support was the best fit for Cloud Foundry," said Onsi Fakhouri, engineering manager at Pivotal. "Anyone currently running a recent version of Cloud Foundry is running etcd. We are big fans of etcd and are excited to see the rapid progress behind the key-value store."

"etcd is an important part of configuration management and service discovery in our infrastructure," said Sasha Klizhentas, lead engineer at Mailgun. "Our services use etcd for dynamic load-balancing, leader election and canary deployment patterns. etcd’s simple HTTP API helps make our infrastructure reliable and distributed."

"Shared configuration and shared state are two very tricky domains for distributed systems developers as services no longer run on one machine but are coordinated across an entire datacenter," said Benjamin Hindman, chief architect at Mesosphere and chair of Apache Mesos. "Apache Mesos and Mesosphere’s Datacenter Operating System (DCOS) will soon have a standard plugin to support etcd. Users and customers have asked for etcd support, and we’re delivering it as an option."

Get Involved and Get Started

After nearly two years of diligent work, we are eager to hear your continued feedback on etcd. We will continue to work to make etcd a fundamental building block for Google-like infrastructure that users can take off the shelf, build upon and rely on.

Brandon Philips speaking about etcd 2.0

CoreOS CTO Brandon Philips speaking about etcd 2.0 at the CoreOS San Francisco meetup:

Update on CVE-2015-0235, GHOST

The glibc vulnerability, CVE-2015-0235, known as “GHOST”, has been patched on CoreOS. If automatic updates are enabled (default configuration), your server should already be patched.

If automatic updates are disabled, you can force an update by running update_engine_client -check_for_update.

Currently, the auto-update mechanism only applies to the base CoreOS, not inside your containers. If your container was built from an older Ubuntu base, for example, you’ll need to update the container and get the patch from Ubuntu.

If you have any questions or concerns, please join us in IRC freenode/#coreos.

SE Linux Play Machine Over Tor

I work on SE Linux to improve security for all computer users. I think that my work has gone reasonably well in terms of directly improving the security of computers and helping developers find and fix certain types of security flaws in apps. But a large part of the security problems we have at the moment are related to subversion of Internet infrastructure. The Tor project is a significant step towards addressing such problems. So to achieve my goals in improving computer security I have to support the Tor project. So I decided to put my latest SE Linux Play Machine online as a Tor hidden service. There is no real need for it to be hidden (for the record it’s in my bedroom), but it’s a learning experience for me and for everyone who logs in.

A Play Machine is what I call a system with root as the guest account with only SE Linux to restrict access.

Running a Hidden Service

A Hidden Service in Tor is just a cryptographically protected address that forwards to a regular TCP port. It’s not difficult to set up and the Tor project has good documentation [1]. For Debian the file to edit is /etc/tor/torrc.

I added the following 3 lines to my torrc to create a hidden service for SSH. I forwarded port 80 for test purposes because web browsers are easier to configure for SOCKS proxying than ssh.

HiddenServiceDir /var/lib/tor/hidden_service/

HiddenServicePort 22 192.168.0.2:22

HiddenServicePort 80 192.168.0.2:22

Generally when setting up a hidden service you want to avoid using an IP address that gives anything away. So it’s a good idea to run a hidden service on a virtual machine that is well isolated from any public network. My Play machine is hidden in that manner not for secrecy but to prevent it being used for attacking other systems.

SSH over Tor

Howtoforge has a good article on setting up SSH with Tor [2]. That has everything you need for setting up Tor for a regular ssh connection, but the tor-resolve program only works for connecting to services on the public Internet. By design the .onion addresses used by Hidden Services have no mapping to anything that resembles an IP address, so tor-resolve breaks on them. I believe that the fact that tor-resolve breaks things in this situation is a bug; I have filed Debian bug report #776454 requesting that tor-resolve allow such connections to just work [3].

Host *.onion

ProxyCommand connect -5 -S localhost:9050 %h %p

I use the above ssh configuration (which can go in ~/.ssh/config or /etc/ssh/ssh_config) to tell the ssh client how to deal with .onion addresses. I also had to install the connect-proxy package which provides the connect program.

ssh root@zp7zwyd5t3aju57m.onion

The authenticity of host ‘zp7zwyd5t3aju57m.onion ()

ECDSA key fingerprint is 3c:17:2f:7b:e2:f6:c0:c2:66:f5:c9:ab:4e:02:45:74.

Are you sure you want to continue connecting (yes/no)?

I now get the above message when I connect; the ssh developers have dealt with connecting via a proxy that doesn’t have an IP address.

Also see the general information page about my Play Machine, that information page has the root password [4].

January 23, 2015

rkt and App Container 0.2.0 Release

This week both rkt and the App Container (appc) spec have reached 0.2.0. Since our launch of the projects in December, both have been moving very quickly with a healthy community emerging. rkt now has cryptographic signing by default and a community is emerging around independent implementations of the appc spec. Read on for details on the updates.

rkt 0.2.0

Development on rkt has continued rapidly over the past few weeks, and today we are releasing v0.2.0. This important milestone release brings a lot of new features and improvements that enable securely verified image retrieval and tools for container introspection and lifecycle management.

Notably, this release introduces several important new subcommands:

  • rkt enter, to enter the namespace of an app within a container
  • rkt status, to check the status of a container and applications within it
  • rkt gc, to garbage collect old containers no longer in use

In keeping with rkt's goals of being simple and composable, we've taken care to implement these lifecycle-related subcommands without introducing additional daemons or databases. rkt achieves this by taking advantage of existing file-system and kernel semantics like advisory file-locking, atomic renames, and implicit closing (and unlocking) of open files at process exit.
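
For readers unfamiliar with those primitives, here is an illustrative sketch (with made-up paths; it is not rkt's actual code) of the kind of kernel semantics being relied on:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/file.h>

int main(void)
{
  /* Take an exclusive advisory lock on a (hypothetical) per-container
   * directory.  The lock is released implicitly when this process exits
   * and its file descriptors are closed. */
  int fd = open("/tmp/containers/1234", O_RDONLY);

  if (fd < 0 || flock(fd, LOCK_EX | LOCK_NB) < 0) {
    perror("container directory busy or missing");
    exit(1);
  }

  /* Publish status by writing a temporary file and renaming it into
   * place: rename(2) is atomic, so concurrent readers never observe a
   * partially written status file. */
  FILE *f = fopen("/tmp/containers/1234/status.tmp", "w");
  if (f == NULL) {
    perror("status.tmp");
    exit(1);
  }
  fprintf(f, "running\n");
  fclose(f);
  if (rename("/tmp/containers/1234/status.tmp",
             "/tmp/containers/1234/status") != 0)
    perror("rename");
  return 0;
}

Because the lock lives in the kernel and disappears with the process, no extra daemon or database is needed to track which containers are in use.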

v0.2.0 also marks the arrival of automatic signature validation: when retrieving an image during rkt fetch or rkt run, Rocket will verify its signature by default. Kelsey Hightower has written up an overview guide explaining this functionality. This signature verification is backed by a flexible system for storing public keys, which will soon be even easier to use with a new rkt trust subcommand. This is a small but important step towards our goal of rkt being as secure as possible by default.

Here's an example of the key validation in action when retrieving the latest etcd release (in this case the CoreOS ACI signing key has previously been trusted using the process above):

$ rkt fetch coreos.com/etcd:v2.0.0-rc.1
rkt: searching for app image coreos.com/etcd:v2.0.0-rc.1
rkt: fetching image from https://github.com/coreos/etcd/releases/download/v2.0.0-rc.1/etcd-v2.0.0-rc.1-linux-amd64.aci
Downloading aci: [=============================                ] 2.31 MB/3.58 MB
Downloading signature from https://github.com/coreos/etcd/releases/download/v2.0.0-rc.1/etcd-v2.0.0-rc.1-linux-amd64.sig
rkt: signature verified: 
  CoreOS ACI Builder <release@coreos.com>

App Container 0.2.0

The appc spec continues to evolve but is now stabilizing. Some of the major changes are highlighted in the announcement email that went out earlier this week.

This last week has also seen the emergence of two different implementations of the spec: jetpack (a FreeBSD/Jails-based executor) and libappc (a C++ library for working with app containers). The authors of both projects have provided extremely helpful feedback and pull requests to the spec, and it is great to see these early implementations develop!

Jetpack, App Container for FreeBSD

Jetpack is an implementation of the App Container Specification for FreeBSD. It uses jails as an isolation mechanism, and ZFS for layered storage. Jetpack is a great test of the cross platform portability of appc.

libappc, C++ library for App Container

libappc is a C++ library for doing things with app containers. The goal of the library is to be a flexible toolkit: manifest parsing and creation, pluggable discovery, image creation/extraction/caching, thin-provisioned file systems, etc.

Get involved

If you are interested in contributing to any of these projects, please get involved! A great place to start is issues in the Help Wanted label on GitHub. You can also reach out with questions and feedback on the Rocket and appc mailing lists:

rkt

App Container

In the SF Bay Area or NYC next week? Come to the meetups in each area to hear more about these changes and the future of rocket and appc. RSVP to the CoreOS NYC meetup and SF meetup to learn more.

Lastly, thank you to the community of contributors emerging around Rocket and App Container:

Alan LaMielle, Alban Crequy, Alex Polvi, Ankush Agarwal, Antoine Roy-Gobeil, azu, beadon, Brandon Philips, Brian Ketelsen, Brian Waldon, Burcu Dogan, Caleb Spare, Charles Aylward, Daniel Farrell, Dan Lipsitt, deepak1556, Derek, Emil Hessman, Eugene Yakubovich, Filippo Giunchedi, Ghislain Guiot, gprggr, Hector Fernandez, Iago López Galeiras, James Bayer, Jimmy Zelinskie, Johan Bergström, Jonathan Boulle, Josh Braegger, Kelsey Hightower, Keunwoo Lee, Krzesimir Nowak, Levi Gross, Maciej Pasternacki, Mark Kropf, Mark Lamourine, Matt Blair, Matt Boersma, Máximo Cuadros Ortiz, Meaglith Ma, PatrickJS, Pekka Enberg, Peter Bourgon, Rahul, Robo, Rob Szumski, Rohit Jnagal, sbevington, Shaun Jackman, Simone Gotti, Simon Thulbourn, virtualswede, Vito Caputo, Vivek Sekhar, Xiang Li

January 20, 2015

Meet us for our January 2015 events

CoreOS CTO Brandon Philips speaking at Linux Conf AU

January has been packed with meetups and events across the globe. So far, we’ve been to India, Switzerland, France, England and New Zealand.

Check out a CoreOS tutorial from Brandon Philips (@brandonphilips) at Linux Conf New Zealand.

Our team has been on a fantastic tour meeting CoreOS contributors and friends around the world. A special thank you to the organizers of those meetups and to all those who came out to the meetups and made us feel at home. Come join us at the following events this month:

Tuesday, January 27 at 11 a.m. PST – Online

Join us for a webinar on Managing CoreOS Container Performance for Production Workloads. Kelsey Hightower (@kelseyhightower) from CoreOS and Matt Williams from Datadog will discuss trends in container usage and show how container performance can be monitored, especially as the container deployments grow. Register here.


Tuesday, January 27 at 6 p.m. EST – New York, NY

Come to our January New York City meetup at Work-Bench, 110 Fifth Avenue on the 5th floor, where our team will discuss our new container runtime, Rocket, as well as new Quay.io features. In addition, Nathan Smith, head of engineering at Wink, www.wink.com, will walk us through how they are using CoreOS. Register here.


Tuesday, January 27 at 6 p.m. PST – San Francisco, CA

Our January San Francisco meetup is not-to-miss! We’ll discuss news and updates on etcd, Rocket and AppC. Register here.


Thursday, January 29 at 7 p.m. CET – Barcelona, Spain

Meet Brian Harrington, better known as Redbeard (@brianredbeard), for CoreOS: An Overview, at itnig. Dedicated VMs and configuration management tools are being replaced by containerization and new service management technologies like systemd. This meetup will give an overview of CoreOS, including etcd, schedulers (mesos, kubernetes, etc.), and containers (nspawn, docker, rocket). Understand how to use these new technologies to build performant, reliable, large distributed systems. Register here.


Saturday, January 31-Sunday, February 1 – Brussels, Belgium

Our team is attending FOSDEM ’15 to connect with developers and the open source community. See our talks and meet the team at our dev booth throughout the event.

  • Redbeard (@brianredbeard) will discuss How CoreOS is built, modified, and updated on Saturday at 1 p.m. CET.
  • Jon Boulle (@baronboulle) from our engineering team will discuss all things Go at CoreOS on Sunday at 9:05 a.m. CET.
  • Kelsey Hightower (@kelseyhightower), developer advocate at CoreOS, will give a talk on Rocket and the App Container Spec at 11:40 a.m. CET.

A special shout out to the organizers of those meetups - Fintan Ryan, Ranganathan Balashanmugam, Muharem Hrnjadovic, Frédéric Ménez, Richard Paul, ­Piotr Zurek, Patrick Heneise, Benjamin Reitzammer, Sunday Ogwu, Tom Martin, Chris Kuhl and Johann Romefort.

If you are interested in hosting an event of your own or inviting someone from CoreOS to speak, reach out to us at press@coreos.com.

Sahana @ linux.conf.au

Last week I was able to attend linux.conf.au which was being hosted in my home town of Auckland. This was a great chance to spend time with people from the open source community from New Zealand, Australia and around the [Read the Rest...]

‘Sup With The Tablet?

As I mentioned on Twitter last week, I’m very happy SUSE was able to support linux.conf.au 2015 with a keynote giveaway on Wednesday morning and sponsorship of the post-conference Beer O’Clock at Catalyst:

For those who were in attendance, I thought a little explanation of the keynote gift (a Samsung Galaxy Tab 4 8″) might be in order, especially given the winner came up to me during the post-conference drinks and asked “what’s up with the tablet?”

To put this in perspective, I’m in engineering at SUSE (I’ve spent a lot of time working on high availability, distributed storage and cloud software), and while it’s fair to say I represent the company in some sense simply by existing, I do not (and cannot) actually speak on behalf of my employer. Nevertheless, it fell to me to purchase a gift for us to provide to one lucky delegate sensible enough to arrive on time for Wednesday’s keynote.

I like to think we have a distinct engineering culture at SUSE. In particular, we run a hackweek once or twice a year where everyone has a full week to work on something entirely of their own choosing, provided it’s related to Free and Open Source Software. In that spirit (and given that we don’t make hardware ourselves) I thought it would be nice to be able to donate an Android tablet which the winner would either be able to hack on directly, or would be able to use in the course of hacking something else. So I’m not aware of any particular relationship between my employer and that tablet, but as it says on the back of the hackweek t-shirt I was wearing at the time:

Some things have to be done just because they are possible.

Not because they make sense.