Parallel testing with Behat, Docker and Gearman

As a project grows in complexity, so does the time its supporting test suites take to run. In recent projects I've found this particularly noticeable with integration tests, such as those written in PHP with Behat.

By default, an xUnit-style test suite runs in series: one test at a time. By moving from series to parallel execution, where several tests run at once, it's easy to see that the overall testing time will decrease.

It's worth mentioning at this point that running your suite in parallel may not always be the most appropriate solution. When it comes to integration suites particularly, a critical review and removal/consolidation of scenarios can be a worthwhile activity.

It does seem, however, that parallel execution is something of a silver bullet for decreasing test suite times.

Parallel testing and Behat

In Behat, parallelism can be achieved using the Behat Gearman-Extension (by VPSoft). This extension uses the Gearman job server to implement the following parallel workflow:

  1. Multiple workers run on a host waiting to be fed work by Gearman
  2. A single client feeds feature files to Gearman which in turn dishes them out to the workers, in parallel
  3. As soon as a worker finishes, it is fed another feature file by Gearman

The more workers you have, the more parallelism you get.

What the extension doesn't do is help you out with collisions that occur as a result of parallel execution. It's up to you to isolate your application's persistent and runtime data.

It's relatively straightforward to avoid database collisions (using one database per worker, for example), but it definitely adds complexity to your application that just shouldn't be there. Plus, other collisions may be more difficult to address.

I'd avoided trying to work through this complexity until I saw Docker.

Docker

Docker is pretty cool. It's a combination of a few hairy-chested Linux technologies with which you can create isolated runtime environments called containers. Soon after hearing about Docker I realised that if I could get the workers to run in individual Docker containers, my parallel Behat suite would be collision-free!

Containers

I've mapped the Gearman workflow to four containers:

  1. Application
  2. Gearman
  3. Worker
  4. Client

Application Container

Following the data-only container pattern, this contains my built application code, including the src, vendor and feature directories. They are stored in a volume (in my case /var/www) and, because the tests only ever read these files, they can be shared safely across multiple containers.

To start this container I run:

docker run -d --name behat-docker.data behat-docker/data
  • docker run -d - the container is run in daemon mode
  • --name behat-docker.data - naming the container, so I can use it later
  • behat-docker/data - the image (based on busybox) which contains my application code

Gearman Container

This container exposes a Gearman server to my testing environment.
To start this container I run:

 docker run -d --name behat-docker.gearman rgarcia/gearmand
  • docker run -d - the container is run in daemon mode
  • --name behat-docker.gearman - naming the container, so I can use it later
  • rgarcia/gearmand - using Rafael's gearman image, taken straight from the public Docker repository

Worker Container

This is my application's runtime environment, containing the executables required to run my application, in this case php-cli. To connect to the Gearman server, the container also needs the PHP extension php5-gearman, as well as TCP connectivity to the Gearman container.

Enter Docker's service discovery mechanism, Links.

Links allow one container to connect to another container's exposed ports on the same Docker host; Docker passes the connection details to the linking container as environment variables. Since the image rgarcia/gearmand already exposes port 4730, all I need to do is set up a link to that container when I execute docker run ... and my worker will be able to connect.

To start my worker I run:

 docker run -d --link behat-docker.gearman:gm --volumes-from behat-docker.data --name behat-docker.worker-1 behat-docker/exec worker.sh
  • docker run -d - runs my container in daemon mode
  • --link behat-docker.gearman:gm - registers a link to the gearman container and makes it available to this container through the alias gm
  • --volumes-from behat-docker.data - mounts the volume from the data container (at /var/www)
  • --name behat-docker.worker-1 - naming the worker
  • behat-docker/exec - the application runtime's image
  • worker.sh - the command the worker will run

The script worker.sh requires a quick explanation.

#!/bin/sh

# configure behat with the gearman job server address
sed "s|GEARMAN_SERVER|$GM_PORT_4730_TCP_ADDR:$GM_PORT_4730_TCP_PORT|g" behat-worker.yml.tpl > behat-worker.yml

# run the behat worker command
bin/behat --config behat-worker.yml

Ultimately, I need the worker to run the command bin/behat --config behat-worker.yml, a long-running PHP process which gets its configuration, including the Gearman container's address, from behat-worker.yml.

The Docker link only makes the networking configuration available at runtime, as environment variables, so it can't be baked into the config file ahead of time. I use sed to perform a rudimentary find-and-replace on a template of the Behat config file, pulling in the appropriate environment variables:

  • GM_PORT_4730_TCP_ADDR - the IP address of the gearman container
  • GM_PORT_4730_TCP_PORT - the port the gearman daemon is listening on

Note: Both variables are prefixed with GM, the upper-cased form of the alias gm I defined in my link above.
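The substitution can be exercised outside Docker too. Below is a minimal sketch, assuming a POSIX shell: it wraps the sed call in a function (render_worker_config, a hypothetical name) that fails fast when the link variables are missing, which is what you'd see if a worker were started without --link. The template line gearman_server: GEARMAN_SERVER is a hypothetical placeholder, not the Gearman-Extension's actual configuration key, and the IP address is only a sample value.

```shell
#!/bin/sh
# Sketch: render a Behat worker config from a template using the Docker
# link variables. The template content below is a hypothetical placeholder,
# not the Gearman-Extension's real config key.
set -eu

render_worker_config() {
    template=$1
    output=$2
    # ${VAR:?msg} prints msg and aborts if the link variable is unset or empty
    addr=${GM_PORT_4730_TCP_ADDR:?"GM_PORT_4730_TCP_ADDR unset: is the gm link missing?"}
    port=${GM_PORT_4730_TCP_PORT:?"GM_PORT_4730_TCP_PORT unset: is the gm link missing?"}
    sed "s|GEARMAN_SERVER|$addr:$port|g" "$template" > "$output"
}

# Demo: sample values of the kind Docker injects for a link aliased gm
workdir=$(mktemp -d)
printf 'gearman_server: GEARMAN_SERVER\n' > "$workdir/behat-worker.yml.tpl"
export GM_PORT_4730_TCP_ADDR=172.17.0.2
export GM_PORT_4730_TCP_PORT=4730
render_worker_config "$workdir/behat-worker.yml.tpl" "$workdir/behat-worker.yml"
cat "$workdir/behat-worker.yml"
```

Failing early here turns a missing link into an obvious error at container start, rather than a worker that silently never connects to Gearman.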

Client Container

The client container is run in much the same way as the worker (using the same image):

docker run -t -i --rm --link behat-docker.gearman:gm --volumes-from behat-docker.data --name behat-docker.client behat-docker/exec client.sh

The only differences are:

  • docker run -t -i --rm - the container is run in the foreground (and removed when it exits) so I can see the tests running and easily grab the exit code
  • client.sh - runs the command bin/behat --config behat-client.yml, so the container acts as the client, not a worker

When client.sh is run, the gearman workflow will be kicked off:

  1. The client container will feed the individual feature files to the gearman container
  2. The gearman container will distribute these out to each worker container, in parallel
  3. When a worker container becomes free, it will be fed another feature to run
  4. The client container collates the results, prints the test results and exits

To increase the level of parallelism, I just kick off more workers!
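Starting N workers, running the client and cleaning up afterwards is easy to script. The following is a hypothetical sketch, not dockman itself: the function names and the DRY_RUN switch are illustrative assumptions, while the container and image names are the ones used above. It assumes the data and gearman containers are already running, and a Docker release that accepts the double-dash flags and docker rm -f.

```shell
#!/bin/sh
# Hypothetical sketch: start N workers, run the client in the foreground,
# then remove the workers and propagate the client's exit code.
# set -e is deliberately not used, so cleanup still runs if the suite fails.
set -u

# Run a command, or just print it when DRY_RUN is non-empty
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

start_workers() {
    n=$1
    i=1
    while [ "$i" -le "$n" ]; do
        run docker run -d --link behat-docker.gearman:gm \
            --volumes-from behat-docker.data \
            --name "behat-docker.worker-$i" behat-docker/exec worker.sh
        i=$((i + 1))
    done
}

stop_workers() {
    n=$1
    i=1
    while [ "$i" -le "$n" ]; do
        # -f stops the long-running worker before removing it
        run docker rm -f "behat-docker.worker-$i"
        i=$((i + 1))
    done
}

run_suite() {
    n=$1
    start_workers "$n"
    # Foreground client; its exit code is the suite's result
    run docker run -t -i --rm --link behat-docker.gearman:gm \
        --volumes-from behat-docker.data \
        --name behat-docker.client behat-docker/exec client.sh
    status=$?
    stop_workers "$n"
    return "$status"
}
```

After sourcing the script, run_suite 4 runs the suite across four workers, and DRY_RUN=1 run_suite 4 only prints the docker commands it would execute. The -t -i flags need a terminal, so on a CI agent without one you would drop them from the client command.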

Dockman

Clearly, both the Gearman-Extension and Docker constitute the secret sauce for this approach. In my opinion, Docker (v0.8) contains the killer features (such as linked containers) that make it a viable option for development and CI environments.

I've wrapped up this workflow into a simple shell script called dockman (HT @alexedge for the name).

It uses a config file .dockman.yml to pull your project's docker images into the workflow. Just run ./dockman n to run your behat suite across n workers.

It's at a very early stage, but you can find it on GitHub. I'm currently looking at moving the Gearman implementation out of PHP and into dockman, so that other testing frameworks and languages can use the workflow.

Here's a taster of the results:

[Screenshots: running in series; running with 2 workers; running with 4 workers]