Parallel testing with Behat, Docker and Gearman
As a project grows in complexity, the time its supporting test suites take to run increases. In recent projects I've found this to be particularly noticeable with integration tests, such as those written in PHP + Behat.
Any xUnit test suite runs in series: one test runs at a time. By moving from series to parallel, it's easy to see that the overall testing time will decrease.
It's worth mentioning at this point that running your suite in parallel may not always be the most appropriate solution. When it comes to integration suites particularly, a critical review and removal/consolidation of scenarios can be a worthwhile activity.
It does seem, however, that parallel execution is something of a silver bullet for decreasing test suite times.
Parallel testing and Behat
In Behat, parallelism can be achieved using the Behat Gearman-Extension (by VPSoft). This extension uses the Gearman job server to implement the following parallel workflow:
- Multiple workers run on a host waiting to be fed work by Gearman
- A single client feeds feature files to Gearman which in turn dishes them out to the workers, in parallel
- As soon as a worker finishes, it is fed another feature file by Gearman
The more workers you have, the more parallelism you get.
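The scheduling behaviour described above, where a free worker immediately receives the next feature file, can be approximated on a single machine without Gearman. The sketch below swaps Gearman for xargs -P (supported by GNU and BSD xargs, though not required by POSIX). run_in_parallel is a hypothetical helper, not part of the Gearman-Extension, and it provides none of the isolation discussed below.

```shell
#!/bin/sh
# Hypothetical helper: a local stand-in for the Gearman fan-out (no Gearman).
# Usage: run_in_parallel WORKERS RUNNER FILE...
# Runs "RUNNER FILE" once per FILE, with at most WORKERS running at a time.
# xargs starts the next invocation as soon as one exits, so a free
# "worker" is immediately handed the next feature file.
run_in_parallel() {
  workers=$1
  runner=$2
  shift 2
  # NUL-separate the file names so paths containing spaces survive intact.
  printf '%s\0' "$@" | xargs -0 -n 1 -P "$workers" "$runner"
}
```

Something like run_in_parallel 4 bin/behat features/*.feature would run up to four Behat processes at once, all against the same application data, which is exactly the collision problem described next. xargs exits non-zero if any invocation fails, so a failing feature still fails the run.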
What the extension doesn't do is help you with collisions that occur as a result of parallel execution. It's up to you to isolate your application's persistent and runtime data.
It's relatively straightforward to avoid database collisions (using one database per worker, for example), but it definitely adds complexity to your application that just shouldn't be there. Plus, other collisions may be more difficult to address.
I'd avoided trying to work through this complexity until I saw Docker.
Docker
Docker is pretty cool. It combines a few hairy-chested Linux technologies that let you create isolated runtime environments called containers. Soon after hearing about Docker I realised that if I could get the workers to run in individual Docker containers, my parallel Behat suite would be collision free!
Containers
I've mapped the Gearman workflow to 4 containers:
- Application
- Gearman
- Worker
- Client
Application Container
Following the data-only container pattern, this contains my built application code, including the src, vendor and feature directories. They are stored in a volume (in my case /var/www) and, because they are read-only files, can be shared safely across multiple containers.
To start this container I run:
docker run -d --name behat-docker.data behat-docker/data
- docker run -d: the container is run in daemon mode
- --name behat-docker.data: names the container, so I can refer to it later
- behat-docker/data: the image (based on busybox) which contains my application code
Gearman Container
This container exposes a Gearman server to my testing environment.
To start this container I run:
docker run -d --name behat-docker.gearman rgarcia/gearmand
- docker run -d: the container is run in daemon mode
- --name behat-docker.gearman: names the container, so I can refer to it later
- rgarcia/gearmand: Rafael's Gearman image, taken straight from the public Docker repository
Worker Container
This is my application's runtime environment, containing the executables required to run my application, in this case php-cli. To connect to the Gearman server, the container also requires the PHP extension php5-gearman, as well as TCP connectivity to the Gearman container.
Enter Docker's service discovery mechanism, Links.
Links allow one container to reach the ports exposed by another container on the same Docker host, without publishing those ports on the host itself. Since the image rgarcia/gearmand already exposes port 4730, all I need to do is set up a link to this container when I execute docker run ... and my worker will be able to connect.
To start my worker I run:
docker run -d --link behat-docker.gearman:gm --volumes-from behat-docker.data --name behat-docker.worker-1 behat-docker/exec worker.sh
- docker run -d: runs my container in daemon mode
- --link behat-docker.gearman:gm: registers a link to the Gearman container and makes it available to this container through the alias gm
- --volumes-from behat-docker.data: mounts the volume from the data container (at /var/www)
- --name behat-docker.worker-1: names the worker
- behat-docker/exec: the application runtime's image
- worker.sh: the command the worker will run
The script worker.sh requires a quick explanation.
#!/bin/sh
# configure behat with the gearman job server address
sed "s|GEARMAN_SERVER|$GM_PORT_4730_TCP_ADDR:$GM_PORT_4730_TCP_PORT|g" behat-worker.yml.tpl > behat-worker.yml
# run the behat worker command
bin/behat --config behat-worker.yml
Ultimately, I need the worker to run the command bin/behat --config behat-worker.yml, a long-running PHP process which gets its configuration, including the Gearman container's address, from behat-worker.yml.
The Docker link only makes the networking configuration available at runtime, as environment variables, so it can't be baked into the config file in the image. I use sed to perform a rudimentary find-and-replace on the Behat config file, pulling in the appropriate environment variables:
- GM_PORT_4730_TCP_ADDR: the IP address of the Gearman container
- GM_PORT_4730_TCP_PORT: the port the Gearman daemon listens on
Note: Both variables are prefixed with GM, the upper-cased form of the alias gm I defined in my link above.
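Because these variables only exist when the container is started with the link, a missing --link or a mistyped alias silently produces a config with an empty address. A slightly more defensive version of the templating step in worker.sh might check for them first. This is a sketch, not the script used here: render_behat_config is a hypothetical name, and it assumes the template contains the literal placeholder GEARMAN_SERVER, as in the sed call above.

```shell
#!/bin/sh
# Hypothetical, more defensive version of the templating step in worker.sh.
# Usage: render_behat_config TEMPLATE OUTPUT
# Replaces GEARMAN_SERVER with the address and port Docker injects for a
# link aliased "gm" (GM_PORT_4730_TCP_ADDR and GM_PORT_4730_TCP_PORT).
render_behat_config() {
  template=$1
  output=$2
  # Fail early, before writing anything, if the link variables are missing.
  if [ -z "${GM_PORT_4730_TCP_ADDR:-}" ] || [ -z "${GM_PORT_4730_TCP_PORT:-}" ]; then
    echo "GM_PORT_4730_TCP_ADDR/PORT not set; start the container with a link aliased gm" >&2
    return 1
  fi
  server="$GM_PORT_4730_TCP_ADDR:$GM_PORT_4730_TCP_PORT"
  sed "s|GEARMAN_SERVER|$server|g" "$template" > "$output"
}
```

In worker.sh this would replace the bare sed line, e.g. render_behat_config behat-worker.yml.tpl behat-worker.yml || exit 1, followed by the bin/behat command.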
Client Container
The client container is run in much the same way as the worker (using the same image):
docker run -t -i --rm --link behat-docker.gearman:gm --volumes-from behat-docker.data --name behat-docker.client behat-docker/exec client.sh
The only differences are:
- docker run -t -i --rm: the container is run in the foreground so I can see the tests running and easily grab the exit code; --rm removes the container once it exits
- client.sh: runs the command bin/behat --config behat-client.yml, so the container will act as the client, not a worker
When client.sh is run, the Gearman workflow will be kicked off:
- The client container will feed the individual feature files to the Gearman container
- The Gearman container will distribute these out to each worker container, in parallel
- When a worker container becomes free, it will be fed another feature to run
- The client container collates the results, prints the test results and exits
To increase the level of parallelism, I just kick off more workers!
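Starting containers by hand gets repetitive as the worker count grows. The sketch below strings the commands above together: it starts the data and Gearman containers, N workers, then the client in the foreground, cleans up, and returns the client's exit code so a CI job can fail on it. It is not dockman. run_suite and the DOCKER override (set DOCKER=echo to print the commands instead of running them) are hypothetical, and it assumes the behat-docker/data and behat-docker/exec images are already built and that no containers with these names exist. It keeps the legacy --link mechanism, which current Docker releases still support on the default bridge network.

```shell
#!/bin/sh
# Hypothetical helper (not dockman): run the Behat suite across N workers.
# Usage: run_suite N
# Set DOCKER=echo to print the docker commands instead of running them.
DOCKER=${DOCKER:-docker}

run_suite() {
  workers=$1

  # Shared read-only application code, plus the Gearman job server.
  $DOCKER run -d --name behat-docker.data behat-docker/data
  $DOCKER run -d --name behat-docker.gearman rgarcia/gearmand

  # One isolated worker container per unit of parallelism.
  i=1
  while [ "$i" -le "$workers" ]; do
    $DOCKER run -d --link behat-docker.gearman:gm \
      --volumes-from behat-docker.data \
      --name "behat-docker.worker-$i" behat-docker/exec worker.sh
    i=$((i + 1))
  done

  # Without -d, docker run waits for the client and returns its exit code.
  # -t -i are dropped here, assuming a non-interactive (e.g. CI) run.
  $DOCKER run --rm --link behat-docker.gearman:gm \
    --volumes-from behat-docker.data \
    --name behat-docker.client behat-docker/exec client.sh
  status=$?

  # Remove workers and service containers so the names can be reused.
  i=1
  while [ "$i" -le "$workers" ]; do
    $DOCKER rm -f "behat-docker.worker-$i"
    i=$((i + 1))
  done
  $DOCKER rm -f behat-docker.gearman behat-docker.data

  return "$status"
}
```

Doubling the parallelism is then just a matter of calling run_suite 4 instead of run_suite 2.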
Dockman
Clearly, both the Gearman-Extension and Docker constitute the secret sauce for this approach. In my opinion, Docker (v0.8) contains the killer features (such as linked containers) that make it a viable option for development and CI environments.
I've wrapped up this workflow into a simple shell script called dockman (HT @alexedge for the name).
It uses a config file, .dockman.yml, to pull your project's Docker images into the workflow. Just run ./dockman n to run your Behat suite across n workers.
It's at a very early stage, but you can find it on GitHub. I'm currently looking at moving the Gearman implementation out of PHP and into dockman so that other testing frameworks and languages can utilise the workflow.
Here's a taster of the results:
Running in series
Running x2 workers
Running x4 workers