Race Condition Database Container

When running acceptance tests, we would destroy the database container after each test so that the next test started from a known point and wasn’t affected by any prior tests.

Every now and then our tests would fail because a test had started before the database was ready, even though the database container was running. There is a short window after the container starts but before the database can accept connections, and tests that ran inside that window failed. As the number of tests grew, this happened more often, and checking each morning why the nightly tests had failed became quite painful.

We solved the problem by doing the following:

Adding a health check to the database container (requires Compose file format version 2.1 or later):

        healthcheck:
            test: ["CMD", "mysqladmin", "ping", "-h", "localhost"]
            interval: 5s
            timeout: 10s
            retries: 60

Then adding a separate container that won’t start until the database container is marked as healthy:

    db_fluffer:
        image: tianon/true
        depends_on:
            mysql_db:
                condition: service_healthy

Then, instead of starting the database container directly from Jenkins, we would launch db_fluffer before each test. Because db_fluffer won’t start until the database container is healthy, our tests are forced to wait until the database is ready.
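
To show how the two fragments fit together, here is a minimal sketch of a complete compose file. The service names mysql_db and db_fluffer and the health check settings come from the fragments above. The mysql:5.7 image tag, the placeholder root password, and the commands in the trailing comments are assumptions for illustration and may differ from the exact setup we ran.

```yaml
# docker-compose.yml: a minimal sketch combining the two fragments.
# File format 2.1 supports both healthcheck and the
# "condition: service_healthy" form of depends_on.
version: "2.1"

services:
    mysql_db:
        image: mysql:5.7                    # assumed image and tag
        environment:
            MYSQL_ROOT_PASSWORD: example    # assumed placeholder credential
        healthcheck:
            # mysqladmin ping exits 0 once the server is up and responding
            test: ["CMD", "mysqladmin", "ping", "-h", "localhost"]
            interval: 5s
            timeout: 10s
            retries: 60

    db_fluffer:
        # tianon/true does nothing and exits straight away; it exists only
        # so that something depends on mysql_db being healthy
        image: tianon/true
        depends_on:
            mysql_db:
                condition: service_healthy

# Assumed step before each test (the exact Jenkins command is not shown here):
#   docker-compose up -d db_fluffer
# Assumed teardown after each test, so the next test gets a fresh database:
#   docker-compose down
```

With a file like this, asking Compose to start db_fluffer starts mysql_db first and blocks until its health check passes, so the command only returns once the database can accept connections.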

There might be a nicer way of doing this, but this approach proved to be quite simple and solved the problem we had.