The full test suite now exceeds the 9-minute mark (581 seconds on my machine). This epic tracks techniques to bring that time down:
- Now that the master and the slave have to perform synced disk writes, consider using tmpfs (e.g. under /dev/shm) to speed up the disk writes. For the master, we could also consider defaulting to in-memory state rather than the replicated log for most tests.
- The reaper takes a full second to reap an exited process (MESOS-1199). This adds a second to each slave recovery test, and possibly more for things that rely on Subprocess.
- The command executor sleeps for a second when shutting down (MESOS-442), which adds a second to every test that uses the command executor.
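
To illustrate the tmpfs suggestion above, here is a minimal Python sketch of how a test wrapper could pick a scratch directory. The helper name `make_test_workdir` and the `mesos-test-` prefix are hypothetical, and how the tests would actually be pointed at this directory is not covered here. It falls back to the default temp directory when /dev/shm is missing or not writable.

```python
import os
import tempfile


def make_test_workdir(tmpfs_root="/dev/shm", prefix="mesos-test-"):
    """Create a scratch directory, preferring a tmpfs mount if one is usable.

    Hypothetical helper: the name, prefix, and default root are assumptions.
    """
    # Use the tmpfs root only if it exists and we can create entries in it.
    if os.path.isdir(tmpfs_root) and os.access(tmpfs_root, os.W_OK | os.X_OK):
        return tempfile.mkdtemp(prefix=prefix, dir=tmpfs_root)
    # Otherwise fall back to the platform's default temp directory.
    return tempfile.mkdtemp(prefix=prefix)
```

The caller is responsible for removing the directory afterwards (e.g. with `shutil.rmtree`), since tmpfs contents count against memory.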
A big improvement will come from running the tests in parallel. A few options:
- Use automake's parallel test harness to compile tests separately and run tests in parallel (see here).
- Continue to use one test binary, but leverage Google Test's ability to shard tests across processes/machines (see here). This entails writing our own test wrapper script to decide how many workers to use, etc. gtest-parallel is an example of a parallel runner, but does not leverage the sharding ability.
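
As a rough sketch of the wrapper script for the sharding option, the Python below launches one process per shard and sets the `GTEST_TOTAL_SHARDS` and `GTEST_SHARD_INDEX` environment variables that Google Test reads to split tests across processes. The function names `shard_env` and `run_sharded` are hypothetical, and this is only one possible shape for such a wrapper, not a finished design.

```python
import os
import subprocess
import sys


def shard_env(index, total, base=None):
    """Return an environment that tells Google Test which shard to run."""
    env = dict(os.environ if base is None else base)
    env["GTEST_TOTAL_SHARDS"] = str(total)
    env["GTEST_SHARD_INDEX"] = str(index)
    return env


def run_sharded(command, workers):
    """Run `command` once per shard, concurrently.

    Returns 0 if every shard exited with status 0, otherwise 1.
    """
    # Start all shards first so they run in parallel.
    procs = [
        subprocess.Popen(command, env=shard_env(i, workers))
        for i in range(workers)
    ]
    # Then wait for each one and collect the exit codes.
    codes = [p.wait() for p in procs]
    return 0 if all(code == 0 for code in codes) else 1
```

A call might look like `run_sharded(["./mesos-tests"], os.cpu_count() or 1)`, where the binary path is an assumption. Shards run concurrently on one machine, so tests that share fixed ports or directories may need isolating before this is safe.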