[MESOS-1757] Speed up the tests - ASF JIRA


The full test suite now exceeds the 9 minute mark (581 seconds on my machine). This epic tracks techniques to bring that down:

Now that the master and the slave have to perform sync'ed disk writes, consider using tmpfs (e.g. under /dev/shm) to speed up the disk writes. For the master, we could also consider defaulting to in-memory state rather than the replicated log for most tests.
~~The reaper takes a full second to reap an exited process (MESOS-1199), which adds a second to each slave recovery test, and possibly more for things that rely on Subprocess.~~
The command executor sleeps for a second when shutting down (MESOS-442), which adds a second to every test that uses the command executor.
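
As a rough illustration of the tmpfs idea above, a test wrapper could pick a working directory under /dev/shm when it is available and fall back to the default temp location otherwise. This is only a sketch: the helper name `pick_test_workdir` and the `mesos-test-` prefix are hypothetical, and how the chosen directory would be handed to the tests is not specified here.

```python
import os
import tempfile


def pick_test_workdir(candidates=("/dev/shm",)):
    """Return a fresh temp directory, preferring a tmpfs-backed parent.

    `candidates` lists parent directories to try in order; /dev/shm is
    the example from the description and may not exist on every host.
    """
    for parent in candidates:
        # Only use the candidate if it exists and we can create entries in it.
        if os.path.isdir(parent) and os.access(parent, os.W_OK | os.X_OK):
            return tempfile.mkdtemp(prefix="mesos-test-", dir=parent)
    # Fall back to the platform default (TMPDIR, /tmp, ...).
    return tempfile.mkdtemp(prefix="mesos-test-")
```

The caller is responsible for removing the directory afterwards (for example with `shutil.rmtree`), since tmpfs contents consume memory until deleted.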

A big improvement will come from running the tests in parallel. A few options:

Use automake's parallel test harness to compile tests separately and run tests in parallel (see here).
Continue to use one test binary, but leverage Google Test's ability to shard tests across processes/machines (see here). This entails writing our own test wrapper script to decide how many workers to use, etc. gtest-parallel is an example of a parallel runner, but it does not leverage the sharding ability.
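
A minimal sketch of what such a wrapper script might look like, assuming the single-binary option. Google Test reads the `GTEST_TOTAL_SHARDS` and `GTEST_SHARD_INDEX` environment variables to decide which subset of tests a process runs; the function names `shard_environments` and `run_sharded` are hypothetical, and the sketch ignores problems such as tests that contend for fixed ports or shared directories.

```python
import os
import subprocess
import sys


def shard_environments(total_shards, base_env=None):
    """Build one environment per shard with the Google Test sharding variables set."""
    base = dict(os.environ if base_env is None else base_env)
    envs = []
    for index in range(total_shards):
        env = dict(base)
        env["GTEST_TOTAL_SHARDS"] = str(total_shards)
        env["GTEST_SHARD_INDEX"] = str(index)
        envs.append(env)
    return envs


def run_sharded(test_argv, total_shards):
    """Launch one process per shard concurrently; return 0 only if every shard passed."""
    procs = [
        subprocess.Popen(test_argv, env=env)
        for env in shard_environments(total_shards)
    ]
    # Wait for every shard so that no process is left unreaped.
    codes = [proc.wait() for proc in procs]
    return 1 if any(code != 0 for code in codes) else 0
```

A wrapper might then call `run_sharded(sys.argv[1:], os.cpu_count() or 1)` with the path of the test binary as its argument. Using one worker per CPU is just a starting guess; the right number depends on how much the tests block on I/O or sleeps.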

is blocked by

MESOS-4156 Speed up FetcherCacheTest.* and FetcherCacheHttpTest.*

MESOS-4157 Speed up ZooKeeper-related tests

MESOS-4158 Speed up SlaveRecoveryTest.*

MESOS-4159 Speed up GroupTest.*

MESOS-4155 Speed up ExamplesTest.*

is duplicated by

MESOS-2059 improve performance of expensive tests

relates to

MESOS-3760 Remove fragile sleep() from ProcessManager::settle()

MESOS-1582 Improve build time.

MESOS-1199 Subprocess is "slow" -> gated by process::reap poll interval
