Right now Thermos is mostly tested via an integration test suite. We recently found a bug in the collect_updates method of Thermos's TaskRunner that should be trivially unit-testable, but mocking everything it depends on in a sane way is fairly challenging.
A few things that would be valuable:
- abstract out the checkpoint log – right now TaskRunnerHelper is a mishmash of checkpoint log logic and posix process handling logic.
- remove the task lifecycle logic from TaskRunnerHelper and put it somewhere else
- invert the Process/Platform abstraction in order to make it easier to stub out Process (e.g. PosixProcess, ThreadProcess, TestProcess, DockerProcess)
- move the pid handling logic from TaskRunnerHelper into PosixProcess
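
To make the Process idea above concrete, here is a rough sketch of what a stubbable Process interface could look like. Everything in it is an assumption for illustration: the method names (start, poll), the run_to_completion helper, and the constructor arguments are hypothetical and are not the existing Thermos API. Only the class names PosixProcess and TestProcess come from the list above.

```python
import abc
import subprocess
import time


class Process(abc.ABC):
    """Hypothetical interface; not the current Thermos Process class."""

    @abc.abstractmethod
    def start(self):
        """Launch the process."""

    @abc.abstractmethod
    def poll(self):
        """Return the exit code, or None if the process has not exited."""


class PosixProcess(Process):
    """Owns its own pid handling instead of leaving it to TaskRunnerHelper."""

    def __init__(self, cmdline):
        self._cmdline = cmdline
        self._popen = None

    def start(self):
        self._popen = subprocess.Popen(self._cmdline, shell=True)

    @property
    def pid(self):
        # None until start() has been called.
        return self._popen.pid if self._popen else None

    def poll(self):
        return self._popen.poll() if self._popen else None


class TestProcess(Process):
    """In-memory stub that replays a scripted sequence of poll() results."""

    __test__ = False  # keep pytest from collecting this class as a test case

    def __init__(self, poll_results):
        self._poll_results = list(poll_results)
        self.started = False

    def start(self):
        self.started = True

    def poll(self):
        # Once the script is exhausted, report "still running" (None).
        return self._poll_results.pop(0) if self._poll_results else None


def run_to_completion(process, max_polls=10, poll_interval=0.0):
    """Hypothetical stand-in for runner logic that only sees the interface."""
    process.start()
    for _ in range(max_polls):
        exit_code = process.poll()
        if exit_code is not None:
            return exit_code
        time.sleep(poll_interval)
    return None
```

With something along these lines, runner logic could be unit tested by passing in TestProcess([None, None, 0]) rather than forking a real process, and the pid bookkeeping would live next to the code that actually creates the pid.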