[MESOS-1199] Subprocess is "slow" -> gated by process::reap poll interval - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.18.0
Fix Version/s: None
Component/s: None
Labels: None

Sprint: Mesos Q3 Sprint 6
Story Points: 1

Description

Subprocess uses process::reap to wait on the subprocess pid and set the exit status. However, process::reap polls at a one-second interval, so the status future may not be set until up to one full interval after the subprocess has actually exited.

This means if you need to wait for the subprocess to complete you get hit with E(delay) = 0.5 seconds, independent of the execution time. For example, the MesosContainerizer uses mesos-fetcher in a Subprocess to fetch the executor during launch. At Twitter we fetch a local file, i.e., a very fast operation, but the launch is blocked until the mesos-fetcher pid is reaped -> adding 0 to 1 seconds for every launch!
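The E(delay) = 0.5 seconds figure follows from a simple model, sketched here under the assumption that the subprocess exits at a time uniformly distributed within a poll interval of length T (T = 1 second for process::reap as described above). The symbols D and T are introduced only for this illustration. The delay D until the next poll is then uniform on [0, T]:

```latex
D \sim \mathcal{U}(0, T), \qquad
\mathbb{E}[D] = \int_0^T \frac{x}{T}\,dx = \frac{T}{2} = 0.5\ \text{s} \quad (T = 1\ \text{s})
```

The delay depends only on where the exit falls relative to the poll ticks, which is why it is independent of the execution time.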

The problem is even worse with a chain of short Subprocesses because after the first Subprocess completes you'll be synchronized with the reap interval and you'll see nearly the full interval before each notification, i.e., 10 Subprocesses each of << 1 second duration will take ~10 seconds!
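The chaining effect can be illustrated with a small simulation. This is only a sketch of the behaviour described above, not libprocess code: it assumes the reaper polls on a fixed grid (0, 1, 2, ... seconds) and that each Subprocess is started at the instant the previous one is reaped. The names `reap_time` and `chain_completion_time` are hypothetical.

```python
import math


def reap_time(exit_time: float, interval: float = 1.0) -> float:
    """Time at which a poller ticking at 0, interval, 2*interval, ...
    first observes a process that exited at exit_time (assumed model)."""
    # The exit is only noticed on the first tick at or after exit_time.
    return math.ceil(exit_time / interval) * interval


def chain_completion_time(durations, start: float = 0.0, interval: float = 1.0) -> float:
    """Time at which the last subprocess in a sequential chain is reaped.

    Each subprocess starts as soon as the previous one has been reaped,
    so every start after the first coincides with a poll tick.
    """
    now = start
    for duration in durations:
        now = reap_time(now + duration, interval)
    return now


if __name__ == "__main__":
    # Ten subprocesses of 10 ms each: 0.1 s of real work in total.
    finished = chain_completion_time([0.01] * 10, start=0.3)
    print(f"chain reaped at t = {finished:.1f} s")  # prints "chain reaped at t = 10.0 s"
```

In this model only the first Subprocess benefits from a random offset relative to the poll grid; every later one starts exactly on a tick and waits almost a full interval before it is reaped.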

This has become particularly apparent in some new tests I'm working on where test durations are now greatly extended with each taking several seconds.

Attachments


wiatpid.pdf
01/Aug/14 19:13
62 kB
Craig Hansen-Sturm

Issue Links

is related to

MESOS-1757 Speed up the tests

Accepted

People

Assignee: Ian Downes

Reporter: Ian Downes

Shepherd: Ian Downes

Votes: 0

Watchers: 11

Dates

Created: 08/Apr/14 18:20

Updated: 31/Jul/15 00:58

Resolved: 26/Sep/14 17:22