Details
-
Improvement
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
-
None
Description
If users don't explicitly configure the various Inputs/Outputs that are part of a task (Vertex), there's a good chance that more than 1x of the available heap will be assumed to be available between all of the I/Os and Processor.
For now, we could scale resource asks from various I/Os - if they are misconfigured. Eventually, this either needs to be figured out by the Client, or needs to be done in a more dynamic manner - leaving that for TEZ-719.
Options, and I'm open to other suggestions (both assume runtime-internals is to some extent aware of the available IOs)
Have the Inputs / Outputs implement an interface which can be used to query for memory requirements, as well as set what will actually be available.
A naive approach would be to just scale the values - so that everything fits within available memory. This is fairly simple, and I'm in favor of getting started with this.
Alternately, the policy could consider additional aspects - more knowledge about the I/Os - to make slightly smarter decisions.
This reconfiguration would be done for all tasks - during the time they're initialized. Will likely require the I/Os to have another method to actually get them started. That's along the lines of what is required for TEZ-688.
This could be done within the AM as well - but effectively ends up requiring each of the payloads the be deserialzied, IOs initialized - and runtime-library code leaking into the DAG.
Another alternate is a helper on the client - which accepts a Vertex, and reconfigures the payloads - again, this makes assumptions on the payload format - and will have to serialize/deserialize.
Thoughts ?