Details
-
Improvement
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
-
None
-
Reviewed
Description
Force fetch inputs before starting outputs so that we can choose to allocate more space for buffers by setting tez.task.scale.memory.input-output-concurrent=false which is a new option in Tez. With the default value of true, WeightedScalingMemoryDistributor in Tez for a TezConfiguration.TEZ_TASK_SCALE_MEMORY_RESERVE_FRACTION of 0.5 and 1G memory, will split the 512MB between inputs and outputs. If set to false, it will allocate 512MB to inputs and 512MB to outputs. For eg: For two join inputs and one group by output
tez.task.scale.memory.input-output-concurrent=true
2016-03-28 01:15:58,842 [INFO] [TezChild] |resources.MemoryDistributor|: Allocations=[scope-32:org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput:OUTPUT:268435456:83684722], [scope-30:org.apache.tez.runtime.library.input.OrderedGroupedKVInput:INPUT:620652160:193488239], [scope-29:org.apache.tez.runtime.library.input.OrderedGroupedKVInput:INPUT:620652160:193488239]
tez.task.scale.memory.input-output-concurrent=false
2016-03-28 01:25:36,665 [INFO] [TezChild] |resources.MemoryDistributor|: Allocations=[scope-32:org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput:OUTPUT:268435456:268435456], [scope-29:org.apache.tez.runtime.library.input.OrderedGroupedKVInput:INPUT:620652160:235330600], [scope-30:org.apache.tez.runtime.library.input.OrderedGroupedKVInput:INPUT:620652160:235330600]
To ensure we don't hit OOM, we need to finish fetching the inputs by calling reader.next() before calling output.start(). That will make sure the input buffers are released before output buffers are allocated.
Attachments
Attachments
Issue Links
- is related to
-
TEZ-3244 Allow overlap of input and output memory when they are not concurrent
- Closed