Details
- Sub-task
- Status: Resolved
- Major
- Resolution: Fixed
- 0.10.2
- None
- None
Description
We always record the compressed size in ExternalSorter#partitionStats, while we record the uncompressed size in UnorderedPartitionedKVWriter#sizePerPartition. The two should have consistent semantics.
As far as I know, the uncompressed size is preferable for the following reasons:
- The stats are used in FairShuffleVertexManager to configure the parallelism. The normal ShuffleVertexManager, which is broadly used, computes parallelism based on the uncompressed size. If the stats stay compressed, we need to tune `tez.fair-shuffle-vertex-manager.desired-task-input-size` based on the compressed size even though `tez.shuffle-vertex-manager.desired-task-input-size` must be based on the uncompressed size.
- Ming pointed out in TEZ-3206 that we should use the uncompressed size. It looks like we missed creating a follow-up ticket.
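
To illustrate why the two sizes cannot share one tuning value, here is a minimal sketch. It is not the actual vertex manager logic: the class `PartitionSizeSketch`, the helper `tasksFor`, the plain ceiling division, and the 4:1 compression ratio are all assumptions made for illustration. It only shows that the same desired task input size yields very different parallelism depending on which size is reported.

```java
public class PartitionSizeSketch {
    // Hypothetical helper (not a Tez API): number of tasks needed so that
    // each task reads at most desiredBytes, using simple ceiling division.
    static int tasksFor(long[] partitionSizes, long desiredBytes) {
        long total = 0;
        for (long size : partitionSizes) {
            total += size;
        }
        long tasks = (total + desiredBytes - 1) / desiredBytes;
        return (int) Math.max(1, tasks);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Assumed per-partition sizes; the compressed figures assume a 4:1 ratio.
        long[] uncompressed = {400 * mb, 300 * mb, 300 * mb};
        long[] compressed = {100 * mb, 75 * mb, 75 * mb};
        long desired = 100 * mb; // stand-in for a desired-task-input-size setting

        System.out.println("uncompressed stats: " + tasksFor(uncompressed, desired));
        System.out.println("compressed stats:   " + tasksFor(compressed, desired));
    }
}
```

With these assumed numbers, the uncompressed stats give 10 tasks and the compressed stats give 3, so a value tuned for one vertex manager would not carry over to the other unless both report the same kind of size.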