Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Description
We need to implement a partitioner that assigns a distinct partition key to each element, so that every element becomes its own partition.
This partitioner is needed because, in our large shuffle optimization, every element received by RelayTransform is itself a (compressed) partition.
1) If the proposed partitioner turns every element back into a single partition, we can turn off compression before and after the RelayTransform. If we instead keep using IntactPartitioner, as we do now, and turn off compression, the decompression phase on the output edge of the vertex containing the RelayTransform will not recognize the boundaries between compressed blocks correctly. (Many compression algorithms, such as LZ4, do not correctly decompress several concatenated compressed byte sequences in a single pass.)
2) With the suggested partitioner, we can flush the output data to disk after every element.
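A minimal sketch of the idea, to make the proposal concrete. It is not Nemo's actual code: the `ElementPartitioner` interface, the `PerElementPartitioner` class, and the `partitionAll` helper are hypothetical names introduced only for illustration, and the real partitioner interface in the code base may differ. The sketch assumes a simple running counter is an acceptable source of distinct keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a partitioner interface; not Nemo's actual API. */
interface ElementPartitioner<K> {
  K partition(Object element);
}

/**
 * Assigns a fresh key to every element, so each resulting partition
 * holds exactly one element. Not thread-safe: the counter is a plain int.
 */
final class PerElementPartitioner implements ElementPartitioner<Integer> {
  private int nextKey = 0;

  @Override
  public Integer partition(final Object element) {
    // The element's content is ignored; only its arrival order matters.
    return nextKey++;
  }
}

final class PerElementPartitionerDemo {
  /** Groups elements by their partition key, preserving arrival order. */
  static Map<Integer, List<Object>> partitionAll(
      final ElementPartitioner<Integer> partitioner, final List<?> elements) {
    final Map<Integer, List<Object>> partitions = new LinkedHashMap<>();
    for (final Object element : elements) {
      partitions
          .computeIfAbsent(partitioner.partition(element), key -> new ArrayList<>())
          .add(element);
    }
    return partitions;
  }

  public static void main(final String[] args) {
    // Each string stands in for one (compressed) partition received as an element.
    final Map<Integer, List<Object>> partitions = partitionAll(
        new PerElementPartitioner(), Arrays.asList("blockA", "blockB", "blockC"));
    partitions.forEach((key, values) -> System.out.println(key + " -> " + values));
  }
}
```

Because every key maps to exactly one element, a writer built on such a partitioner could treat each element as a unit with its own boundary, which is what items 1) and 2) rely on.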
Issue Links
- Is contained by: NEMO-144 Improve Data Plane Code (Open)
- links to