Description
While revisiting TEZ-3145, I found that, in addition to improving the way empty partitions are sent from Maps to the AM and from the AM to Reducers, message serialization can be improved to reduce network traffic.
For example, consider a job with 42000 Maps and 7500 Reducers in which 95% of the partition data produced is empty. The volume of Tez DME events sent from the AM to the Reducers is num(Maps) * num(Reducers) * size(wrapped DME). With 95% empty partitions, each wrapped DME is 450 bytes: 260 bytes carry the empty-partition information and 190 bytes are messaging overhead. Total messaging is therefore 132 GB, of which 76 GB is empty-partition data and 56 GB is non-empty-partition messaging. This jira aims to reduce the non-empty-partition messaging.
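The traffic figures above can be reproduced with a short back-of-the-envelope calculation. This is a minimal sketch that uses only the numbers quoted in this issue; the constant and function names are illustrative, not Tez APIs. One assumption: "GB" is interpreted as GiB (2^30 bytes), because that is the unit that yields the 132/76/56 figures.

```python
# Back-of-the-envelope estimate of AM -> Reducer DME traffic, using the
# figures quoted in this issue. Names are illustrative only (not Tez APIs).
# Assumption: "GB" means GiB (2**30 bytes), which reproduces 132/76/56.
NUM_MAPS = 42_000
NUM_REDUCERS = 7_500
EMPTY_PARTITION_BYTES = 260  # per wrapped DME: empty-partition payload
MESSAGING_BYTES = 190        # per wrapped DME: remaining messaging overhead
WRAPPED_DME_BYTES = EMPTY_PARTITION_BYTES + MESSAGING_BYTES  # 450 bytes

GIB = 2 ** 30


def traffic_gib(bytes_per_event: int) -> float:
    """Total size, in GiB, of one event per (Map, Reducer) pair."""
    return NUM_MAPS * NUM_REDUCERS * bytes_per_event / GIB


total = traffic_gib(WRAPPED_DME_BYTES)
empty = traffic_gib(EMPTY_PARTITION_BYTES)
messaging = traffic_gib(MESSAGING_BYTES)

print(f"total={total:.0f} empty={empty:.0f} messaging={messaging:.0f} GiB")
# prints "total=132 empty=76 messaging=56 GiB"
```

Because the total scales linearly with the per-event size, every byte removed from the wrapped DME saves num(Maps) * num(Reducers) bytes, roughly 0.29 GiB in this job.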
Issue Links
- is related to: TEZ-3145 Reduce message size when empty partitions is high (Open)