Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Abandoned
- Affects Version/s: 1.0.3
- Fix Version/s: None

Description
Problem:
We observe an unexpected increase in the amount of data sent over the network for broadcasts as the number of slots per TaskManager grows.
We provide a benchmark [1]. The current behavior not only increases the amount of data sent over the network but also hurts performance, as the preliminary results below show. In these results, cloud-11 has 25 nodes and ibm-power has 8 nodes, with the number of slots per node scaled from 1 to 16.
----------------------------------------------------
suite                 | name         | median_time
----------------------+--------------+--------------
broadcast.cloud-11    | broadcast.01 |         8796
broadcast.cloud-11    | broadcast.02 |        14802
broadcast.cloud-11    | broadcast.04 |        30173
broadcast.cloud-11    | broadcast.08 |        56936
broadcast.cloud-11    | broadcast.16 |       117507
broadcast.ibm-power-1 | broadcast.01 |         6807
broadcast.ibm-power-1 | broadcast.02 |         8443
broadcast.ibm-power-1 | broadcast.04 |        11823
broadcast.ibm-power-1 | broadcast.08 |        21655
broadcast.ibm-power-1 | broadcast.16 |        37426
----------------------------------------------------
After looking into the code base, it seems that the data is deserialized only once per TaskManager, but it is still sent once for every slot running an operator with broadcast variables; the extra copies are simply discarded when the data has already been deserialized.
We see no reason why the data cannot be shared among the slots of a TaskManager and therefore sent only once.
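The scaling in the benchmark table follows directly from this behavior: per-TaskManager traffic is roughly proportional to the slot count. A minimal back-of-the-envelope sketch of that traffic model, assuming a fixed broadcast payload size (class and method names here are hypothetical, not Flink APIs):

```java
public class BroadcastTrafficModel {

    /** Bytes received by one TaskManager today: one copy per slot. */
    static long perSlotCopies(long broadcastBytes, int slotsPerTm) {
        return broadcastBytes * slotsPerTm;
    }

    /** Bytes received if the data were shared among a TM's slots. */
    static long sharedCopy(long broadcastBytes, int slotsPerTm) {
        // One copy per TaskManager, independent of the slot count.
        return broadcastBytes;
    }

    public static void main(String[] args) {
        long data = 100L * 1024 * 1024; // assume a 100 MiB broadcast set
        for (int slots : new int[] {1, 2, 4, 8, 16}) {
            System.out.printf("slots=%2d  current=%5d MiB  shared=%5d MiB%n",
                    slots,
                    perSlotCopies(data, slots) >> 20,
                    sharedCopy(data, slots) >> 20);
        }
    }
}
```

Under this model, going from 1 to 16 slots multiplies the network traffic per TaskManager by 16, which is consistent with the roughly linear growth of the median times above.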
[1] https://github.com/TU-Berlin-DIMA/flink-broadcast
This Jira will continue the discussion started here: https://mail-archives.apache.org/mod_mbox/flink-dev/201606.mbox/%3C1465386300767.94345@tu-berlin.de%3E