Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-4175

Broadcast data sent increases with # slots per TM

    XMLWordPrintableJSON

Details

    Description

      Problem:
      we experience some unexpected increase of data sent over the network for broadcasts with increasing number of slots per Taskmanager.

      We provided a benchmark [1]. It not only increases the size of data sent over the network but also hurts performance as seen in the preliminary results below. In this results cloud-11 has 25 nodes and ibm-power has 8 nodes with scaling the number of slots per node from 1 - 16.

      ----------------------------------------------

      suite name median_time

      ==================================================

      broadcast.cloud-11 broadcast.01 8796
      broadcast.cloud-11 broadcast.02 14802
      broadcast.cloud-11 broadcast.04 30173
      broadcast.cloud-11 broadcast.08 56936
      broadcast.cloud-11 broadcast.16 117507
      broadcast.ibm-power-1 broadcast.01 6807
      broadcast.ibm-power-1 broadcast.02 8443
      broadcast.ibm-power-1 broadcast.04 11823
      broadcast.ibm-power-1 broadcast.08 21655
      broadcast.ibm-power-1 broadcast.16 37426

      ----------------------------------------------

      After looking into the code base it, it seems that the data is de-serialized only once per TM, but the actual data is sent for all slots running the operator with broadcast vars and just gets discarded in case its already de-serialized.

      We do not see a reason the data can't be shared among the slots of a TM and therefore just sent once.

      [1] https://github.com/TU-Berlin-DIMA/flink-broadcast

      This Jira will continue the discussion started here: https://mail-archives.apache.org/mod_mbox/flink-dev/201606.mbox/%3C1465386300767.94345@tu-berlin.de%3E

      Attachments

        Activity

          People

            Unassigned Unassigned
            Felix Neutatz Felix Neutatz
            Votes:
            0 Vote for this issue
            Watchers:
            10 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: