Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-1725

New Partitioner for better load balancing for skewed data

    XMLWordPrintableJSON

Details

    Description

      Hi,

      We have recently studied the problem of load balancing in Storm [1].
      In particular, we focused on key distribution of the stream for skewed data.
      We developed a new stream partitioning scheme (which we call Partial Key Grouping). It achieves better load balancing than key grouping while being more scalable than shuffle grouping in terms of memory.

      In the paper we show a number of mining algorithms that are easy to implement with partial key grouping, and whose performance can benefit from it. We think that it might also be useful for a larger class of algorithms.

      Partial key grouping is very easy to implement: it requires just a few lines of code in Java when implemented as a custom grouping in Storm [2].

      For all these reasons, we believe it will be a nice addition to the standard Partitioners available in Flink. If the community thinks it's a good idea, we will be happy to offer support in the porting.

      References:
      [1]. https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf
      [2]. https://github.com/gdfm/partial-key-grouping

      Attachments

        Activity

          People

            aadi.anis Anis Nasir
            aadi.anis Anis Nasir
            Votes:
            1 Vote for this issue
            Watchers:
            10 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated:
                Original Estimate - 336h
                336h
                Remaining:
                Remaining Estimate - 335h 50m
                335h 50m
                Logged:
                Remaining Estimate - 335h 50m
                10m