Beam / BEAM-7574

Spark runner: Combine.perKey performance issues

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.13.0
    • Fix Version/s: 2.15.0
    • Component/s: runner-spark
    • Labels: None

    Description

    The current implementation of Combine.perKey creates an accumulator for each input key and then merges all of these accumulators together. Optimize this by:

    • changing the accumulator from an Iterable to a Map, and using addInput wherever possible;
    • trying to move the window explosion before the shuffle (adding the window label to the key for non-merging windows), measuring the impact, and, if the impact is substantial, implementing this at least for window functions that assign each element to a single (global) window or to a single window per element (tumbling windows).
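
    A minimal sketch of the first point, written in plain Java rather than against Beam's CombineFn or the Spark runner's classes: SimpleCombineFn, Windowed, addAll and merge are hypothetical names used only for illustration. It assumes the Map accumulator is keyed by window label, so each key holds at most one accumulator per window, filled with addInput, and mergeAccumulators is called only when two partial Maps for the same key contain the same window.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapAccumulatorSketch {

  // Hypothetical, simplified stand-in for a combine function. It mirrors the
  // createAccumulator/addInput/mergeAccumulators/extractOutput shape, but it is
  // NOT Beam's CombineFn.
  interface SimpleCombineFn<InT, AccumT, OutT> {
    AccumT createAccumulator();
    AccumT addInput(AccumT acc, InT input);
    AccumT mergeAccumulators(AccumT a, AccumT b);
    OutT extractOutput(AccumT acc);
  }

  // Hypothetical element type: a value tagged with the label of its (already assigned) window.
  static final class Windowed<T> {
    final String window;
    final T value;

    Windowed(String window, T value) {
      this.window = window;
      this.value = value;
    }
  }

  // Example combiner: sum of longs.
  static final SimpleCombineFn<Long, Long, Long> SUM =
      new SimpleCombineFn<Long, Long, Long>() {
        public Long createAccumulator() { return 0L; }
        public Long addInput(Long acc, Long input) { return acc + input; }
        public Long mergeAccumulators(Long a, Long b) { return a + b; }
        public Long extractOutput(Long acc) { return acc; }
      };

  // Per-key accumulator as Map<window, accumulator>: every element is folded in
  // with addInput, so at most one accumulator is created per window.
  static <InT, AccumT> Map<String, AccumT> addAll(
      SimpleCombineFn<InT, AccumT, ?> fn, Map<String, AccumT> accs, List<Windowed<InT>> inputs) {
    for (Windowed<InT> in : inputs) {
      AccumT acc = accs.containsKey(in.window) ? accs.get(in.window) : fn.createAccumulator();
      accs.put(in.window, fn.addInput(acc, in.value));
    }
    return accs;
  }

  // Merge two partial per-key Maps (e.g. built on different partitions);
  // mergeAccumulators runs only for windows present in both.
  static <AccumT> Map<String, AccumT> merge(
      SimpleCombineFn<?, AccumT, ?> fn, Map<String, AccumT> a, Map<String, AccumT> b) {
    Map<String, AccumT> out = new HashMap<>(a);
    for (Map.Entry<String, AccumT> e : b.entrySet()) {
      out.merge(e.getKey(), e.getValue(), fn::mergeAccumulators);
    }
    return out;
  }

  public static void main(String[] args) {
    // Two partial inputs for the same key, e.g. seen on different Spark partitions.
    List<Windowed<Long>> part1 = List.of(
        new Windowed<>("w0", 1L), new Windowed<>("w0", 2L), new Windowed<>("w1", 5L));
    List<Windowed<Long>> part2 = List.of(new Windowed<>("w1", 10L), new Windowed<>("w2", 7L));
    Map<String, Long> merged =
        merge(SUM, addAll(SUM, new HashMap<>(), part1), addAll(SUM, new HashMap<>(), part2));
    // TreeMap only to print windows in a stable order.
    System.out.println(new TreeMap<>(merged));
  }
}
```

    Running main prints {w0=3, w1=15, w2=7}; mergeAccumulators is invoked only for w1, the one window present in both partial maps.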
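
    A minimal sketch of the second point, again in plain Java with hypothetical names (Element, windowStart, shuffleKey, sumPerKeyAndWindow) and with a TreeMap standing in for the Spark shuffle. It assumes tumbling windows of a fixed size, so each element lands in exactly one window and the window label can be appended to the key before the shuffle; the post-shuffle combine then works per (key, window) without exploding windows. The String composite key is a simplification chosen for readability.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PreShuffleWindowKeySketch {

  // Hypothetical input element: key, event timestamp in millis, and value.
  static final class Element {
    final String key;
    final long timestamp;
    final long value;

    Element(String key, long timestamp, long value) {
      this.key = key;
      this.timestamp = timestamp;
      this.value = value;
    }
  }

  // Tumbling (fixed) windows: a non-merging assignment with exactly one window per element.
  static long windowStart(long timestamp, long sizeMillis) {
    return timestamp - Math.floorMod(timestamp, sizeMillis);
  }

  // Pre-shuffle step: append the window label to the key. For the global window
  // the suffix would simply be a constant.
  static String shuffleKey(Element e, long sizeMillis) {
    return e.key + "|" + windowStart(e.timestamp, sizeMillis);
  }

  // Stand-in for shuffle + per-key combine (sum) on the composite key. No window
  // explosion is needed after the shuffle because the key already carries the window.
  // TreeMap is used only so the result prints in a stable order.
  static Map<String, Long> sumPerKeyAndWindow(List<Element> input, long sizeMillis) {
    Map<String, Long> out = new TreeMap<>();
    for (Element e : input) {
      out.merge(shuffleKey(e, sizeMillis), e.value, Long::sum);
    }
    return out;
  }

  public static void main(String[] args) {
    List<Element> input = List.of(
        new Element("a", 1_000L, 1L),
        new Element("a", 4_000L, 2L),
        new Element("a", 6_000L, 3L),
        new Element("b", 2_000L, 10L));
    System.out.println(sumPerKeyAndWindow(input, 5_000L));
  }
}
```

    With 5-second windows, main prints {a|0=3, a|5000=3, b|0=10}: the first two "a" elements share window 0, while the third falls into the window starting at 5000.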

    People

    • Assignee: Jan Lukavský (janl)
    • Reporter: Jan Lukavský (janl)
    • Votes: 0
    • Watchers: 2

    Time Tracking

    • Original Estimate: Not Specified
    • Remaining Estimate: 0h
    • Time Spent: 10h