  1. Beam
  2. BEAM-9877

Eager size estimation of large group-by-key iterables causes expensive, duplicate reads

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.21.0
    • Component/s: sdk-java-core

    Description

      The WindowReiterator in BatchGroupAlsoByWindowViaIteratorsFn can cause very expensive duplicate reads of data from a (Co-)GroupByKey for hot keys with many values, because PCollection size estimation is performed eagerly. Instead, it should estimate lazily, as GroupingShuffleReader and GroupingShuffleEntryIterator do (or perform no estimation at all).
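
      The difference between eager and lazy estimation can be illustrated with a minimal Java sketch. It does not use Beam's classes or reproduce the actual fix: CountingSource, estimateEagerly, observeLazily and compare are hypothetical names invented for illustration, and string length stands in for the encoded size of an element. The sketch only assumes that each element handed out by the source costs one read.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.LongConsumer;

/** Hypothetical sketch; none of these names are Beam classes. */
final class LazySizeEstimationSketch {

  /** Stand-in for a shuffle-backed value stream; counts every element read from it. */
  static final class CountingSource implements Iterable<String> {
    private final List<String> values;
    long elementsRead = 0;

    CountingSource(List<String> values) {
      this.values = values;
    }

    @Override
    public Iterator<String> iterator() {
      Iterator<String> inner = values.iterator();
      return new Iterator<String>() {
        @Override
        public boolean hasNext() {
          return inner.hasNext();
        }

        @Override
        public String next() {
          elementsRead++; // each element handed out models one (expensive) read
          return inner.next();
        }
      };
    }
  }

  /** Eager estimation: walks all values up front, before the consumer sees any of them. */
  static long estimateEagerly(Iterable<String> values) {
    long size = 0;
    for (String v : values) {
      size += v.length();
    }
    return size;
  }

  /** Lazy estimation: reports each element's size only when the consumer reads it. */
  static Iterable<String> observeLazily(Iterable<String> values, LongConsumer sizeObserver) {
    return () -> {
      Iterator<String> inner = values.iterator();
      return new Iterator<String>() {
        @Override
        public boolean hasNext() {
          return inner.hasNext();
        }

        @Override
        public String next() {
          String v = inner.next();
          sizeObserver.accept(v.length());
          return v;
        }
      };
    };
  }

  /** Returns {eager reads, lazy reads, eager size, lazy size} for n four-character values. */
  static long[] compare(int n) {
    List<String> data = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      data.add("abcd");
    }

    // Eager: one full pass to estimate, then a second full pass by the consumer.
    CountingSource eagerSource = new CountingSource(data);
    long eagerSize = estimateEagerly(eagerSource);
    for (String v : eagerSource) {
      // consumer work would happen here
    }

    // Lazy: a single pass; the size accumulates as a side effect of consumption.
    CountingSource lazySource = new CountingSource(data);
    long[] lazySize = {0};
    for (String v : observeLazily(lazySource, s -> lazySize[0] += s)) {
      // consumer work would happen here
    }

    return new long[] {eagerSource.elementsRead, lazySource.elementsRead, eagerSize, lazySize[0]};
  }

  public static void main(String[] args) {
    long[] r = compare(1000);
    System.out.println("eager reads=" + r[0] + ", lazy reads=" + r[1]);
    System.out.println("eager size=" + r[2] + ", lazy size=" + r[3]);
  }
}
```

      In this sketch both strategies arrive at the same size, but the eager one reads every value twice. For a hot key whose values are re-fetched from shuffle on each iteration, that second pass is the duplicate read described above.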

          People

            Assignee: Tudor Marian (tudorm)
            Reporter: Tudor Marian (tudorm)
            Votes: 0
            Watchers: 2

              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 1h