Details
Type: Bug
Status: Resolved
Priority: P2
Resolution: Fixed
Description
The WindowReiterator in BatchGroupAlsoByWindowViaIteratorsFn can cause very expensive duplicate reads of data from a (Co-)GroupByKey for hot keys with many values, due to PCollection size estimation. Instead, it should perform lazy estimation, as GroupingShuffleReader and GroupingShuffleEntryIterator do (or perform no estimation at all).
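
To illustrate what lazy estimation means here, the following is a minimal standalone sketch, not the actual Beam worker code. The class LazySizeEstimation, the wrapper SizeObservingIterator, and the sizeOf function are hypothetical names introduced only for this example. The idea is to accumulate the size as the consumer pulls each value, so the values are read once, rather than walking the whole iterable up front just to estimate its size.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.ToLongFunction;

/** Hypothetical sketch of lazy size estimation; none of these names are Beam APIs. */
public class LazySizeEstimation {

    /** Wraps an iterator and adds each element's size only when the consumer reads it. */
    static final class SizeObservingIterator<T> implements Iterator<T> {
        private final Iterator<T> delegate;
        private final ToLongFunction<T> sizeOf; // assumed per-element size function
        private long observedBytes = 0;

        SizeObservingIterator(Iterator<T> delegate, ToLongFunction<T> sizeOf) {
            this.delegate = delegate;
            this.sizeOf = sizeOf;
        }

        @Override
        public boolean hasNext() {
            return delegate.hasNext();
        }

        @Override
        public T next() {
            T value = delegate.next();
            // Size is accounted for as a side effect of the single read pass.
            observedBytes += sizeOf.applyAsLong(value);
            return value;
        }

        /** Size of the elements consumed so far; complete once the iterator is exhausted. */
        long observedBytes() {
            return observedBytes;
        }
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("a", "bb", "ccc");
        SizeObservingIterator<String> it =
            new SizeObservingIterator<>(values.iterator(), String::length);
        while (it.hasNext()) {
            it.next(); // the consumer's normal processing would happen here
        }
        System.out.println(it.observedBytes()); // prints 6
    }
}
```

With an eager estimate, a hot key with many values would be iterated once for sizing and again for processing; with the wrapper above, the estimate is a by-product of the one pass the consumer makes anyway.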