Details
Type: Improvement
Status: Resolved
Priority: P3
Resolution: Fixed
Description
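The current implementation:
1. Pair each element with a fixed value of 1: P<T> --> P<<T, 1>>
2. Group by key, which gathers every 1 emitted for the same T: P<<T, 1>> --> GBK<T, 1>
3. Drop the values and keep only the keys: GBK<T, 1> --> P<distinct T>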
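The three steps above can be sketched in plain Python. This is a framework-free simulation, not Beam's API: the function name `distinct_via_gbk` is hypothetical, and an in-memory dict stands in for the GroupByKey shuffle. Note that every (T, 1) pair, duplicates included, reaches the grouping step.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List


def distinct_via_gbk(elements: Iterable[Hashable]) -> List[Hashable]:
    # Step 1: pair each element with a fixed value of 1: T -> (T, 1).
    pairs = [(element, 1) for element in elements]
    # Step 2: group by key. In a real pipeline every pair, including
    # duplicates, would be sent through the shuffle at this point.
    grouped: Dict[Hashable, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Step 3: drop the grouped values and keep only the distinct keys.
    return list(grouped.keys())
```

For example, `distinct_via_gbk(["a", "b", "a", "c", "b"])` returns `["a", "b", "c"]` (Python dicts preserve insertion order), after grouping five pairs.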
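The proposed implementation:
1. Same as step 1 of the current implementation: P<T> --> P<<T, 1>>
2. Combine per key: P<<T, 1>> --> P<<distinct T, 1>>
3. Same as step 3 of the current implementation: drop the values to get P<distinct T>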
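Unlike a plain GroupByKey, CombinePerKey performs a partial combine in a ParDo before the GroupByKey, so duplicate elements are collapsed on each worker before they are shuffled. This reduces the shuffle size, especially when the input contains many duplicates.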
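A minimal sketch of why the pre-GBK combine shrinks the shuffle, again in plain Python rather than Beam's `Combine.perKey` API. The names `local_combine` and `distinct_via_combine` are hypothetical, and each inner list of `bundles` is assumed to stand in for the input processed by one worker. The function returns the distinct keys together with the number of records that would cross the shuffle.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def local_combine(bundle: Sequence[Hashable]) -> Dict[Hashable, int]:
    # Pre-shuffle partial combine on one bundle: all (T, 1) pairs with the
    # same key collapse into a single (T, 1) before leaving the worker.
    return {element: 1 for element in bundle}


def distinct_via_combine(
    bundles: Sequence[Sequence[Hashable]],
) -> Tuple[List[Hashable], int]:
    # Only the locally combined pairs are sent through the shuffle.
    shuffled: List[Tuple[Hashable, int]] = []
    for bundle in bundles:
        shuffled.extend(local_combine(bundle).items())
    # Post-shuffle merge: combine partial results per key (the value stays 1).
    merged: Dict[Hashable, int] = {}
    for key, _ in shuffled:
        merged[key] = 1
    # Drop the values, keep the distinct keys, and report the shuffle size.
    return list(merged.keys()), len(shuffled)
```

With `bundles = [["a", "a", "b"], ["a", "b", "b", "c"]]`, the GroupByKey-only approach would shuffle all 7 pairs, while `distinct_via_combine(bundles)` returns `(["a", "b", "c"], 5)`: duplicates within a bundle are removed before the shuffle, but a key seen in several bundles is still shuffled once per bundle.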