Details
Type: Sub-task
Status: Closed
Priority: Major
Resolution: Fixed
Description
Currently, Pig's GROUP BY operator is mapped to Spark's CoGroup. When the grouped data is then consumed by subsequent algebraic operations, this is sub-optimal: every record is shuffled before any aggregation happens, which generates a lot of shuffle traffic.
The Spark plan must be optimized to use reduceBy where possible, so that a combiner is used.
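To illustrate why a combiner reduces shuffle traffic, here is a minimal sketch in plain Python. It is not Pig or Spark code: the partitions are simulated as lists, SUM stands in for the algebraic operation, and the function names are hypothetical. It only counts how many records would cross the shuffle boundary in each approach.

```python
from collections import defaultdict


def shuffle_without_combiner(partitions):
    """Group-then-aggregate: every (key, value) record crosses the shuffle."""
    shuffled = [record for part in partitions for record in part]
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)


def shuffle_with_combiner(partitions):
    """Reduce-by-key: each partition pre-aggregates before the shuffle."""
    shuffled = []
    for part in partitions:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value  # map-side combine, one entry per key
        shuffled.extend(local.items())
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value  # final merge of the partial sums
    return dict(totals), len(shuffled)


# Two simulated input partitions of (key, value) records.
partitions = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("a", 4), ("b", 5), ("b", 6)],
]

plain_totals, plain_records = shuffle_without_combiner(partitions)
combined_totals, combined_records = shuffle_with_combiner(partitions)
```

Both approaches produce the same totals, but the combiner variant shuffles at most one record per key per partition. This only works when the aggregation can be computed from partial results, which is the property that algebraic functions provide.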
Attachments
Issue Links
- is required by PIG-4766 Ensure GroupBy is optimized for all algebraic Operations (Closed)
- links to