Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Won't Fix
Description
(1) Currently, we always materialize the entire bag in memory, even though in most cases we only need to stream through it. This is very inefficient in terms of both memory and CPU usage.
(2) If we perform multiple computations on the same group, we iterate over the bag that represents the group several times. This is very inefficient, especially for bags that have spilled to disk.
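To illustrate point (1), here is a minimal sketch in plain Java that contrasts materializing a bag with streaming through it. It does not use Pig's actual `DataBag` API. A `java.util.Iterator` stands in for the bag, and the names `StreamingSumSketch`, `lazyValues`, `sumMaterialized` and `sumStreaming` are hypothetical, chosen only for this example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Sketch only: a plain Iterator stands in for a bag; this is not Pig's DataBag API.
final class StreamingSumSketch {

    // Hypothetical lazy "bag": yields 1..n on demand without storing them.
    static Iterator<Long> lazyValues(long n) {
        return new Iterator<Long>() {
            private long current = 1;

            @Override
            public boolean hasNext() {
                return current <= n;
            }

            @Override
            public Long next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current++;
            }
        };
    }

    // Materializing: copies every element into a list before using it,
    // so memory use grows with the size of the bag.
    static long sumMaterialized(Iterator<Long> bag) {
        List<Long> all = new ArrayList<>();
        while (bag.hasNext()) {
            all.add(bag.next());
        }
        long sum = 0;
        for (long v : all) {
            sum += v;
        }
        return sum;
    }

    // Streaming: consumes one element at a time with constant extra memory.
    static long sumStreaming(Iterator<Long> bag) {
        long sum = 0;
        while (bag.hasNext()) {
            sum += bag.next();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumMaterialized(lazyValues(1_000)));  // 500500
        System.out.println(sumStreaming(lazyValues(1_000_000))); // 500000500000
    }
}
```

Both methods compute the same result. Only the materializing version needs memory proportional to the number of elements.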
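To illustrate point (2), here is a minimal sketch in plain Java of computing several aggregates (count, sum, max) over one group. The first version iterates the bag once per aggregate. The second updates all aggregates in a single pass. `CountingBag` is a hypothetical helper that records how many times the bag is traversed. This is an assumption made for illustration and is neither Pig's API nor its implementation.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Sketch only: illustrates single-pass vs. multi-pass aggregation; not Pig code.
final class SinglePassAggregates {

    // Hypothetical bag that counts how many times it is iterated.
    static final class CountingBag implements Iterable<Long> {
        private final List<Long> data;
        int passes = 0;

        CountingBag(List<Long> data) {
            this.data = data;
        }

        @Override
        public Iterator<Long> iterator() {
            passes++; // each traversal would re-read a spilled bag from disk
            return data.iterator();
        }
    }

    // One traversal per aggregate: three passes over the same bag.
    static long[] multiPass(Iterable<Long> bag) {
        long count = 0;
        for (Long ignored : bag) {
            count++;
        }
        long sum = 0;
        for (long v : bag) {
            sum += v;
        }
        long max = Long.MIN_VALUE;
        for (long v : bag) {
            max = Math.max(max, v);
        }
        return new long[] {count, sum, max};
    }

    // All aggregates updated together: a single pass over the bag.
    static long[] singlePass(Iterable<Long> bag) {
        long count = 0;
        long sum = 0;
        long max = Long.MIN_VALUE;
        for (long v : bag) {
            count++;
            sum += v;
            max = Math.max(max, v);
        }
        return new long[] {count, sum, max};
    }

    public static void main(String[] args) {
        CountingBag a = new CountingBag(List.of(4L, 9L, 2L, 7L));
        CountingBag b = new CountingBag(List.of(4L, 9L, 2L, 7L));
        System.out.println(Arrays.toString(multiPass(a)) + " passes=" + a.passes);  // [4, 22, 9] passes=3
        System.out.println(Arrays.toString(singlePass(b)) + " passes=" + b.passes); // [4, 22, 9] passes=1
    }
}
```

The design choice here is to push each element to every aggregate, instead of pulling the whole bag once per aggregate. The trade-off is that each aggregate must be able to update its result incrementally, one element at a time.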
Issue Links
- depends upon: PIG-157 Add types and rework execution pipeline (Closed)