Details
-
Improvement
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
-
None
Description
Memory shortage is one of the main performance issues in Pig. Knowing when we spill do the disk is useful for understanding query performance and also to see how certain changes in Pig effect that.
Other interesting stats to collect would be average CPU usage and max mem usage but I am not sure if this information is easily retrievable.
Using Hadoop counters for this would make sense.