  1. Pig
  2. PIG-636

PERFORMANCE: Use lightweight bag implementations which do not register with SpillableMemoryManager with Combiner



    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.2.0
    • Fix Version/s: 0.2.0
    • Component/s: None
    • Labels:


      Currently, whenever the Combiner is used in Pig, the POPrecombinerLocalRearrange operator in the map phase puts the single "value" tuple corresponding to a key into a DataBag and passes it to the foreach that is being combined. This generates as many bags as there are input records. Each of these bags holds a single tuple, so they are small and should never need to be spilled to disk. However, because the bags are created through the BagFactory mechanism, each bag creation is registered with the SpillableMemoryManager, which stores a weak reference to the bag in a linked list. This linked list grows very large over time and causes unnecessary garbage collection runs.

      This can be avoided with a simple, lightweight implementation of the DataBag interface that stores the single tuple in a bag. These SingleTupleBags should be created without registering with the SpillableMemoryManager. Likewise, the bags created in POCombinePackage are supposed to fit in memory and not spill. Here too, a NonSpillableDataBag implementation of the DataBag interface that does not register with the SpillableMemoryManager would help.
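
The difference between the two creation paths can be illustrated with a small, self-contained sketch. This is not Pig's actual code: `Bag`, `Registry`, `ListBag`, and `newRegisteredBag` are hypothetical stand-ins for DataBag, SpillableMemoryManager, a list-backed bag, and the BagFactory path, and tuples are modeled as plain `Object[]` arrays. Only the names SingleTupleBag and NonSpillableDataBag come from the description above.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class BagSketch {

    // Hypothetical, much-reduced stand-in for Pig's DataBag interface.
    interface Bag extends Iterable<Object[]> {
        long size();
    }

    // Hypothetical stand-in for SpillableMemoryManager:
    // it keeps one weak reference per registered bag in a linked list.
    static final class Registry {
        private final List<WeakReference<Bag>> tracked = new LinkedList<>();

        void register(Bag bag) {
            tracked.add(new WeakReference<>(bag));
        }

        int trackedCount() {
            return tracked.size();
        }
    }

    // List-backed bag. Constructed directly, it is never registered,
    // which is the role the description gives to NonSpillableDataBag.
    static final class ListBag implements Bag {
        private final List<Object[]> tuples = new ArrayList<>();

        void add(Object[] tuple) {
            tuples.add(tuple);
        }

        @Override
        public long size() {
            return tuples.size();
        }

        @Override
        public Iterator<Object[]> iterator() {
            return tuples.iterator();
        }
    }

    // Bag holding exactly one tuple: no backing list, no registration.
    static final class SingleTupleBag implements Bag {
        private final Object[] tuple;

        SingleTupleBag(Object[] tuple) {
            this.tuple = tuple;
        }

        @Override
        public long size() {
            return 1;
        }

        @Override
        public Iterator<Object[]> iterator() {
            return Collections.singletonList(tuple).iterator();
        }
    }

    // Factory-style path (assumed to resemble BagFactory): every bag is registered.
    static Bag newRegisteredBag(Registry registry, Object[] tuple) {
        ListBag bag = new ListBag();
        bag.add(tuple);
        registry.register(bag);
        return bag;
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        List<Bag> bags = new ArrayList<>();

        // One registered bag per input record: the registry grows with the input.
        for (int i = 0; i < 1000; i++) {
            bags.add(newRegisteredBag(registry, new Object[] {"key", i}));
        }
        System.out.println("tracked after factory path: " + registry.trackedCount());

        // One lightweight bag per input record: the registry is untouched.
        for (int i = 0; i < 1000; i++) {
            bags.add(new SingleTupleBag(new Object[] {"key", i}));
        }
        System.out.println("tracked after lightweight path: " + registry.trackedCount());
    }
}
```

In this sketch the registry's list grows by one entry per factory-created bag and does not grow at all for bags constructed directly, which is the effect the proposed SingleTupleBag and NonSpillableDataBag are meant to achieve.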


        1. PIG-636-v2.patch
          21 kB
          Pradeep Kamath
        2. PIG-636.patch
          21 kB
          Pradeep Kamath




              • Assignee:
                pkamath Pradeep Kamath
              • Votes:
                0
              • Watchers:
                1


                • Created: