PIG-975

Need a databag that does not register with SpillableMemoryManager and spills data proactively

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.4.0
    • Fix Version/s: 0.6.0
    • Component/s: None
    • Labels:
      None

      Description

      POPackage uses DefaultDataBag to hold data during the reduce phase. DefaultDataBag is registered with SpillableMemoryManager and is prone to OutOfMemoryError. It is better to manage memory usage proactively: the bag fills memory up to a specified amount and writes the rest to disk. The amount of memory used to hold tuples is configurable. This can avoid out-of-memory errors.
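
The behaviour described above can be illustrated with a minimal sketch. It is not the class added by the attached patches: the name `ProactiveSpillBag`, the use of `String` as a stand-in for a tuple, and the use of a tuple count (`maxInMemory`) in place of a byte-based memory budget are all assumptions made for illustration. The bag keeps tuples in memory until the limit is reached, writes every later tuple straight to a temporary file, and never registers with any memory manager.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch, not the class added by the PIG-975 patches.
class ProactiveSpillBag implements Iterable<String>, AutoCloseable {
    private final int maxInMemory;   // assumed stand-in for the configurable memory budget
    private final List<String> memory = new ArrayList<>();
    private File spillFile;          // created lazily on the first overflow
    private DataOutputStream spillOut;
    private long spilled;

    ProactiveSpillBag(int maxInMemory) {
        this.maxInMemory = maxInMemory;
    }

    // Keep the tuple in memory while under the limit; otherwise write it to disk.
    void add(String tuple) {
        if (memory.size() < maxInMemory) {
            memory.add(tuple);
            return;
        }
        try {
            if (spillOut == null) {
                spillFile = File.createTempFile("bag", ".spill");
                spillFile.deleteOnExit();
                spillOut = new DataOutputStream(
                        new BufferedOutputStream(new FileOutputStream(spillFile)));
            }
            spillOut.writeUTF(tuple);   // writeUTF limits each string to 65535 encoded bytes
            spilled++;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    long size() { return memory.size() + spilled; }
    long spilledCount() { return spilled; }

    // Iterates the in-memory tuples first, then the spilled ones, in insertion order.
    @Override
    public Iterator<String> iterator() {
        final int inMemory = memory.size();
        final long total = size();
        final DataInputStream in;
        try {
            if (spillOut != null) spillOut.flush();
            in = spillFile == null ? null : new DataInputStream(
                    new BufferedInputStream(new FileInputStream(spillFile)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Iterator<String>() {
            private long served;
            @Override public boolean hasNext() { return served < total; }
            @Override public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                try {
                    String t = served < inMemory ? memory.get((int) served) : in.readUTF();
                    if (++served == total && in != null) in.close();  // closed once exhausted
                    return t;
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        };
    }

    // Closes the spill stream and removes the temporary file.
    @Override
    public void close() {
        try {
            if (spillOut != null) spillOut.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (spillFile != null) spillFile.delete();
        }
    }

    public static void main(String[] args) {
        try (ProactiveSpillBag bag = new ProactiveSpillBag(2)) {
            for (String t : new String[] {"a", "b", "c", "d", "e"}) bag.add(t);
            System.out.println(bag.size() + " tuples, " + bag.spilledCount() + " on disk");
            for (String t : bag) System.out.println(t);
        }
    }
}
```

With a limit of 2, the first two tuples stay in memory and the remaining three go to the spill file. Because the decision is made on every `add`, memory use is bounded up front rather than depending on a low-memory callback arriving in time. A real implementation would presumably size the limit from available heap rather than a fixed count.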

      1. PIG-975.patch
        10 kB
        Ying He
      2. PIG-975.patch2
        10 kB
        Ying He
      3. PIG-975.patch3
        10 kB
        Ying He
      4. internalbag.xls
        80 kB
        Ying He
      5. PIG-975.patch4
        11 kB
        Ying He

        Issue Links

          Activity

          Ying He created issue -
          Ying He made changes -
          Field Original Value New Value
          Link This issue is a clone of PIG-636 [ PIG-636 ]
          Ying He made changes -
          Description Currently, whenever the combiner is used in Pig, the POPrecombinerLocalRearrange operator in the map puts the single "value" tuple corresponding to a key into a DataBag and passes it to the foreach that is being combined. This generates as many bags as there are input records. Each of these bags holds a single tuple, so they are small and should not need to be spilled to disk. However, because the bags are created through the BagFactory mechanism, each bag creation is registered with the SpillableMemoryManager and a weak reference to the bag is stored in a linked list. This linked list grows very large over time, causing unnecessary garbage collection runs. This can be avoided with a simple, lightweight implementation of the DataBag interface that stores the single tuple in a bag. These SingleTupleBags should also be created without registering with the SpillableMemoryManager. Likewise, the bags created in POCombinePackage are supposed to fit in memory and not spill. Again, a NonSpillableDataBag implementation of the DataBag interface that does not register with the SpillableMemoryManager would help.
          POPackage uses DefaultDataBag to hold data during the reduce phase. DefaultDataBag is registered with SpillableMemoryManager and is prone to OutOfMemoryError. It is better to manage memory usage proactively: the bag fills memory up to a specified amount and writes the rest to disk. The amount of memory used to hold tuples is configurable. This can avoid out-of-memory errors.
          Ying He made changes -
          Attachment PIG-975.patch [ 12420482 ]
          Ying He made changes -
          Attachment PIG-975.patch2 [ 12420486 ]
          Olga Natkovich made changes -
          Assignee Pradeep Kamath [ pkamath ] Ying He [ yinghe ]
          Olga Natkovich made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Ying He made changes -
          Attachment PIG-975.patch3 [ 12420570 ]
          Ying He made changes -
          Attachment internalbag.xls [ 12420571 ]
          Ying He made changes -
          Attachment PIG-975.patch4 [ 12420603 ]
          Olga Natkovich made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Olga Natkovich made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Affects Version/s 0.4.0 [ 12314042 ]
          Affects Version/s 0.2.0 [ 12313783 ]
          Fix Version/s 0.6.0 [ 12314214 ]
          Fix Version/s 0.2.0 [ 12313783 ]
          Olga Natkovich made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Ying He made changes -
          Link This issue is cloned as PIG-1000 [ PIG-1000 ]
          Alan Gates made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Ying He
              Reporter:
              Ying He
            • Votes:
              0
              Watchers:
              1
