PIG-164: In scripts that create large groups, Pig runs out of memory


    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.0.0
    • Fix Version/s: 0.1.0
    • Component/s: impl
    • Labels:


      Scripts that group large amounts of data, such as a GROUP ALL over 20 million records, often die with errors indicating that no more memory can be allocated. PIG-40 addressed this partially, and in some situations it appears to have made the problem worse: a script that creates many data bags can now run out of memory just tracking the bags it may need to spill, even if none of those bags grows very large.

      The cause is that the fix for PIG-40 introduced a memory manager that tracks these data bags in a LinkedList of WeakReferences. When it is told to dump memory, it walks this LinkedList, removing any entries that have gone stale and dumping any bags that are still valid. In a script that processes many rows, the LinkedList itself grows very large and becomes the reason memory needs to be dumped.
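
      The tracking scheme described above can be sketched roughly as follows. This is a minimal illustration, not Pig's actual code: the class name `SpillTrackerSketch`, the `Spillable` interface shown here, and the methods `register`, `trackedCount`, and `handleLowMemory` are all hypothetical names chosen for the example.

```java
import java.lang.ref.WeakReference;
import java.util.Iterator;
import java.util.LinkedList;

// Hypothetical sketch of a tracker that keeps weak references to spillable bags.
public class SpillTrackerSketch {

    /** Stand-in for a data bag that can write its contents to disk. */
    public interface Spillable {
        long spill(); // returns an estimate of the bytes freed
    }

    // One list node per registered bag; nodes are only removed during a walk.
    private final LinkedList<WeakReference<Spillable>> tracked = new LinkedList<>();

    /** Called whenever a new bag is created. */
    public void register(Spillable bag) {
        tracked.add(new WeakReference<>(bag));
    }

    public int trackedCount() {
        return tracked.size();
    }

    /** Walks the list: drops stale entries, spills the bags that are still alive. */
    public long handleLowMemory() {
        long freed = 0;
        Iterator<WeakReference<Spillable>> it = tracked.iterator();
        while (it.hasNext()) {
            Spillable bag = it.next().get();
            if (bag == null) {
                it.remove();          // referent was garbage collected
            } else {
                freed += bag.spill(); // still valid, ask it to dump its contents
            }
        }
        return freed;
    }

    /** A tiny bag used only for the demonstration below. */
    static final class SmallBag implements Spillable {
        @Override
        public long spill() {
            return 8L;
        }
    }

    public static void main(String[] args) {
        SpillTrackerSketch tracker = new SpillTrackerSketch();
        // Many short-lived bags: no strong reference is kept to any of them,
        // yet every registration adds a node (plus a WeakReference) to the list.
        for (int i = 0; i < 100_000; i++) {
            tracker.register(new SmallBag());
        }
        System.out.println("entries before walk: " + tracker.trackedCount());
        long freed = tracker.handleLowMemory();
        System.out.println("bytes reported freed: " + freed);
        System.out.println("entries after walk: " + tracker.trackedCount());
    }
}
```

      In this sketch the list gains one node per registered bag and loses nodes only during the walk, so a script that creates many small bags between walks pays for the bookkeeping even when no single bag is large. How many entries survive the walk depends on when the garbage collector has cleared the references.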


        Alan Gates created issue
        Alan Gates made changes
          Field | Original Value | New Value
          Status | Open [ 1 ] | Patch Available [ 10002 ]
        Alan Gates made changes
          Attachment | | PIG-164.patch [ 12378322 ]
        Alan Gates made changes
          Status | Patch Available [ 10002 ] | Resolved [ 5 ]
          Resolution | | Fixed [ 1 ]
        Olga Natkovich made changes
          Fix Version/s | | 0.1.0 [ 12312848 ]
        Alan Gates made changes
          Status | Resolved [ 5 ] | Closed [ 6 ]


          • Assignee:
            Alan Gates
          • Votes:
            0
          • Watchers:
            0


            • Created: