Pig / PIG-164

In scripts that create large groups, Pig runs out of memory

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.0.0
    • Fix Version/s: 0.1.0
    • Component/s: impl
    • Labels:
      None

      Description

      Scripts that need to group large amounts of data, such as a group all over 20 million records, often die with errors indicating that no more memory can be allocated. PIG-40 addressed this somewhat, but not completely; in some situations it appears to have made things worse. A script that creates many data bags can now run out of memory just tracking the bags it may need to spill, even if none of those bags grows very large.

      The issue is that the fix for PIG-40 introduced a memory manager that tracks these data bags in a LinkedList of WeakReferences. When the memory manager is told to dump memory, it walks this LinkedList, removing any entries that have gone stale and dumping any that are still valid. The problem is that in a script that processes many rows, the LinkedList itself grows very large and becomes the reason memory needs to be dumped.
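      The growth pattern described above can be sketched as follows. This is a simplified illustration, not Pig's actual memory manager: the class name `NaiveSpillRegistry`, the `Spillable` interface, and all method names are assumptions made for the example. Stale entries are removed only when a spill is requested, so the list gains one node per registered bag in the meantime.

```java
import java.lang.ref.WeakReference;
import java.util.Iterator;
import java.util.LinkedList;

// Illustrative only: a simplified stand-in, not Pig's actual memory manager.
class NaiveSpillRegistry {
    // Hypothetical interface for anything that can write itself to disk.
    interface Spillable {
        long spill(); // returns an estimate of bytes freed
    }

    private final LinkedList<WeakReference<Spillable>> tracked = new LinkedList<>();

    // Appends one list node per bag and never removes anything.
    // The reference is returned only so a demo can clear it by hand.
    WeakReference<Spillable> register(Spillable bag) {
        WeakReference<Spillable> ref = new WeakReference<>(bag);
        tracked.add(ref);
        return ref;
    }

    // Under memory pressure: drop stale entries, spill the live ones.
    long spillAll() {
        long freed = 0;
        Iterator<WeakReference<Spillable>> it = tracked.iterator();
        while (it.hasNext()) {
            Spillable bag = it.next().get();
            if (bag == null) {
                it.remove();
            } else {
                freed += bag.spill();
            }
        }
        return freed;
    }

    int size() {
        return tracked.size();
    }

    public static void main(String[] args) {
        NaiveSpillRegistry registry = new NaiveSpillRegistry();
        Spillable smallBag = () -> 0L;
        for (int i = 0; i < 100_000; i++) {
            // clear() simulates the bag having been garbage collected.
            registry.register(smallBag).clear();
        }
        System.out.println("entries before spill: " + registry.size());
        registry.spillAll();
        System.out.println("entries after spill: " + registry.size());
    }
}
```

      In the demo, every entry is stale by the time a spill is requested, yet all of them still occupy list nodes until then. That is the situation the description attributes to scripts that create many small bags.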

        Activity

        Alan Gates added a comment -

        The attached patch addresses the issue by changing the memory manager to do some cleaning of the list whenever a databag is registered.

        I tried two previous approaches that did not work.

        First, I had the memory manager spin a separate thread that woke up every five seconds and cleaned the list. For reasons that are entirely unclear to me this solution ran out of memory faster than before.

        Second, I had the register call clean the entire list. This proved to be far too expensive, and slowed down performance by about 10x.

        So, in this final version, register begins searching at the head of the list, cleaning any weak references it can. As soon as it encounters an entry in the list that is still valid, it stops looking. This avoids long searches through the list when most of the entries are valid. It rests on the assumption that data bags generally live about the same amount of time, so there won't be a long-lived databag at the head of the list blocking the cleaning of many stale references later in the list.
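        A minimal sketch of the head-of-list cleaning described in this comment, for illustration only. It is not the attached patch: the class name `HeadCleaningRegistry`, the generic parameter, and the method names are assumptions made for the example.

```java
import java.lang.ref.WeakReference;
import java.util.LinkedList;

// Illustrative only: not the attached patch or Pig's actual class.
class HeadCleaningRegistry<T> {
    private final LinkedList<WeakReference<T>> tracked = new LinkedList<>();

    // Before adding the new entry, remove stale references from the head
    // of the list and stop at the first entry that is still valid.
    // The reference is returned only so a demo can clear it by hand.
    WeakReference<T> register(T bag) {
        while (!tracked.isEmpty() && tracked.getFirst().get() == null) {
            tracked.removeFirst();
        }
        WeakReference<T> ref = new WeakReference<>(bag);
        tracked.addLast(ref);
        return ref;
    }

    int size() {
        return tracked.size();
    }

    public static void main(String[] args) {
        HeadCleaningRegistry<Object> registry = new HeadCleaningRegistry<>();
        Object bag = new Object();
        for (int i = 0; i < 100_000; i++) {
            // clear() simulates the bag having been garbage collected.
            registry.register(bag).clear();
        }
        System.out.println("entries after 100000 registrations: " + registry.size());
    }
}
```

        Each call does work proportional to the number of stale entries at the head, so it stays cheap when most entries are valid. The cost of that choice is the one noted above: a single live entry at the head shields every stale entry behind it until a full walk of the list happens.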

        Olga Natkovich added a comment -

        Looks good. +1

        Benjamin Reed added a comment -

        +1 excellent

        Alan Gates added a comment -

        Fix checked in.


          People

          • Assignee: Alan Gates
          • Reporter: Alan Gates
          • Votes: 0
          • Watchers: 0
