Pig / PIG-844

PERFORMANCE: streaming data to the UDFs in foreach

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.0
    • Component/s: None
    • Labels:
      None

      Description

      Currently, Pig places the data passed to a UDF into a bag. This can make the process use more memory than it actually needs; in many cases it would be better to push the data to the UDF one tuple at a time.
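      To make the memory argument concrete, here is a minimal Java sketch. All names in it (`StreamingSketch`, `StreamingSum`, `runBagStyle`, `runStreamingStyle`) are hypothetical and are not Pig's UDF API. The sketch contrasts two approaches. In the first, a whole group is collected into a bag before the UDF is called. In the second, each value is pushed to the UDF as it arrives, so the UDF's memory use no longer grows with the size of the group.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch, not Pig's API: contrasts a UDF that receives a fully
// materialized bag with one that receives values one at a time.
public class StreamingSketch {

    // Bag-style UDF: needs every value of the group in memory at once.
    static long sumFromBag(List<Long> bag) {
        long total = 0;
        for (long v : bag) {
            total += v;
        }
        return total;
    }

    // Streaming-style UDF: keeps only its running state (a single long).
    static final class StreamingSum {
        private long total = 0;

        void push(long value) { // called once per incoming value
            total += value;
        }

        long result() {
            return total;
        }
    }

    // Builds the whole bag before calling the UDF: memory grows with group size.
    static long runBagStyle(Iterator<Long> groupInput) {
        List<Long> bag = new ArrayList<>();
        while (groupInput.hasNext()) {
            bag.add(groupInput.next());
        }
        return sumFromBag(bag);
    }

    // Pushes each value as it arrives: UDF state stays constant per group.
    static long runStreamingStyle(Iterator<Long> groupInput) {
        StreamingSum udf = new StreamingSum();
        while (groupInput.hasNext()) {
            udf.push(groupInput.next());
        }
        return udf.result();
    }

    public static void main(String[] args) {
        List<Long> group = List.of(3L, 5L, 7L);
        System.out.println(runBagStyle(group.iterator()));       // 15
        System.out.println(runStreamingStyle(group.iterator())); // 15
    }
}
```

      Both calls return the same sum. The difference is that `runBagStyle` holds every value of the group in memory at once, while `runStreamingStyle` holds only a single running total.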

      When the combiner is invoked, this may not matter much. However, for non-algebraic UDFs, and in other cases where the combiner can't be used, this can significantly reduce memory use.

      Another possible use case is data that is already grouped when it enters Pig, so it does not need to be grouped again.

      How this will affect the UDF interface needs further discussion.

        Activity

        Olga Natkovich created issue
        Olga Natkovich added a comment:

        The Accumulator interface took care of this.

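        The interface referred to here is Pig's `org.apache.pig.Accumulator`, which declares `accumulate`, `getValue` and `cleanup`. The stand-alone sketch below only mimics that lifecycle. The interface `ChunkAccumulator`, the class `MaxAccumulator` and the driver `runGroup` are hypothetical. A `List<Long>` stands in for the tuple-wrapped bag that Pig's real `accumulate(Tuple)` receives, so the code runs without Pig. In the sketch, the UDF sees one group's data in several chunks, keeps only its running state, and is reset by `cleanup` before the next group.

```java
import java.util.List;

// Hypothetical stand-alone mimic of the accumulate/getValue/cleanup lifecycle.
// Pig's real accumulate() takes a Tuple wrapping a bag chunk; this sketch
// simplifies that to a List<Long> so it runs without Pig.
interface ChunkAccumulator<T> {
    void accumulate(List<Long> chunk); // called once per chunk of a group's data
    T getValue();                      // called after the group's last chunk
    void cleanup();                    // resets state before the next group
}

// Example UDF: tracks the maximum value seen so far, never the whole group.
final class MaxAccumulator implements ChunkAccumulator<Long> {
    private Long max = null;

    @Override
    public void accumulate(List<Long> chunk) {
        for (Long v : chunk) {
            if (max == null || v > max) {
                max = v;
            }
        }
    }

    @Override
    public Long getValue() {
        return max;
    }

    @Override
    public void cleanup() {
        max = null;
    }
}

public class AccumulatorSketch {
    // Hypothetical driver: feeds one group's values to the UDF in fixed-size
    // chunks (an in-memory list stands in for data arriving from the input).
    static <T> T runGroup(ChunkAccumulator<T> udf, List<Long> group, int chunkSize) {
        for (int i = 0; i < group.size(); i += chunkSize) {
            udf.accumulate(group.subList(i, Math.min(i + chunkSize, group.size())));
        }
        T result = udf.getValue();
        udf.cleanup();
        return result;
    }

    public static void main(String[] args) {
        MaxAccumulator max = new MaxAccumulator();
        System.out.println(runGroup(max, List.of(4L, 9L, 2L, 7L), 2)); // 9
        System.out.println(runGroup(max, List.of(1L, 3L), 2));         // 3
    }
}
```

        Reusing the same `MaxAccumulator` instance for the second group works because `cleanup` resets its state. This mirrors why the real interface separates `getValue` from `cleanup`.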
        Olga Natkovich made changes:
          Status: Open [ 1 ] -> Resolved [ 5 ]
          Resolution: (none) -> Fixed [ 1 ]
        Alan Gates made changes:
          Fix Version/s: (none) -> 0.7.0 [ 12314397 ]
        Daniel Dai made changes:
          Status: Resolved [ 5 ] -> Closed [ 6 ]
        Transition           Time In Source Status   Execution Times   Last Executer    Last Execution Date
        Open -> Resolved     164d 17h 35m            1                 Olga Natkovich   23/Nov/09 17:26
        Resolved -> Closed   171d 13h 18m            1                 Daniel Dai       14/May/10 07:45

          People

          • Assignee: Unassigned
          • Reporter: Olga Natkovich
          • Votes: 0
          • Watchers: 2

            Dates

            • Created:
              Updated:
              Resolved:
