PIG-466

PERFORMANCE: dropping the columns as soon as possible

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.2.0
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels:
      None

      Description

      Currently, each operator carries all columns forward until a foreach is encountered, even columns that no later operator uses. This can cause significant performance degradation.

        Activity

        Olga Natkovich created issue -
        Nigel Daley made changes -
        Field Original Value New Value
        Fix Version/s 1.0.0 [ 12313288 ]
        Nigel Daley made changes -
        Affects Version/s 0.2.0 [ 12313783 ]
        Affects Version/s 1.0.0 [ 12313288 ]
        Olga Natkovich added a comment -

        This is part of new optimizer work

        Olga Natkovich made changes -
        Assignee Daniel Dai [ daijy ]
        Fix Version/s 0.8.0 [ 12314562 ]
        Scott Carey added a comment -

        This is both a performance and usability issue.

        If the optimizer could automatically push projections up to the earliest possible time, it would also unclutter large scripts that manually project 'early and often' for performance reasons.

        I have reason to believe that some of these extra projection lines interfere with certain other performance optimizations as well (on 0.5, multi-query optimization sometimes fails due to extra projections in between, and some forms of projection break combiner use).

        Dmitriy V. Ryaboy added a comment -

        This was done as PIG-922

        Dmitriy V. Ryaboy made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Fix Version/s 0.6.0 [ 12314214 ]
        Fix Version/s 0.8.0 [ 12314562 ]
        Resolution Duplicate [ 3 ]
        Daniel Dai added a comment -

        PIG-922 partially solves this issue by pushing column pruning down to the loader. However, we can go beyond that. For example:

        a = load '1.txt' as (a0, a1, a2, a3);
        b = filter a by a2==1;
        c = order b by a1;
        d = foreach c generate a0, a1;
        

        PIG-922 is able to figure out that a3 is not needed in the script and does not load it. Going one step further, we can figure out that a2 is no longer needed after b, so we can add a foreach after b that drops a2. This is not covered by PIG-922 and is part of the new optimizer work.
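
        As a rough illustration of that rewrite (a hand-written equivalent, not the optimizer's actual output), the script above could be pruned manually as follows. The alias b1 is hypothetical and is introduced only for this sketch; the a3 pruning happens inside the loader, so it is not visible in the script text.

```pig
a = load '1.txt' as (a0, a1, a2, a3);
b = filter a by a2 == 1;
-- a2 is only used by the filter, so project it away before the sort
b1 = foreach b generate a0, a1;
c = order b1 by a1;
d = foreach c generate a0, a1;
```

        With this shape, the order step only has to move a0 and a1 instead of also carrying a2 through the sort.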

        Daniel Dai made changes -
        Resolution Duplicate [ 3 ]
        Status Resolved [ 5 ] Reopened [ 4 ]
        Olga Natkovich made changes -
        Fix Version/s 0.6.0 [ 12314214 ]
        Olga Natkovich made changes -
        Fix Version/s 0.8.0 [ 12314562 ]
        Olga Natkovich added a comment -

        This is already resolved as part of PIG-1178

        Olga Natkovich made changes -
        Status Reopened [ 4 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Olga Natkovich made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Transition             Time In Source Status   Execution Times   Last Executer       Last Execution Date
        Open -> Resolved       586d 23h 3m             1                 Dmitriy V. Ryaboy   09/May/10 23:06
        Resolved -> Reopened   1d 19h 34m              1                 Daniel Dai          11/May/10 18:40
        Reopened -> Resolved   73d 2h 20m              1                 Olga Natkovich      23/Jul/10 21:00
        Resolved -> Closed     147d 1h 42m             1                 Olga Natkovich      17/Dec/10 22:43

          People

          • Assignee:
            Daniel Dai
            Reporter:
            Olga Natkovich
          • Votes:
            0
            Watchers:
            0
