Pig / PIG-2747

Support more predicate pushdown to a data source by pulling up multiple predicates from branches using the same data source

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Consider the following example:

      T = load ... ;
      T1 = filter T by col == 'hello';
      T2 = filter T by col == 'world';

      Currently, the Pig optimizer does not combine the two predicates, so it cannot push them down to the data source (via LoadMetadata). The data source therefore cannot do any filtering, and a full table/file scan is required.

      A more efficient manual workaround today is to rewrite the script above into the following equivalent form:

      T = load ...;
      T = filter T by col == 'hello' or col == 'world' ;
      T1 = filter T by col == 'hello';
      T2 = filter T by col == 'world';

      The script above enables Pig to push the predicate (col == 'hello' or col == 'world') down to the data source, which can then use available partitions/indexes for potentially much more efficient processing.
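The equivalence claimed for the rewritten script can be checked on a small in-memory example. The following is only an illustrative Python sketch, not Pig code: the `rows` list and the helper names `is_hello` and `is_world` are hypothetical stand-ins for the relation T and the two filter conditions.

```python
# Hypothetical stand-in for the rows of T; only the `col` field matters here.
rows = [{"col": v} for v in ["hello", "world", "other", "hello", "pig"]]


def is_hello(row):
    """Branch predicate of T1: col == 'hello'."""
    return row["col"] == "hello"


def is_world(row):
    """Branch predicate of T2: col == 'world'."""
    return row["col"] == "world"


# Original script: each branch filters the full, unfiltered input.
t1_original = [r for r in rows if is_hello(r)]
t2_original = [r for r in rows if is_world(r)]

# Rewritten script: pre-filter by the disjunction (the part a data source
# could evaluate), then apply the original branch filters to the survivors.
prefiltered = [r for r in rows if is_hello(r) or is_world(r)]
t1_rewritten = [r for r in prefiltered if is_hello(r)]
t2_rewritten = [r for r in prefiltered if is_world(r)]

# Both versions produce the same branch outputs, but the rewritten one
# only needs the rows that match at least one branch predicate.
assert t1_original == t1_rewritten
assert t2_original == t2_rewritten
```

The pre-filter never drops a row that a branch would keep, because every row satisfying a branch predicate also satisfies the disjunction.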

      This JIRA requests that the Pig optimizer perform this type of optimization automatically.
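As a rough illustration of what such a rule might compute, the Python sketch below derives the pulled-up predicate from the branches that read one load. It is not Pig's optimizer API: the `Branch` class and the `pulled_up_predicate` function are hypothetical names, and predicates are treated as opaque strings. The sketch assumes the pull-up is only safe when every branch reading the load is a filter.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Branch:
    """One consumer of the load (hypothetical model, not a Pig class)."""
    alias: str
    predicate: Optional[str]  # None means the branch reads the load unfiltered


def pulled_up_predicate(branches: List[Branch]) -> Optional[str]:
    """Return the OR of all branch predicates, or None if no pull-up is safe."""
    if not branches:
        return None
    # If any branch needs every row, no filter can be pushed to the source.
    if any(b.predicate is None for b in branches):
        return None
    unique: List[str] = []
    for b in branches:
        if b.predicate not in unique:  # skip duplicates, keep first-seen order
            unique.append(b.predicate)
    return " or ".join(f"({p})" for p in unique)


branches = [
    Branch("T1", "col == 'hello'"),
    Branch("T2", "col == 'world'"),
]
combined = pulled_up_predicate(branches)

# A branch that reads T without filtering blocks the rewrite entirely.
blocked = pulled_up_predicate(branches + [Branch("T3", None)])
```

Here `combined` is the disjunction that would be offered to the data source, while the original per-branch filters stay in place above it.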

        Activity

        Daniel Dai added a comment -

        My understanding is that there is a union after T1 and T2, right?

        Yes, we only merge consecutive filters into an "and" condition. We don't merge filters into an "or" condition. So you want:

        filter cond1, filter cond2 -> union ==> filter cond1 or cond2

        Yu Xu added a comment -

        Yes, that's the use case. Thanks.

        Aniket Mokashi added a comment -

        Duplicate of PIG-2668


          People

          • Assignee:
            Unassigned
            Reporter:
            Yu Xu
          • Votes:
            0
            Watchers:
            4
