PIG-3173 (Pig)

Partition filter push down does not happen when the partition key conditions include an AND and OR construct

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.1
    • Fix Version/s: 0.12.0
    • Component/s: None
    • Labels: None

      Description

      A = load 'db.table' using org.apache.hcatalog.pig.HCatLoader();
      B = filter A by (region=='usa' AND dt=='201302051800') OR (region=='uk' AND dt=='201302051800');
      C = foreach B generate name, age;
      DUMP C;

      This script gives the warning below and scans the whole table.

      2013-02-06 22:22:16,233 [main] WARN org.apache.pig.newplan.PColFilterExtractor - No partition filter push down: You have an partition column (region ) in a construction like: (pcond and ...) or (pcond and ...) where pcond is a condition on a partition column.
      2013-02-06 22:22:16,233 [main] WARN org.apache.pig.newplan.PColFilterExtractor - No partition filter push down: You have an partition column (datestamp ) in a construction like: (pcond and ...) or (pcond and ...) where pcond is a condition on a partition column.

      1. PIG-3173-1.patch
        13 kB
        Rohini Palaniswamy
      2. PIG-3173-2.patch
        9 kB
        Rohini Palaniswamy

          Activity

          Rohini Palaniswamy added a comment -

          https://reviews.apache.org/r/10035/

          Alan Gates added a comment -

          Canceling patch until feedback from Dmitriy is addressed.

          Cheolsoo Park added a comment -

          If I understand the comments on RB correctly, there is no real issue with the patch other than that we can do better on the '(A and B) or (C and D)' case.

          Currently, Pig rejects all of the following expressions even if A, B, C, and D are all partition conditions:

          • (A and B) or (C and D)
          • (A and B) or C
          • A or (C and D)

          But this patch at least lets Pig push down expressions when A, B, C, and D are ALL partition conditions. IMO, this alone is a big win. Can we get this patch in and do further optimization on the '(A and B) or (C and D)' case in a separate jira?

          Thanks!
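
          The condition described in this comment (push down only when every condition in the filter is a partition condition) can be sketched as a small tree walk. This is an illustrative model, not the code in PColFilterExtractor or in the patch: the `Cond`, `And`, `Or` classes and `all_partition_conditions` are hypothetical names. The partition columns are taken to be `region` and `dt`, as in the script in the description, and `age` is assumed to be a non-partition column, which the issue does not state.

          ```python
          from dataclasses import dataclass
          from typing import Union


          @dataclass(frozen=True)
          class Cond:
              """A leaf comparison such as region == 'usa' (hypothetical model)."""
              column: str
              value: str


          @dataclass(frozen=True)
          class And:
              left: "Expr"
              right: "Expr"


          @dataclass(frozen=True)
          class Or:
              left: "Expr"
              right: "Expr"


          Expr = Union[Cond, And, Or]


          def all_partition_conditions(expr: Expr, partition_cols: frozenset) -> bool:
              """Return True only if every leaf compares a partition column."""
              if isinstance(expr, Cond):
                  return expr.column in partition_cols
              # And and Or are treated alike: both sides must qualify.
              return (all_partition_conditions(expr.left, partition_cols)
                      and all_partition_conditions(expr.right, partition_cols))


          # Assumption: region and dt are the partition columns of db.table.
          PARTITION_COLS = frozenset({"region", "dt"})

          # The filter from the description: every leaf is a partition condition.
          reported = Or(
              And(Cond("region", "usa"), Cond("dt", "201302051800")),
              And(Cond("region", "uk"), Cond("dt", "201302051800")),
          )

          # Same shape, but one leaf is on a column assumed to be non-partition.
          mixed = Or(
              And(Cond("region", "usa"), Cond("age", "30")),
              And(Cond("region", "uk"), Cond("dt", "201302051800")),
          )

          print(all_partition_conditions(reported, PARTITION_COLS))
          print(all_partition_conditions(mixed, PARTITION_COLS))
          ```

          In this model the reported filter qualifies and the mixed one does not; the mixed shape is the case proposed here for a separate jira.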

          Rohini Palaniswamy added a comment -

          > But this patch at least lets Pig push down expressions when A, B, C, and D are ALL partition conditions. IMO, this alone is a big win. Can we get this patch in and do further optimization on the '(A and B) or (C and D)' case in a separate jira?
          Sure. I had started on the optimization patch but did not complete it before leaving for vacation. I wanted to be careful, as there was a lot of change and I had to evaluate almost the whole tree to ensure it works for all combinations, since we are extracting partial conditions. I will create a separate jira for that and put up the patch later. I will update this patch (the first one had a bug) to only push down the case where all conditions are partition conditions.
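
          The issue does not say how the follow-up patch extracts partial conditions. As a rough sketch of why the whole tree has to be considered, the code below uses one commonly used rule, which is an assumption and not taken from the patch: under AND, either side may be kept on its own; under OR, both sides must contribute a partition predicate or nothing can be extracted. The tuple encoding and the function names are hypothetical, and B and D are assumed to be non-partition conditions for illustration. The brute-force check confirms that the extracted predicate is implied by the original filter for every combination of truth values.

          ```python
          from itertools import product

          # Hypothetical encoding of a filter expression as nested tuples:
          #   ("cond", name, is_partition) | ("and", left, right) | ("or", left, right)


          def extract(expr):
              """Return a partition-only predicate implied by expr, or None."""
              kind = expr[0]
              if kind == "cond":
                  return expr if expr[2] else None
              left, right = extract(expr[1]), extract(expr[2])
              if kind == "and":
                  # A conjunction implies each of its sides, so one side is enough.
                  if left is None:
                      return right
                  if right is None:
                      return left
                  return ("and", left, right)
              # kind == "or": a branch with no partition predicate can match any
              # partition, so nothing can be pruned unless both branches contribute.
              if left is None or right is None:
                  return None
              return ("or", left, right)


          def evaluate(expr, env):
              """Evaluate expr given a dict mapping condition names to booleans."""
              kind = expr[0]
              if kind == "cond":
                  return env[expr[1]]
              left, right = evaluate(expr[1], env), evaluate(expr[2], env)
              return (left and right) if kind == "and" else (left or right)


          def implied_everywhere(expr, extracted, names):
              """Check that whenever expr holds, extracted holds too."""
              for values in product([False, True], repeat=len(names)):
                  env = dict(zip(names, values))
                  if evaluate(expr, env) and not evaluate(extracted, env):
                      return False
              return True


          A = ("cond", "A", True)   # partition condition
          B = ("cond", "B", False)  # assumed non-partition condition
          C = ("cond", "C", True)   # partition condition
          D = ("cond", "D", False)  # assumed non-partition condition

          expr = ("or", ("and", A, B), ("and", C, D))
          pushed = extract(expr)  # the partition-only predicate (A or C)
          print(pushed)
          print(implied_everywhere(expr, pushed, ["A", "B", "C", "D"]))
          ```

          Because the extracted predicate is weaker than the original filter, in this sketch it could only be used to prune partitions; the full filter would still have to run on the rows that are read.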

          Cheolsoo Park added a comment -

          +1 to PIG-3173-2.patch.

          Rohini Palaniswamy added a comment -

          Checked into trunk (0.12). Thanks Dmitriy and Cheolsoo


            People

            • Assignee:
              Rohini Palaniswamy
              Reporter:
              Rohini Palaniswamy
            • Votes:
              0
              Watchers:
              4
