Hive
  1. Hive
  2. HIVE-6395

multi-table insert from select transform fails if optimize.ppd enabled

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.13.0
    • Fix Version/s: 0.13.0
    • Component/s: Query Processor
    • Labels:
      None

      Description

      set hive.optimize.ppd=true;
      add file ./test.py;
      
      from (select transform(test.*) using 'python ./test.py'
      as id,name,state from test) t0
      insert overwrite table test2 select * where state=1
      insert overwrite table test3 select * where state=2;
      

      In the above example, the select transform returns an extra column, and that column is used in where clause of the multi-insert selects. However, if optimize is on, the query plan is wrong:

      filter (state=1 and state=2) //impossible
      --> select, insert into test1
      --> select, insert into test2

      The correct query plan for hive.optimize.ppd=false is:
      filter (state=1)
      --> select, insert into test1
      filter (state=2)
      --> select, insert into test2

      For reference

      create table test (id int, name string)
      create table test2(id int, name string, state int)
      create table test3(id int, name string, state int)
      
      1. test.py
        0.5 kB
        Szehon Ho
      2. HIVE-6395.patch
        82 kB
        Szehon Ho

        Issue Links

          Activity

          Hide
          Szehon Ho added a comment -

          Attaching review board for review.

          Navis can you also review too , as you had done some fixes for ppd optimizer.

          Show
          Szehon Ho added a comment - Attaching review board for review. Navis can you also review too , as you had done some fixes for ppd optimizer.
          Hide
          Szehon Ho added a comment -

          Actually, I just saw your fix for HIVE-4293, is it a more complete fix for the same situation ?

          Show
          Szehon Ho added a comment - Actually, I just saw your fix for HIVE-4293 , is it a more complete fix for the same situation ?
          Hide
          Xuefu Zhang added a comment -

          Szehon Ho The patch looks good to me. However, just one thing to consider: your patch would make PPD off in the given case, which has performance implications. However, if we push down ( state == 1 || state == 2 ) in the test case while keeping the filter at the filter, wouldn't that work? However, I'm not sure if this is possible or have covets. What you think?

          Show
          Xuefu Zhang added a comment - Szehon Ho The patch looks good to me. However, just one thing to consider: your patch would make PPD off in the given case, which has performance implications. However, if we push down ( state == 1 || state == 2 ) in the test case while keeping the filter at the filter, wouldn't that work? However, I'm not sure if this is possible or have covets. What you think?
          Hide
          Ashutosh Chauhan added a comment -

          I think HIVE-4293 indeed fixes this and is more complete. Harish Butani / Navis Can you verify ?

          Show
          Ashutosh Chauhan added a comment - I think HIVE-4293 indeed fixes this and is more complete. Harish Butani / Navis Can you verify ?
          Hide
          Szehon Ho added a comment -

          Thanks Xuefu, its not possible unless I'm misunderstanding the question, the entire FilterOp is pushed down, the predicate is just a part of it. Navis's patch also seems to remove the candidate if its multi-insert case, as its not implemented yet in ppd. (see OpProcFactory.getChildWalkerInfo).

          Although I am curious why this is duplicated in HIVE-4293 which is about subquery + udtf, I wonder if its a part of the main fix, or just an additional fix that got added?

          Show
          Szehon Ho added a comment - Thanks Xuefu, its not possible unless I'm misunderstanding the question, the entire FilterOp is pushed down, the predicate is just a part of it. Navis's patch also seems to remove the candidate if its multi-insert case, as its not implemented yet in ppd. (see OpProcFactory.getChildWalkerInfo). Although I am curious why this is duplicated in HIVE-4293 which is about subquery + udtf, I wonder if its a part of the main fix, or just an additional fix that got added?
          Hide
          Xuefu Zhang added a comment -

          Although I am curious why this is duplicated in HIVE-4293 which is about subquery + udtf, I wonder if its a part of the main fix, or just an additional fix that got added?

          I didn't read the patch in HIVE-4293, but from Hive's perspective, UDTF is very similar to TRANSFORM() except that the former is is done via UDTF's java code, and the later in external script via streaming. For this reason, the problem here might be a sub-problem of HIVE-4293. This is just my guess.

          Show
          Xuefu Zhang added a comment - Although I am curious why this is duplicated in HIVE-4293 which is about subquery + udtf, I wonder if its a part of the main fix, or just an additional fix that got added? I didn't read the patch in HIVE-4293 , but from Hive's perspective, UDTF is very similar to TRANSFORM() except that the former is is done via UDTF's java code, and the later in external script via streaming. For this reason, the problem here might be a sub-problem of HIVE-4293 . This is just my guess.
          Hide
          Ashutosh Chauhan added a comment -

          Tested the test case in patch on trunk which now has HIVE-4293. It now passes. Feel free to reopen if you can still repro in some other form.

          Show
          Ashutosh Chauhan added a comment - Tested the test case in patch on trunk which now has HIVE-4293 . It now passes. Feel free to reopen if you can still repro in some other form.

            People

            • Assignee:
              Szehon Ho
              Reporter:
              Szehon Ho
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development