Hive / HIVE-6395

multi-table insert from select transform fails if hive.optimize.ppd is enabled

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.13.0
    • Fix Version/s: 0.13.0
    • Component/s: Query Processor
    • Labels:
      None

      Description

      set hive.optimize.ppd=true;
      add file ./test.py;
      
      from (select transform(test.*) using 'python ./test.py'
      as id,name,state from test) t0
      insert overwrite table test2 select * where state=1
      insert overwrite table test3 select * where state=2;
      

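      In the above example, the SELECT TRANSFORM returns an extra column, state, that is not in the source table, and that column is used in the WHERE clauses of the multi-insert SELECTs. However, when hive.optimize.ppd is enabled, the query plan is wrong: the two WHERE predicates are merged into a single filter that both inserts share: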

      filter (state=1 and state=2) // impossible: no row satisfies both
      --> select, insert into test2
      --> select, insert into test3

      The correct query plan, which Hive produces with hive.optimize.ppd=false, is:
      filter (state=1)
      --> select, insert into test2
      filter (state=2)
      --> select, insert into test3
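To make the difference between the two plans concrete, the sketch below evaluates both over a few hypothetical (id, name, state) rows in plain Python, routing them to the query's target tables test2 and test3. It models the operators only loosely (a filter is a predicate, each insert is a list) and is not Hive's operator tree; the row values and function names are assumptions for illustration. With the merged filter no row can satisfy state=1 and state=2 at once, so both tables end up empty, while per-branch filters route each row to the right table.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[int, str, int]  # (id, name, state) as emitted by the transform
Predicate = Callable[[Row], bool]


def is_state(value: int) -> Predicate:
    # Predicate for a WHERE clause of the form state=<value>.
    return lambda row: row[2] == value


def buggy_plan(rows: List[Row]) -> Dict[str, List[Row]]:
    # Both WHERE clauses ANDed into one filter shared by both inserts.
    merged = [r for r in rows if is_state(1)(r) and is_state(2)(r)]
    return {"test2": list(merged), "test3": list(merged)}


def correct_plan(rows: List[Row]) -> Dict[str, List[Row]]:
    # Each insert branch keeps its own filter.
    return {
        "test2": [r for r in rows if is_state(1)(r)],
        "test3": [r for r in rows if is_state(2)(r)],
    }


rows: List[Row] = [(1, "a", 1), (2, "b", 2), (3, "c", 1)]
print(buggy_plan(rows))    # {'test2': [], 'test3': []}
print(correct_plan(rows))  # {'test2': [(1, 'a', 1), (3, 'c', 1)], 'test3': [(2, 'b', 2)]}
```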

      For reference, the table definitions:

      create table test (id int, name string);
      create table test2(id int, name string, state int);
      create table test3(id int, name string, state int);
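The attached test.py is not reproduced in the issue. As a hypothetical stand-in, the sketch below shows the shape of a TRANSFORM script for this query: Hive's default TRANSFORM setup writes each input row to the script's stdin as tab-separated text and parses each tab-separated stdout line into the AS columns, so emitting a third field yields the extra state column. The rule used to derive state (len(name) % 2 + 1) and the function names derive_state, transform and main are assumptions for illustration only.

```python
import sys
from typing import Iterable, Iterator, TextIO


def derive_state(name: str) -> int:
    # Hypothetical rule for illustration; the attached test.py's logic is not shown.
    return len(name) % 2 + 1


def transform(lines: Iterable[str]) -> Iterator[str]:
    """Map tab-separated (id, name) rows to tab-separated (id, name, state) rows."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # this sketch simply skips malformed rows
        row_id, name = fields[0], fields[1]
        yield f"{row_id}\t{name}\t{derive_state(name)}"


def main(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    # Hive streams input rows on stdin and reads each stdout line as one output row.
    for out in transform(stdin):
        stdout.write(out + "\n")
```

A deployed script would end with a call to main(); it is omitted here so the functions can be exercised without reading stdin. Under the hypothetical rule, the rows "1\tab" and "2\tabc" come out as "1\tab\t1" and "2\tabc\t2".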
      

        Attachments

        1. HIVE-6395.patch
          82 kB
          Szehon Ho
        2. test.py
          0.5 kB
          Szehon Ho

              People

              • Assignee:
                Szehon Ho
                Reporter:
                Szehon Ho
              • Votes:
                0
                Watchers:
                6
