HIVE-6395

multi-table insert from select transform fails if optimize.ppd enabled


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.13.0
    • Fix Version/s: 0.13.0
    • Component/s: Query Processor
    • Labels: None

    Description

      set hive.optimize.ppd=true;
      add file ./test.py;
      
      from (select transform(test.*) using 'python ./test.py'
      as id,name,state from test) t0
      insert overwrite table test2 select * where state=1
      insert overwrite table test3 select * where state=2;
      

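      In the example above, the select transform returns an extra column (state), and the where clause of each insert branch filters on that column. With hive.optimize.ppd=true, however, predicate pushdown merges the two branch predicates into a single conjunctive filter, so the query plan is wrong: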

      filter (state=1 and state=2) //impossible
      --> select, insert into test2
      --> select, insert into test3

      The correct query plan for hive.optimize.ppd=false is:
      filter (state=1)
      --> select, insert into test2
      filter (state=2)
      --> select, insert into test3
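To make the difference between the two plans concrete, here is a minimal Python sketch that pushes a few sample rows through each plan shape. It is a toy model, not Hive code: the sample rows, `run_plan`, and the predicate helpers are all hypothetical names introduced for illustration.

```python
# Toy model of the two plan shapes described in this issue (not Hive code).
# Sample rows as the transform might emit them: (id, name, state).
rows = [(1, "a", 1), (2, "b", 2), (3, "c", 1)]


def is_state_1(row):
    return row[2] == 1


def is_state_2(row):
    return row[2] == 2


def keep_all(row):
    return True


def run_plan(rows, shared_filter, branch_filters):
    """Apply a filter shared by all branches, then each branch's own filter."""
    passed = [r for r in rows if shared_filter(r)]
    return {table: [r for r in passed if keep(r)]
            for table, keep in branch_filters.items()}


# Plan reported with hive.optimize.ppd=true: both predicates are ANDed
# into one filter that sits above both insert branches.
wrong = run_plan(rows,
                 lambda r: is_state_1(r) and is_state_2(r),
                 {"test2": keep_all, "test3": keep_all})

# Plan with hive.optimize.ppd=false: each branch keeps its own filter.
right = run_plan(rows, keep_all,
                 {"test2": is_state_1, "test3": is_state_2})

print(wrong)
print(right)
```

Because no row can satisfy state=1 and state=2 at the same time, the merged filter leaves both target tables empty, while the per-branch plan routes each row to the table matching its state.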

      For reference, the table definitions:

      create table test (id int, name string);
      create table test2(id int, name string, state int);
      create table test3(id int, name string, state int);
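The attached test.py is not reproduced in this report. For readers who want to try the query without it, the sketch below is a hypothetical stand-in, not the attachment's actual contents. It relies only on Hive's default transform behavior of passing rows to the script as tab-separated lines on stdin and reading tab-separated lines back from stdout. The rule used to derive `state` (odd ids get 1, even ids get 2) is an assumption chosen so that both insert branches receive rows.

```python
# Hypothetical stand-in for test.py; the real attachment may differ.
def add_state(line):
    """Append a derived `state` column to one tab-separated (id, name) row."""
    row_id, name = line.rstrip("\n").split("\t", 1)
    # Assumed rule, for illustration only: odd ids -> state 1, even ids -> state 2.
    state = 1 if int(row_id) % 2 else 2
    return "\t".join([row_id, name, str(state)])


def transform(lines):
    """Yield one output row per non-empty input row."""
    for line in lines:
        if line.strip():
            yield add_state(line)
```

To use it as a transform script, the file would also need a small driver that reads stdin and writes stdout, for example `import sys` followed by `for out in transform(sys.stdin): print(out)`.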
      

      Attachments

        1. HIVE-6395.patch
          82 kB
          Szehon Ho
        2. test.py
          0.5 kB
          Szehon Ho


            People

              Assignee: Szehon Ho
              Reporter: Szehon Ho
              Votes: 0
              Watchers: 6
