Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-9613

Left join query plan outputs wrong column when using subquery

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.14.0, 1.0.0
    • Fix Version/s: 1.2.0, 1.1.1
    • Component/s: Parser, Query Planning
    • Labels:
      None
    • Environment:

      apache hadoop 2.5.1

      Description

      I have a query that outputs a column with wrong contents when using subquery,and the contents of that column is equal to another column,not its own.

      I have three tables,as follows:

      table 1: hivetemp.category_city_rank:

      category city rank
      jinrongfuwu shanghai 1
      ktvjiuba shanghai 2

      table 2:hivetemp.category_match:

      src_category_en src_category_cn dst_category_en dst_category_cn
      danbaobaoxiantouzi 投资担保 担保/贷款 jinrongfuwu
      zpwentiyingshi 娱乐/休闲 KTV/酒吧 ktvjiuba

      table 3:hivetemp.city_match:

      src_city_name_en dst_city_name_en city_name_cn
      sh shanghai 上海

      And the query is :

      select
          a.category,
          a.city,
          a.rank,
          b.src_category_en,
          c.src_city_name_en
      from
          hivetemp.category_city_rank a
      left outer join
      (select
          src_category_en,
          dst_category_en
      from
          hivetemp.category_match) b
      on  a.category = b.dst_category_en
      left outer join
      (select
          src_city_name_en,
          dst_city_name_en
      from
          hivetemp.city_match) c
      on  a.city = c.dst_city_name_en
      

      which shoud output the results as follows,and i test it in hive 0.13:

      category city rank src_category_en src_city_name_en
      jinrongfuwu shanghai 1 danbaobaoxiantouzi sh
      ktvjiuba shanghai 2 zpwentiyingshi sh

      but int hive0.14,the results in the column src_category_en is wrong,and is just the city contents:

      category city rank src_category_en src_city_name_en
      jinrongfuwu shanghai 1 shanghai sh
      ktvjiuba shanghai 2 shanghai sh

      Using explain to examine the execution plan,i can see the first subquery just outputs one column of dst_category_en,and src_category_en is just missing.

      b:category_match
      TableScan
      alias: category_match
      Statistics: Num rows: 131 Data size: 13149 Basic stats: COMPLETE Column stats: NONE
      Select Operator
      expressions: dst_category_en (type: string)
      outputColumnNames: _col1
      Statistics: Num rows: 131 Data size: 13149 Basic stats: COMPLETE Column stats: NONE

        Attachments

        1. test.sql
          1 kB
          Li Xin
        2. HIVE-9613.1.patch
          16 kB
          Gunther Hagleitner

          Issue Links

            Activity

              People

              • Assignee:
                hagleitn Gunther Hagleitner
                Reporter:
                spyfree Li Xin
              • Votes:
                1 Vote for this issue
                Watchers:
                11 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: