Hive
  1. Hive
  2. HIVE-7248

UNION ALL in hive returns incorrect results on Hbase backed table

    Details

    • Type: Bug Bug
    • Status: Patch Available
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 0.12.0, 0.13.0, 0.13.1
    • Fix Version/s: None
    • Component/s: HBase Handler
    • Labels:
      None

      Description

      The issue can be recreated with following steps

      1) In hbase
      create 'TABLE_EMP','default'

      2) On hive
      sudo -u hive hive

      CREATE EXTERNAL TABLE TABLE_EMP(FIRST_NAME string,LAST_NAME string,CDS_UPDATED_DATE string,CDS_PK string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES("hbase.columns.mapping" = "default:FIRST_NAME,default:LAST_NAME,default:CDS_UPDATED_DATE,:key", "hbase.scan.cache" = "500", "hbase.scan.cacheblocks" = "false" ) TBLPROPERTIES("hbase.table.name" = "TABLE_EMP",'serialization.null.format'='');

      3) On hbase insert the following data

      put 'TABLE_EMP', '1', 'default:FIRST_NAME', 'Srini'
      put 'TABLE_EMP', '1', 'default:LAST_NAME', 'P'
      put 'TABLE_EMP', '1', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00'

      put 'TABLE_EMP', '2', 'default:FIRST_NAME', 'Aravind'
      put 'TABLE_EMP', '2', 'default:LAST_NAME', 'K'
      put 'TABLE_EMP', '2', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00'

      4) On hive execute the following query
      hive
      SELECT *
      FROM (
      SELECT CDS_PK
      FROM TABLE_EMP
      WHERE
      CDS_PK >= '0'
      AND CDS_PK <= '9'
      AND CDS_UPDATED_DATE IS NOT NULL
      UNION ALL SELECT CDS_PK
      FROM TABLE_EMP
      WHERE
      CDS_PK >= 'a'
      AND CDS_PK <= 'z'
      AND CDS_UPDATED_DATE IS NOT NULL
      )t ;

      5) Output of the query

      1
      1
      2
      2

      6) Output of just

      SELECT CDS_PK
      FROM TABLE_EMP
      WHERE
      CDS_PK >= '0'
      AND CDS_PK <= '9'
      AND CDS_UPDATED_DATE IS NOT NULL

      is

      1
      2

      7) Output of just

      SELECT CDS_PK
      FROM TABLE_EMP
      WHERE
      CDS_PK >= 'a'
      AND CDS_PK <= 'z'
      AND CDS_UPDATED_DATE IS NOT NULL

      Empty

      8) UNION is used to combine the result from multiple SELECT statements into a single result set. Hive currently only supports UNION ALL (bag union), in which duplicates are not eliminated

      Accordingly above query should return output
      1
      2

      instead it is giving wrong output
      1
      1
      2
      2

      1. HIVE-7248.3.patch.txt
        19 kB
        Navis
      2. HIVE-7248.2.patch.txt
        16 kB
        Navis
      3. HIVE-7248.1.patch.txt
        15 kB
        Navis

        Activity

          People

          • Assignee:
            Navis
            Reporter:
            Mala Chikka Kempanna
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:

              Development