Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-7248

UNION ALL in hive returns incorrect results on Hbase backed table

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Patch Available
    • Major
    • Resolution: Unresolved
    • 0.12.0, 0.13.0, 0.13.1
    • None
    • HBase Handler
    • None

    Description

      The issue can be recreated with following steps

      1) In hbase
      create 'TABLE_EMP','default'

      2) On hive
      sudo -u hive hive

      CREATE EXTERNAL TABLE TABLE_EMP(FIRST_NAME string,LAST_NAME string,CDS_UPDATED_DATE string,CDS_PK string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES("hbase.columns.mapping" = "default:FIRST_NAME,default:LAST_NAME,default:CDS_UPDATED_DATE,:key", "hbase.scan.cache" = "500", "hbase.scan.cacheblocks" = "false" ) TBLPROPERTIES("hbase.table.name" = "TABLE_EMP",'serialization.null.format'='');

      3) On hbase insert the following data

      put 'TABLE_EMP', '1', 'default:FIRST_NAME', 'Srini'
      put 'TABLE_EMP', '1', 'default:LAST_NAME', 'P'
      put 'TABLE_EMP', '1', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00'

      put 'TABLE_EMP', '2', 'default:FIRST_NAME', 'Aravind'
      put 'TABLE_EMP', '2', 'default:LAST_NAME', 'K'
      put 'TABLE_EMP', '2', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00'

      4) On hive execute the following query
      hive
      SELECT *
      FROM (
      SELECT CDS_PK
      FROM TABLE_EMP
      WHERE
      CDS_PK >= '0'
      AND CDS_PK <= '9'
      AND CDS_UPDATED_DATE IS NOT NULL
      UNION ALL SELECT CDS_PK
      FROM TABLE_EMP
      WHERE
      CDS_PK >= 'a'
      AND CDS_PK <= 'z'
      AND CDS_UPDATED_DATE IS NOT NULL
      )t ;

      5) Output of the query

      1
      1
      2
      2

      6) Output of just

      SELECT CDS_PK
      FROM TABLE_EMP
      WHERE
      CDS_PK >= '0'
      AND CDS_PK <= '9'
      AND CDS_UPDATED_DATE IS NOT NULL

      is

      1
      2

      7) Output of just

      SELECT CDS_PK
      FROM TABLE_EMP
      WHERE
      CDS_PK >= 'a'
      AND CDS_PK <= 'z'
      AND CDS_UPDATED_DATE IS NOT NULL

      Empty

      8) UNION is used to combine the result from multiple SELECT statements into a single result set. Hive currently only supports UNION ALL (bag union), in which duplicates are not eliminated

      Accordingly above query should return output
      1
      2

      instead it is giving wrong output
      1
      1
      2
      2

      Attachments

        1. HIVE-7248.1.patch.txt
          15 kB
          Navis Ryu
        2. HIVE-7248.2.patch.txt
          16 kB
          Navis Ryu
        3. HIVE-7248.3.patch.txt
          19 kB
          Navis Ryu

        Activity

          People

            navis Navis Ryu
            mkempanna Mala Chikka Kempanna
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated: