Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-11192

Wrong results for query with WHERE ... NOT IN when table has null values

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Not A Problem
    • 1.1.0, 1.2.1
    • None
    • None
    • None
    • Hive on MR

    Description

      I tested this on cdh5.4.2 cluster and locally on the release-1.2.1 branch

      ```sql
      DROP TABLE IF EXISTS test1 ;
      DROP TABLE IF EXISTS test2 ;

      CREATE TABLE test1 (col1 STRING) ;
      INSERT INTO TABLE test1 VALUES ("1"), ("2"), ("3"), ("4") ;

      CREATE TABLE test2 (col1 STRING) ;
      INSERT INTO TABLE test2 VALUES ("1"), ("4"), (NULL) ;

      SELECT
      COUNT(1)
      FROM test1 T1
      WHERE T1.col1 NOT IN (SELECT col1 FROM test2)
      ;

      SELECT
      COUNT(1)
      FROM test1 T1
      WHERE T1.col1 NOT IN (SELECT col1 FROM test2 WHERE col1 IS NOT NULL)
      ;
      ```

      The first query returns 0 and the second returns 2.
      Obviously, the expected answer is always 2.

      Attachments

        Activity

          People

            Unassigned Unassigned
            fpin Furcy Pin
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: