Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-11725

Query result incorrect when querying and filtering NULL values of sub-query

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Critical
    • Resolution: Unresolved
    • Impala 4.1.0
    • None
    • Distributed Exec, Frontend
    • None

    Description

      We found that Impala can't filter NULL values from sub-query. For example, prepare two test tables err_tbl1 and err_tbl2(id INT, dt STRING):

      +----+------------+
      | id | dt         |
      +----+------------+
      | 14 | 2022-11-13 |
      | 15 | 2022-11-13 |
      | 13 | 2022-11-13 |
      +----+------------+
      
      +----+------------+
      | id | dt         |
      +----+------------+
      | 14 | 2022-11-13 |
      | 16 | 2022-11-13 |
      | 13 | 2022-11-13 |
      +----+------------+
      

      And submit query below:

      SELECT *
      FROM (
          SELECT aid, bid, COUNT(*) AS c
          FROM (
              SELECT id AS aid
              FROM err_tbl1
              WHERE dt = '2022-11-13'
          ) a
              FULL JOIN (
                  SELECT id AS bid
                  FROM err_tbl2
                  WHERE dt = '2022-11-13'
              ) b
              ON a.aid = b.bid
          GROUP BY aid, bid
      ) t1
      WHERE aid = bid;

      Out result includes 4 rows:

      +------+------+---+
      | aid  | bid  | c |
      +------+------+---+
      | 13   | 13   | 1 |
      | 14   | 14   | 1 |
      | NULL | 15   | 1 |
      | 16   | NULL | 1 |
      +------+------+—+
      

      Obviously, condition of 'aid=bid' is invalid. 'NULL' value should be filtered out. But if we use condition of 'aid!=bid', there will be empty result set, which means '!=' can filter out 'NULL' value while '=' cannot. What's more, if we create a table as select * from sub-query and execute 'SELECT * FROM sub_table WHERE aid=bid'. The result is correct. If we surround 'aid=bid' with 'trim()' like 'trim(cast(aid as string))=trim(cast(bid as string))', result is also correct. In Spark, result of the same query doesn't contain 'NULL' values.

      Attachments

        Activity

          People

            Yuchen Fan Yuchen Fan
            Yuchen Fan Yuchen Fan
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated: