  1. Hive
  2. HIVE-18490

Query with EXISTS and NOT EXISTS with non-equi predicate can produce wrong result


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0
    • Component/s: CBO
    • Labels: None

    Description

      Queries such as the following can produce wrong results:

      select  
         count(ws_order_number)
      from
         web_sales ws1
      where
          exists (select *
                  from web_sales ws2
                  where ws1.ws_order_number = ws2.ws_order_number
                    and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
      and not exists(select *
                     from web_returns wr1
                     where ws1.ws_order_number = wr1.wr_order_number);
      

      This query is a simplified version of TPC-DS query 94. Such queries are rewritten into a LEFT SEMI JOIN and a LEFT OUTER JOIN with a residual predicate/filter (a non-equi join condition). The problem is that these joins are being merged; they should not be merged, because the semi join has a non-equi join filter.

      The underlying issue is that when a query has multiple joins, a LEFT SEMI JOIN with a non-equi join condition is merged with the other joins. The merge logic should check for such cases and avoid merging.
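
      For illustration, the rewrite described above corresponds roughly to the join form below. This is a hand-written sketch, not the plan Hive actually generates. It assumes a Hive version that accepts non-equi conditions in the ON clause of a join, and it uses an `is null` filter on the outer-joined table to stand in for the NOT EXISTS check.

```sql
-- Sketch only: a join-form equivalent of the EXISTS / NOT EXISTS query above.
select
   count(ws1.ws_order_number)
from
   web_sales ws1
   -- EXISTS becomes a semi join; the <> condition is the residual (non-equi) filter
   left semi join web_sales ws2
      on  ws1.ws_order_number = ws2.ws_order_number
      and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk
   -- NOT EXISTS becomes an outer join plus a null check on the joined side
   left outer join web_returns wr1
      on  ws1.ws_order_number = wr1.wr_order_number
where
   wr1.wr_order_number is null;
```

      Both joins share the equality key ws_order_number, which is presumably what makes them candidates for merging. The `<>` condition applies only to the semi join, so the two joins have to be evaluated separately for the result to be correct.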

      Attachments

        1. HIVE-18490.1.patch
          13 kB
          Vineet Garg
        2. HIVE-18490.2.patch
          83 kB
          Vineet Garg


          People

            Assignee: Vineet Garg (vgarg)
            Reporter: Vineet Garg (vgarg)
            Votes: 0
            Watchers: 2
