Hive / HIVE-22636

Data loss on skewjoin for ACID tables.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Cannot Reproduce
    • 4.0.0
    • Not Applicable
    • None

    Description

      I am running a skew join and writing the result into a full ACID table, and the results are incorrect. The issue is similar to the one seen for MM tables in HIVE-16051, where the fix was to skip the skew join for MM tables.

      Steps to reproduce:

      I used a qtest similar to the one in HIVE-16051:

      --! qt:dataset:src1
      --! qt:dataset:src
      
      -- MASK_LINEAGE
      set hive.mapred.mode=nonstrict;
      set hive.exec.dynamic.partition.mode=nonstrict;
      set hive.support.concurrency=true;
      set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
      set hive.optimize.skewjoin=true;
      set hive.skewjoin.key=2;
      set hive.optimize.metadataonly=false;
      
      CREATE TABLE skewjoin_acid(key INT, value STRING) STORED AS ORC tblproperties ("transactional"="true");
      FROM src src1 JOIN src src2 ON (src1.key = src2.key) INSERT into TABLE skewjoin_acid SELECT src1.key, src2.value;
      select count(distinct key) from skewjoin_acid;
      drop table skewjoin_acid;
      

      The expected count was 309, but the query returned 173.
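
      One way to narrow this down is sketched here, assuming the same src dataset and the same session settings as the qtest. It computes the distinct-key count directly from the join without writing to a table, then repeats the insert with the skew join optimization disabled so the two table counts can be compared. The table name skewjoin_acid_noskew is hypothetical and used only for illustration.

      ```sql
      -- Reference value: distinct join keys computed without writing to any table.
      SELECT count(DISTINCT src1.key) FROM src src1 JOIN src src2 ON (src1.key = src2.key);

      -- Repeat the insert with the skew join optimization turned off.
      set hive.optimize.skewjoin=false;

      -- skewjoin_acid_noskew is a hypothetical table name, not part of the original qtest.
      CREATE TABLE skewjoin_acid_noskew(key INT, value STRING) STORED AS ORC tblproperties ("transactional"="true");
      FROM src src1 JOIN src src2 ON (src1.key = src2.key) INSERT into TABLE skewjoin_acid_noskew SELECT src1.key, src2.value;

      -- Compare this count with the one read back from skewjoin_acid.
      select count(distinct key) from skewjoin_acid_noskew;
      drop table skewjoin_acid_noskew;
      ```

      If the count matches the reference value only when hive.optimize.skewjoin is false, that would point at the skew join path rather than at the ACID write in general.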

       

          People

            Assignee: Unassigned
            Reporter: Aditya Shah (aditya-shah)