  1. Hive
  2. HIVE-22538

RS deduplication does not always enforce hive.optimize.reducededuplication.min.reducer

    Details

      Description

      For transactional tables, hive.optimize.reducededuplication.min.reducer might be overridden to 1. This can cause the final aggregation to be merged into a single stage, which degrades performance. For instance, when column stats autogathering is enabled, this can happen for the following query:

      set hive.support.concurrency=true;
      set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
      
      EXPLAIN
      CREATE TABLE x STORED AS ORC TBLPROPERTIES('transactional'='true') AS
      SELECT * FROM SRC x CLUSTER BY x.key;
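
      The reproduction above can be made more explicit. The following is a minimal sketch, not taken from the issue, that pins the relevant settings before running the same EXPLAIN. It assumes that hive.stats.column.autogather is the switch behind "autogather column stats", that 4 is the stock default for hive.optimize.reducededuplication.min.reducer, and that SRC is the usual Hive test table.

      ```sql
      -- Settings from the issue description
      set hive.support.concurrency=true;
      set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

      -- Assumption: this is the property that enables column stats autogathering
      set hive.stats.column.autogather=true;

      -- Assumption: 4 is the default; set it explicitly so the expected value is unambiguous
      set hive.optimize.reducededuplication.min.reducer=4;

      -- Print the value the session currently holds
      set hive.optimize.reducededuplication.min.reducer;

      -- Same statement as in the description
      EXPLAIN
      CREATE TABLE x STORED AS ORC TBLPROPERTIES('transactional'='true') AS
      SELECT * FROM SRC x CLUSTER BY x.key;
      ```

      In the resulting plan, the thing to check is whether the CLUSTER BY shuffle and the final column stats aggregation share a single reducer stage. If they do, the minimum-reducer setting was presumably not honored during RS deduplication.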
      

        Attachments

        1. HIVE-22538.2.patch
          12 kB
          Krisztian Kasa
        2. HIVE-22538.3.patch
          19 kB
          Krisztian Kasa
        3. HIVE-22538.4.patch
          187 kB
          Krisztian Kasa
        4. HIVE-22538.5.patch
          205 kB
          Krisztian Kasa
        5. HIVE-22538.6.patch
          212 kB
          Krisztian Kasa
        6. HIVE-22538.6.patch
          212 kB
          Krisztian Kasa
        7. HIVE-22538.7.patch
          73 kB
          Krisztian Kasa
        8. HIVE-22538.8.patch
          73 kB
          Krisztian Kasa
        9. HIVE-22538.8.patch
          73 kB
          Krisztian Kasa
        10. HIVE-22538.8.patch
          73 kB
          Krisztian Kasa
        11. HIVE-22538.patch
          16 kB
          Jesus Camacho Rodriguez

              People

              • Assignee: Krisztian Kasa (kkasa)
              • Reporter: Jesus Camacho Rodriguez (jcamachorodriguez)

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 50m