HIVE-24712

hive.map.aggr=false and hive.optimize.reducededuplication=false provide incorrect result on order by with limit


Details

    • Type: Improvement
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: CBO
    • Labels: None

    Description

       When both parameters are set to false, the result appears to be incorrect: a query that should return 200 rows returns only 35. This was tested on HDP 3.1.5.

      set hive.map.aggr=false;
      set hive.optimize.reducededuplication=false;

      select cs_sold_date_sk,count(distinct cs_order_number) from tpcds_orc.catalog_sales_orc group by cs_sold_date_sk order by cs_sold_date_sk limit 200;

      ----------------------------------------------------------------------------------------------
      VERTICES          MODE  STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
      ----------------------------------------------------------------------------------------------
      Map 1 ..........  llap  SUCCEEDED     33         33        0        0       0       0
      Reducer 2 ......  llap  SUCCEEDED      4          4        0        0       0       0
      Reducer 3 ......  llap  SUCCEEDED      4          4        0        0       0       0
      Reducer 4 ......  llap  SUCCEEDED      1          1        0        0       0       0
      ----------------------------------------------------------------------------------------------
      VERTICES: 04/04 [==========================>>] 100% ELAPSED TIME: 38.23 s
      ----------------------------------------------------------------------------------------------
      INFO :
      INFO : Task Execution Summary
      INFO : ----------------------------------------------------------------------------------------------
      INFO : VERTICES    DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  OUTPUT_RECORDS
      INFO : ----------------------------------------------------------------------------------------------
      INFO : Map 1           38097.00             0            0    143,997,065          57,447
      INFO : Reducer 2        9003.00             0            0         57,447          13,108
      INFO : Reducer 3           0.00             0            0         13,108              35
      INFO : Reducer 4           0.00             0            0             35               0
      INFO : ----------------------------------------------------------------------------------------------
      INFO :
      INFO : LLAP IO Summary
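
One way to rule out the data itself is to count the grouping keys with both parameters at their default value of true. This check is not part of the original report; it is a minimal sketch that assumes the same table and column names as the query above. If it reports at least 200 groups, the LIMIT 200 query should return exactly 200 rows, and the 35 rows emitted by Reducer 3 cannot be explained by the data.

```sql
-- Sketch only: run with both parameters at their defaults (true),
-- so the check does not go through the configuration under test.
set hive.map.aggr=true;
set hive.optimize.reducededuplication=true;

-- Count the groups the reported query would produce.
-- A NULL cs_sold_date_sk, if present, forms one group of its own.
select count(*) as num_groups
from (
  select cs_sold_date_sk
  from tpcds_orc.catalog_sales_orc
  group by cs_sold_date_sk
) t;
```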

       

       

       With hive.map.aggr=true, the same query returns the expected 200 rows:

      set hive.map.aggr=true;
      set hive.optimize.reducededuplication=false;

      select cs_sold_date_sk,count(distinct cs_order_number) from tpcds_orc.catalog_sales_orc group by cs_sold_date_sk order by cs_sold_date_sk limit 200;
      ----------------------------------------------------------------------------------------------
      VERTICES          MODE  STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
      ----------------------------------------------------------------------------------------------
      Map 1 ..........  llap  SUCCEEDED     33         33        0        0       0       0
      Reducer 2 ......  llap  SUCCEEDED      4          4        0        0       0       0
      Reducer 3 ......  llap  SUCCEEDED      2          2        0        0       0       0
      Reducer 4 ......  llap  SUCCEEDED      1          1        0        0       0       0
      ----------------------------------------------------------------------------------------------
      VERTICES: 04/04 [==========================>>] 100% ELAPSED TIME: 36.24 s
      ----------------------------------------------------------------------------------------------

      INFO : ----------------------------------------------------------------------------------------------
      INFO : VERTICES    DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  OUTPUT_RECORDS
      INFO : ----------------------------------------------------------------------------------------------
      INFO : Map 1           25595.00             0            0    143,997,065      16,703,757
      INFO : Reducer 2       18556.00             0            0     16,703,757             800
      INFO : Reducer 3        8018.00             0            0            800             200
      INFO : Reducer 4           0.00             0            0            200               0
      INFO : ----------------------------------------------------------------------------------------------
      INFO :
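
The two runs differ only in hive.map.aggr, so comparing their query plans may help narrow down where the rows are lost. The report does not include plan output; the following is a sketch of how the two plans could be captured with EXPLAIN, assuming the same session and table as above. In particular, it may be worth checking where the Limit operator sits relative to the final Group By in each plan, although that is only a guess at where to look.

```sql
-- Sketch only: capture the plan for the failing configuration.
set hive.optimize.reducededuplication=false;
set hive.map.aggr=false;
explain
select cs_sold_date_sk, count(distinct cs_order_number)
from tpcds_orc.catalog_sales_orc
group by cs_sold_date_sk
order by cs_sold_date_sk
limit 200;

-- Same query with map-side aggregation enabled, for comparison.
set hive.map.aggr=true;
explain
select cs_sold_date_sk, count(distinct cs_order_number)
from tpcds_orc.catalog_sales_orc
group by cs_sold_date_sk
order by cs_sold_date_sk
limit 200;
```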

          People

            Assignee: Unassigned
            Reporter: liuyan
            Votes: 0
            Watchers: 7
