  1. Hive
  2. HIVE-14919 Improve the performance of Hive on Spark 2.0.0
  3. HIVE-16600

Refactor SetSparkReducerParallelism#needSetParallelism to enable parallel order by in multi_insert cases

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0
    • Component/s: None
    • Labels: None

    Description

      multi_insert_gby.case.q

      set hive.exec.reducers.bytes.per.reducer=256;
      set hive.optimize.sampling.orderby=true;
      drop table if exists e1;
      drop table if exists e2;
      create table e1 (key string, value string);
      create table e2 (key string);
      FROM (select key, cast(key as double) as keyD, value from src order by key) a
      INSERT OVERWRITE TABLE e1
          SELECT key, value
      INSERT OVERWRITE TABLE e2
          SELECT key;
      
      select * from e1;
      select * from e2;
      

      The parallelism of the Sort is 1 even though parallel order by is enabled ("hive.optimize.sampling.orderby" is set to "true"). This is unreasonable, because the parallelism should be calculated by Utilities.estimateReducers.
      This happens because SetSparkReducerParallelism#needSetParallelism returns false when the RS has more than one child.
      In this case, RS[2] has two children.
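
      To make the expected parallelism concrete, here is a minimal sketch of a size-based reducer estimate: input bytes divided by bytes per reducer, rounded up and clamped to a maximum. It is a simplified stand-in, not the signature or body of Utilities.estimateReducers. The class name, the method name, the 5120-byte input size, and the cap of 999 are all assumptions chosen for illustration.

```java
public class ReducerEstimateSketch {

    // Hypothetical helper: ceil(inputBytes / bytesPerReducer), clamped to [1, maxReducers].
    static int estimate(long inputBytes, long bytesPerReducer, int maxReducers) {
        if (bytesPerReducer <= 0 || maxReducers <= 0) {
            throw new IllegalArgumentException("bytesPerReducer and maxReducers must be positive");
        }
        // Ceiling division without floating point.
        long wanted = (inputBytes + bytesPerReducer - 1) / bytesPerReducer;
        return (int) Math.max(1, Math.min(maxReducers, wanted));
    }

    public static void main(String[] args) {
        // 256 matches hive.exec.reducers.bytes.per.reducer in the test case above.
        // 5120 is an assumed input size, not a measurement of the src table.
        System.out.println(estimate(5120, 256, 999)); // prints 20
    }
}
```

      Under these assumed numbers the estimate is well above 1, which is why a Sort parallelism of 1 looks wrong for this query.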

      The logical plan for this case:

         TS[0]-SEL[1]-RS[2]-SEL[3]-SEL[4]-FS[5]
                                  -SEL[6]-FS[7]
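
      The effect of a strict single-child check on this plan can be sketched with a toy operator tree. This is a hypothetical model, not Hive code: the Op class, strictCheck, and the two plan builders are illustrative names, and the check assumes that the method gives up as soon as it meets an operator with more than one child at or below the RS.

```java
import java.util.ArrayList;
import java.util.List;

public class PlanCheckSketch {

    // Hypothetical stand-in for a Hive operator: a name plus its child operators.
    static final class Op {
        final String name;
        final List<Op> children = new ArrayList<>();

        Op(String name) { this.name = name; }

        // Attaches a child and returns it, so linear chains can be built fluently.
        Op add(Op child) {
            children.add(child);
            return child;
        }
    }

    // Assumed strict behavior: walk down from the RS and return false at the first fork.
    static boolean strictCheck(Op reduceSink) {
        Op current = reduceSink;
        while (!current.children.isEmpty()) {
            if (current.children.size() > 1) {
                return false; // a multi-insert fork ends the check
            }
            current = current.children.get(0);
        }
        return true;
    }

    // Mirrors the plan above; returns RS[2].
    static Op buildMultiInsertPlan() {
        Op ts = new Op("TS[0]");
        Op rs = ts.add(new Op("SEL[1]")).add(new Op("RS[2]"));
        Op sel3 = rs.add(new Op("SEL[3]"));
        sel3.add(new Op("SEL[4]")).add(new Op("FS[5]"));
        sel3.add(new Op("SEL[6]")).add(new Op("FS[7]"));
        return rs;
    }

    // A single-insert plan for comparison: TS-SEL-RS-SEL-FS; returns the RS.
    static Op buildSingleInsertPlan() {
        Op ts = new Op("TS[0]");
        Op rs = ts.add(new Op("SEL[1]")).add(new Op("RS[2]"));
        rs.add(new Op("SEL[3]")).add(new Op("FS[4]"));
        return rs;
    }

    public static void main(String[] args) {
        System.out.println(strictCheck(buildMultiInsertPlan()));  // prints false
        System.out.println(strictCheck(buildSingleInsertPlan())); // prints true
    }
}
```

      In this model the multi-insert plan is rejected only because of the fork, even though the order by itself is no different from the single-insert case.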
      
      

      Attachments

        1. HIVE-16600.13.patch
          37 kB
          liyunzhang
        2. HIVE-16600.12.patch
          37 kB
          liyunzhang
        3. HIVE-16600.11.patch
          14 kB
          liyunzhang
        4. HIVE-16600.10.patch
          56 kB
          liyunzhang
        5. TestSetSparkReduceParallelism_MultiInsertCase.java
          14 kB
          liyunzhang
        6. Node.java
          1 kB
          liyunzhang
        7. HIVE-16600.9.patch
          54 kB
          liyunzhang
        8. HIVE-16600.8.patch
          61 kB
          liyunzhang
        9. HIVE-16600.7.patch
          61 kB
          liyunzhang
        10. HIVE-16600.6.patch
          43 kB
          liyunzhang
        11. HIVE-16600.5.patch
          20 kB
          liyunzhang
        12. HIVE-16600.4.patch
          30 kB
          liyunzhang
        13. mr.explain
          14 kB
          liyunzhang
        14. HIVE-16600.3.patch
          30 kB
          liyunzhang
        15. mr.explain.log.HIVE-16600
          61 kB
          liyunzhang
        16. HIVE-16600.2.patch
          32 kB
          liyunzhang
        17. HIVE-16600.1.patch
          2 kB
          liyunzhang

            People

              Assignee: kellyzly liyunzhang
              Reporter: kellyzly liyunzhang
              Votes: 0
              Watchers: 7
