Details
Type: Sub-task
Status: Closed
Priority: Major
Resolution: Fixed
Description
multi_insert_gby.case.q
set hive.exec.reducers.bytes.per.reducer=256;
set hive.optimize.sampling.orderby=true;
drop table if exists e1;
drop table if exists e2;
create table e1 (key string, value string);
create table e2 (key string);
FROM (select key, cast(key as double) as keyD, value from src order by key) a
INSERT OVERWRITE TABLE e1 SELECT key, value
INSERT OVERWRITE TABLE e2 SELECT key;
select * from e1;
select * from e2;
The parallelism of the Sort is 1 even though parallel order by is enabled ("hive.optimize.sampling.orderby" is set to "true"). This is not reasonable, because the parallelism should be calculated by Utilities.estimateReducers.
The cause is that SetSparkReducerParallelism#needSetParallelism returns false when an RS has more than one child.
In this case, RS[2] has two children.
The logical plan for this case:
TS[0]-SEL[1]-RS[2]-SEL[3]-SEL[4]-FS[5] -SEL[6]-FS[7]
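The behavior described above can be illustrated with a minimal sketch. This is not Hive's code: Op, ParallelismSketch, and chooseParallelism are hypothetical names, the size-based estimate is a simplified stand-in for Utilities.estimateReducers, and the child-count check only mirrors what this report says about needSetParallelism. The input size and reducer cap used in the note below are made-up values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a plan operator; NOT Hive's Operator class.
class Op {
    final String name;
    final List<Op> children = new ArrayList<>();

    Op(String name) { this.name = name; }

    // Attach a child and return this operator so calls can be chained.
    Op add(Op child) { children.add(child); return this; }
}

class ParallelismSketch {
    // Simplified size-based estimate (assumption): one reducer per
    // bytesPerReducer of input, clamped to the range [1, maxReducers].
    static int estimateReducers(long totalInputBytes, long bytesPerReducer, int maxReducers) {
        long reducers = (totalInputBytes + bytesPerReducer - 1) / bytesPerReducer; // ceiling division
        return (int) Math.max(1, Math.min(maxReducers, reducers));
    }

    // Mirrors the check described in this report: an RS with more than
    // one child is skipped.
    static boolean needSetParallelism(Op rs) {
        return rs.children.size() <= 1;
    }

    // Keeps the current reducer count when the check fails, otherwise
    // replaces it with the size-based estimate.
    static int chooseParallelism(Op rs, int currentReducers, long totalInputBytes,
                                 long bytesPerReducer, int maxReducers) {
        if (!needSetParallelism(rs)) {
            return currentReducers; // estimate is never consulted
        }
        return estimateReducers(totalInputBytes, bytesPerReducer, maxReducers);
    }
}
```

With a hypothetical 5000-byte input, 256 bytes per reducer, and a cap of 999, an RS with a single child gets 20 reducers, while an RS with two children, as in the multi-insert plan above, keeps its current value of 1.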