Description
How to reproduce the problem
Linux shell command to prepare the data:
for i in $(seq 200000);do echo "$(($i+100000)),name$i,$(($i*10))";done > data.text
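Before loading, it may help to confirm the row layout. The following is a minimal sketch, assuming a POSIX-compatible shell with `seq` available; the `gen_rows` helper and the sample size of 5 are illustrative and not part of the original report. It emits rows with the same expression as the command above so the `id,str,num` format can be inspected on a small sample.

```shell
# Hypothetical helper: emit N rows in the same "id,str,num" layout as data.text.
gen_rows() {
  for i in $(seq "$1"); do
    echo "$((i + 100000)),name$i,$((i * 10))"
  done
}

# Preview a small sample; the first row is 100001,name1,10.
sample=$(gen_rows 5)
echo "$sample"
```

Against the full file, `wc -l < data.text` should report 200000 lines.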
SQL to reproduce the problem:
- create table data_table(id int, str string, num int) row format delimited fields terminated by ',';
- load data local inpath '/path/to/data.text' into table data_table;
- CACHE TABLE test_cache_table AS
  SELECT str
  FROM (SELECT id, str FROM data_table)
  GROUP BY str;
Finally, you will see a stage with 200 tasks (the default value of spark.sql.shuffle.partitions): the shuffle partitions are not coalesced. This wastes resources when the data size is small.