Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
Description
CREATE TABLE person (id INTEGER, country STRING); EXPLAIN CBO SELECT country, count(1) FROM person GROUP BY country LIMIT 5;
The EXPLAIN query produces a slightly different plan (ordering of nulls) from one execution to another.
CBO PLAN: HiveSortLimit(sort0=[$1], dir0=[ASC-nulls-first], fetch=[5]) HiveProject(country=[$0], $f1=[$1]) HiveAggregate(group=[{1}], agg#0=[count()]) HiveTableScan(table=[[default, person]], table:alias=[person])
CBO PLAN: HiveSortLimit(sort0=[$1], dir0=[ASC], fetch=[5]) HiveProject(country=[$0], $f1=[$1]) HiveAggregate(group=[{1}], agg#0=[count()]) HiveTableScan(table=[[default, person]], table:alias=[person])
This is unlikely to cause wrong results cause most aggregate functions (not all) do not return nulls thus null ordering doesn't matter much but it can lead to other problems such as:
- intermittent CI failures
- query/plan caching
I bumped into this problem after investigating test failures in CI. The following query in offset_limit_ppd_optimizer.q returns different plan when it runs individually and when it runs along with some other qtest files.
explain select * from (select key, count(1) from src group by key order by key limit 10,20) subq join (select key, count(1) from src group by key limit 20,20) subq2 on subq.key=subq2.key limit 3,5;
Attachments
Issue Links
- relates to
-
HIVE-24579 Incorrect Result For Groupby With Limit
- Closed
- links to