Description
Currently, join ordering completely bails out in the absence of statistics, which can lead to bad join plans such as cross joins.
For example, the following SELECT query produces a cross join:
create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, S_NATIONKEY INT, S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING);
CREATE TABLE lineitem (L_ORDERKEY INT, L_PARTKEY INT, L_SUPPKEY INT, L_LINENUMBER INT, L_QUANTITY DOUBLE, L_EXTENDEDPRICE DOUBLE, L_DISCOUNT DOUBLE, L_TAX DOUBLE, L_RETURNFLAG STRING, L_LINESTATUS STRING, l_shipdate STRING, L_COMMITDATE STRING, L_RECEIPTDATE STRING, L_SHIPINSTRUCT STRING, L_SHIPMODE STRING, L_COMMENT STRING) partitioned by (dl int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|';
CREATE TABLE part (p_partkey INT, p_name STRING, p_mfgr STRING, p_brand STRING, p_type STRING, p_size INT, p_container STRING, p_retailprice DOUBLE, p_comment STRING);
explain select count(1) from part, supplier, lineitem where p_partkey = l_partkey and s_suppkey = l_suppkey;
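To illustrate why the FROM-clause order gives a cross join when the optimizer cannot reorder, here is a minimal Python sketch. It is not Hive code; the helper `find_cross_joins` and the encoding of predicates as table pairs are hypothetical. It walks the tables in the order they are written and flags any table that shares no join predicate with the tables already joined. `part` and `supplier` are only connected through `lineitem`, so joining them first has no predicate to use.

```python
# Join predicates from the WHERE clause, encoded as unordered table pairs.
PREDICATES = {
    frozenset({"part", "lineitem"}),      # p_partkey = l_partkey
    frozenset({"supplier", "lineitem"}),  # s_suppkey = l_suppkey
}

def find_cross_joins(order, predicates):
    """Return the tables that would be joined with no connecting predicate
    if the joins were executed left-to-right in the given order."""
    joined = {order[0]}
    cross = []
    for table in order[1:]:
        connected = any(frozenset({table, t}) in predicates for t in joined)
        if not connected:
            cross.append(table)  # no predicate links it: Cartesian product
        joined.add(table)
    return cross

print(find_cross_joins(["part", "supplier", "lineitem"], PREDICATES))  # ['supplier']
print(find_cross_joins(["part", "lineitem", "supplier"], PREDICATES))  # []
```

Any order that places `lineitem` before the second of `part`/`supplier` avoids the cross join. The optimizer can only choose such an order if it does not bail out.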
Estimating statistics will prevent the join ordering algorithm from bailing out, so it can come up with a join order that is at least better than a cross join.
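A rough sketch of the idea, under stated assumptions: when a table has no row count, estimate one from its raw data size divided by an assumed average row width, then order joins with a simple greedy heuristic that always prefers a table connected by a predicate. The estimation formula, the `TableInfo` fields, the greedy strategy and the table sizes below are all hypothetical illustrations, not Hive's actual implementation or real data.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TableInfo:
    name: str
    row_count: Optional[int]  # None when no statistics were collected
    data_size_bytes: int      # hypothetical: total size of the table's files
    avg_row_width: int        # hypothetical: bytes per row guessed from column types

def estimated_rows(t: TableInfo) -> int:
    """Use the real row count if present, otherwise estimate it from data size."""
    if t.row_count is not None:
        return t.row_count
    return max(1, t.data_size_bytes // max(1, t.avg_row_width))

def greedy_join_order(tables, predicates):
    """Start from the smallest table, then repeatedly add the smallest table that
    is connected to the current join by a predicate. Fall back to a cross join
    only when no connected table remains."""
    remaining = {t.name: t for t in tables}
    first = min(remaining.values(), key=estimated_rows)
    order = [first.name]
    del remaining[first.name]
    while remaining:
        connected = [t for t in remaining.values()
                     if any(frozenset({t.name, j}) in predicates for j in order)]
        candidates = connected or list(remaining.values())
        nxt = min(candidates, key=estimated_rows)
        order.append(nxt.name)
        del remaining[nxt.name]
    return order

PREDICATES = {
    frozenset({"part", "lineitem"}),      # p_partkey = l_partkey
    frozenset({"supplier", "lineitem"}),  # s_suppkey = l_suppkey
}

# Hypothetical sizes; no table has a row count (stats absent).
TABLES = [
    TableInfo("part", None, 2_000_000, 100),
    TableInfo("supplier", None, 100_000, 100),
    TableInfo("lineitem", None, 60_000_000, 120),
]

print(greedy_join_order(TABLES, PREDICATES))  # ['supplier', 'lineitem', 'part']
```

Even crude size-based estimates are enough here: every step follows a join predicate, so the plan contains no cross join.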
Attachments
Issue Links
- fixes
  HIVE-17406 UDAF throws IllegalArgumentException for a complex input when column stats is not provided (Resolved)