Details
- Type: Improvement
- Status: Open
- Priority: Minor
- Resolution: Unresolved
Description
Currently, when a query uses COLLECT_SET or COLLECT_LIST over columns from a single table, the aggregation is performed after any JOIN present in the query. For example:
insert into table nested_customers_orders
select c.*, collect_list(named_struct("oid", o.oid, "order_date", o.date...))
from customers c inner join orders o on (c.cid = o.oid)
group by o.oid, o.date,...
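If the optimizer could be told to perform the COLLECT_LIST before the JOIN (where possible), queries of this pattern should see a performance gain, since the join would then process one pre-aggregated row per group rather than every detail row.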
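The requested rewrite can be illustrated by hand. The following is a minimal sketch, not Hive code: it uses Python's built-in sqlite3 module, with SQLite's group_concat standing in for COLLECT_LIST. The schema is an assumption made for illustration (in particular the orders.cid foreign key and the order_date column), as are the sample rows. It runs the same aggregation in both shapes and checks that they agree.

```python
import sqlite3

# Hypothetical schema for illustration: orders carries a cid foreign key
# to customers. SQLite's group_concat stands in for Hive's COLLECT_LIST.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (cid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (oid INTEGER PRIMARY KEY, cid INTEGER, order_date TEXT);
    INSERT INTO customers VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO orders VALUES
        (10, 1, '2016-01-05'), (11, 1, '2016-02-01'), (12, 2, '2016-01-20');
""")

# Shape 1 (current behaviour): join first, then aggregate the joined rows.
JOIN_THEN_AGGREGATE = """
    SELECT c.cid, c.name, group_concat(o.oid || ':' || o.order_date) AS orders
    FROM customers c
    INNER JOIN orders o ON c.cid = o.cid
    GROUP BY c.cid, c.name
"""

# Shape 2 (requested): aggregate orders per customer first, then join
# a single pre-aggregated row per customer.
AGGREGATE_THEN_JOIN = """
    SELECT c.cid, c.name, o.orders
    FROM customers c
    INNER JOIN (
        SELECT cid, group_concat(oid || ':' || order_date) AS orders
        FROM orders
        GROUP BY cid
    ) o ON c.cid = o.cid
"""


def run(query):
    """Return {cid: (name, sorted list of 'oid:date' entries)} for a query."""
    return {
        cid: (name, sorted(orders.split(",")))
        for cid, name, orders in conn.execute(query)
    }


join_first = run(JOIN_THEN_AGGREGATE)
aggregate_first = run(AGGREGATE_THEN_JOIN)
```

In this sketch the two shapes are interchangeable only because the aggregated columns all come from orders, the subquery groups by the join key, and cid is unique in customers (here enforced by the PRIMARY KEY). The element order inside each collected list is not guaranteed, so the helper sorts before comparing.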
Issue Links
- relates to: HIVE-13076 Implement FK/PK "rely novalidate" constraints for better CBO (Resolved)