Details
-
Improvement
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
-
None
Description
Cross join is one of the very heavy operations. Furthermore, this operator is performed by a single worker in the current implementation. (Please see the implementation of HashPartitioner. If partitionKeyIds is empty, getPartition() always returns a single value.)
One possible alternative is executing cross join with broadcast join. That is, outer table (smaller one) is always broadcasted, and join is performed by the machine who stores a part of inner table.
To do so, a new session variable is required to set the broadcast threshold for cross join.