Instead of duplicating data so that a synthetic cross-product fits into partitions, a special-purpose cross-product data movement edge can greatly reduce the net IO.
The Shuffle edge routes each partition's output to a single reducer, while the cross-product edge routes it into a matrix of reducers without actually duplicating the disk data.
A partitioning scheme with 3 partitions on each of the lhs and rhs of a join operation can be routed into 9 reducers by performing a cross-product similar to:
(1,2,3) x (a,b,c) = [(1,a), (1,b), (1,c), (2,a), (2,b), ...]
This turns a single-task cross-product model into a distributed cross-product.
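
The routing described above can be sketched as a small index calculation. This is only an illustrative model, not the actual edge implementation: the function names, the zero-based partition numbering, and the row-major reducer layout are all assumptions made for the example.

```python
from itertools import product

# Hypothetical helpers; names and the row-major layout are assumptions.

def reducer_index(lhs_part, rhs_part, num_rhs):
    """Map an (lhs partition, rhs partition) pair to one reducer slot."""
    return lhs_part * num_rhs + rhs_part

def lhs_destinations(lhs_part, num_rhs):
    """Reducers that read a given lhs partition: one full row of the matrix."""
    return [reducer_index(lhs_part, j, num_rhs) for j in range(num_rhs)]

def rhs_destinations(rhs_part, num_lhs, num_rhs):
    """Reducers that read a given rhs partition: one full column of the matrix."""
    return [reducer_index(i, rhs_part, num_rhs) for i in range(num_lhs)]

def reducer_inputs(num_lhs, num_rhs):
    """For every reducer, the single (lhs, rhs) partition pair it consumes."""
    return {
        reducer_index(i, j, num_rhs): (i, j)
        for i, j in product(range(num_lhs), range(num_rhs))
    }

if __name__ == "__main__":
    # 3 lhs partitions x 3 rhs partitions -> 9 reducers
    for reducer, pair in sorted(reducer_inputs(3, 3).items()):
        print(reducer, pair)
```

In this model each partition's output is written once and read by several reducers (a row for the lhs, a column for the rhs), which is the sense in which the data is routed rather than duplicated.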