The global engine generates a logical plan, and then marks some parts of the plan as broadcast plan which means that they and their input will be broadcasted to all workers.
Currently, broadcast parts are identified according to some rigid and hard-coded rules. This will limit the broadcast opportunities in many cases.
So, in this issue, I propose refactoring the broadcast planner to be more general.
Here are brief rules for broadcast join plan.
- A relation node is broadcastable if its input size does not exceed the pre-defined threshold.
- Output of an execution block (EB) is broadcastable if its every input is broadcastable.
- Given an EB containing a join and its child EBs, those EBs can be merged into a single EB if at least one child EB's output is broadcastable.
- The total size of broadcast relations of an EB cannot exceed the pre-defined threshold.
- After merging EBs according to the first rule, the result EB may not satisfy the second rule. In this case, enforce repartition join for large relations to satisfy the second rule.
- For outer joins, preserved-row relations are not broadcastable to avoid input data duplication. That is, full outer join cannot be executed with broadcast join.
- Here is brief backgrounds for this rule. Data of preserved-row relations will be appeared in the join result regardless of join conditions. If multiple tasks execute outer join with broadcasted preserved-row relations, they emit duplicates results.
- Even though a single task can execute outer join when every input is broadcastable, broadcast join is not allowed if one of input relation consists of multiple files.