Two main issues for large queries using broadcast shuffle
1. Lots of tasks communicate to same node for downloading shuffle data. So most of the time, single machine will be overloaded with requests.
2. Tasks pertaining to same job (in the same machine) downloads broadcast shuffle data redundantly. If the data can be copied to temp storage or ramfs, other tasks running in the same machine can refer to the local copy. Optimizing this would help when running multiple queries in parallel in the cluster.