Details
-
Improvement
-
Status: Resolved
-
Major
-
Resolution: Incomplete
-
1.6.1
-
None
Description
Currently the auto broadcasting done in SparkSQL is asynchronous and done at query planning time. If you have a large query with many broadcasts, this can end up creating a large amount of memory pressure/possible OOMs all at once when it actually isn't necessary.
The current workaround for these types of queries is to disable broadcast joins, which can be prohibitive performance wise. The proposal for this ticket is to allow a config point to toggle doing these broadcasts either eagerly/asynchronously or doing the broadcasts lazily at execution time.