Details
Improvement
Status: Open
Minor
Resolution: Unresolved
3.5.3
None
None
Description
The default value of spark.sql.autoBroadcastJoinThreshold is 10 MB, but a user can override it at spark-submit time. Any value is accepted; there is no validation.
For example, a user can pass 16g. During execution, we check that the data size is < 8 GB (code reference). If the data size is > 8 GB, the job fails with the exception "Cannot broadcast the table that is larger than ...".
Instead of failing at runtime, the improvement suggested in this Jira is to cap autoBroadcastJoinThreshold at 8 GB whenever the passed value is greater than 8 GB.
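
To make the proposal concrete, here is a minimal sketch of the capping behaviour in plain Python. It is not Spark's implementation: `MAX_BROADCAST_TABLE_BYTES`, `parse_size`, and `clamp_broadcast_threshold` are hypothetical names chosen for illustration, and the suffix table covers only a few binary units rather than Spark's full size-string grammar.

```python
import re

# Hypothetical constant mirroring the 8 GB broadcast limit described in this issue.
MAX_BROADCAST_TABLE_BYTES = 8 * 1024**3

# Binary multipliers for the suffixes handled by this sketch (an assumption,
# not Spark's complete byte-string grammar).
_UNITS = {
    "": 1, "b": 1,
    "k": 1024, "kb": 1024,
    "m": 1024**2, "mb": 1024**2,
    "g": 1024**3, "gb": 1024**3,
    "t": 1024**4, "tb": 1024**4,
}


def parse_size(value: str) -> int:
    """Convert a size string such as '10m' or '16g' to a number of bytes."""
    match = re.fullmatch(r"\s*(-?\d+)\s*([a-zA-Z]*)\s*", value)
    if match is None or match.group(2).lower() not in _UNITS:
        raise ValueError(f"Unrecognized size string: {value!r}")
    return int(match.group(1)) * _UNITS[match.group(2).lower()]


def clamp_broadcast_threshold(value: str) -> int:
    """Return the requested threshold in bytes, capped at the 8 GB limit."""
    requested = parse_size(value)
    # Values at or below the limit (including -1) pass through unchanged.
    return min(requested, MAX_BROADCAST_TABLE_BYTES)
```

With this sketch, `clamp_broadcast_threshold("16g")` yields the 8 GB limit in bytes, while `clamp_broadcast_threshold("10m")` is returned unchanged. A real change would probably also log a warning when the user-supplied value is reduced, so the override is not silent.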