Spark / SPARK-50349

Set max value for spark.sql.autoBroadcastJoinThreshold to 8gb if more than 8gb is passed

Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.5.3
    • Fix Version/s: None
    • Component/s: Spark Core, SQL
    • Labels: None

    Description

      The default value of spark.sql.autoBroadcastJoinThreshold is 10 MB, but users can override it at spark-submit time. Any value is accepted; there is no validation.

      For example, a user can pass 16g. During execution, Spark checks that the data size is below 8 GB (code reference).

      If the data size exceeds 8 GB, the job fails with the exception "Cannot broadcast the table that is larger than...".
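
The failure mode described above can be modelled in a few lines of plain Python. This is only a simplified sketch, not Spark's code: the 8 GiB constant mirrors the limit stated in the description, while `check_broadcast_size` and `BroadcastTooLargeError` are hypothetical names, and the error text is illustrative rather than Spark's actual message.

```python
# Simplified model of the behaviour described above; not Spark's actual code.
MAX_BROADCAST_TABLE_BYTES = 8 * 1024**3  # 8 GiB limit, as stated in the description


class BroadcastTooLargeError(RuntimeError):
    """Hypothetical stand-in for the exception raised at runtime."""


def check_broadcast_size(data_size_bytes: int) -> None:
    """Raise if the table to be broadcast is at or above the hard limit."""
    if data_size_bytes >= MAX_BROADCAST_TABLE_BYTES:
        raise BroadcastTooLargeError(
            f"table of {data_size_bytes} bytes exceeds the "
            f"{MAX_BROADCAST_TABLE_BYTES}-byte broadcast limit"
        )


# A threshold of 16g is accepted without validation, so in this model a
# 10 GiB table still qualifies for broadcast and fails only when checked.
configured_threshold = 16 * 1024**3
table_size = 10 * 1024**3
assert table_size <= configured_threshold  # below the user-supplied threshold
try:
    check_broadcast_size(table_size)
except BroadcastTooLargeError as err:
    print(f"job fails at runtime: {err}")
```

The point of the sketch is that the configured threshold and the hard limit are checked at different times, so an oversized threshold is only discovered once the job is already running.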
       
      Instead of failing at runtime, this Jira proposes capping autoBroadcastJoinThreshold at 8 GB whenever the passed value exceeds 8 GB.
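
A minimal sketch of the proposed capping, written in plain Python rather than against Spark's configuration classes. The helper names `parse_size` and `effective_threshold` are hypothetical, and the suffix table assumes binary multipliers (1g = 1024^3 bytes); a real change would live in Spark's own config handling.

```python
import re

MAX_BROADCAST_THRESHOLD_BYTES = 8 * 1024**3  # proposed cap: 8 GiB

# Size suffixes this sketch understands, assumed to be binary multipliers.
_UNITS = {
    "": 1, "b": 1,
    "k": 1024, "kb": 1024,
    "m": 1024**2, "mb": 1024**2,
    "g": 1024**3, "gb": 1024**3,
    "t": 1024**4, "tb": 1024**4,
}


def parse_size(value: str) -> int:
    """Convert a string such as '10m' or '16g' to bytes (hypothetical helper)."""
    match = re.fullmatch(r"\s*(-?\d+)\s*([a-z]*)\s*", value.lower())
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"unrecognised size: {value!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def effective_threshold(value: str) -> int:
    """Return the threshold in bytes, capped at 8 GiB as the issue proposes.

    Negative values (such as -1, which disables broadcast joins) pass
    through unchanged because min() leaves them below the cap.
    """
    return min(parse_size(value), MAX_BROADCAST_THRESHOLD_BYTES)


for raw in ("10m", "8g", "16g", "-1"):
    print(raw, "->", effective_threshold(raw))
```

Capping silently is one design choice; logging a warning when the value is reduced would make the adjustment visible to the user, though the issue text does not specify either way.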

          People

            Assignee: Unassigned
            Reporter: Lakshminarayana Chari (lchari)
            Votes: 0
            Watchers: 1
