Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-27505

autoBroadcastJoinThreshold including bigger table

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Incomplete
    • 2.3.1
    • None
    • PySpark
    • Hive table with Spark 2.3.1 on Azure, using Azure storage as storage layer

    Description

      I'm on a case that when certain table being exposed to broadcast join, the query will eventually failed with remote block error. 
       
      Firstly. We set the spark.sql.autoBroadcastJoinThreshold to 10MB, namely 10485760

       
      Then we proceed to perform query. In the SQL plan, we found that one table that is 25MB in size is broadcast as well.
       

       
      Also in desc extended the table is 24452111 bytes. It is a Hive table. We always ran into error when this table being broadcast. Below is the sample error
       
      Caused by: java.io.IOException: org.apache.spark.SparkException: corrupt remote block broadcast_477_piece0 of broadcast_477: 298778625 != -992055931 at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1350) at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207) at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66) at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66) at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)

       
      Also attached the physical plan if you're interested. One thing to note that, if I turn down autoBroadcastJoinThresholdto 5MB, this query will get successfully executed and default.product NOT broadcasted.
      However, when I change to another query that querying even less columns than pervious one, even in 5MB this table still get broadcasted and failed with the same error. I even changed to 1MB and still the same.

      Attachments

        1. explain_plan.txt
          11 kB
          Mike Chan

        Activity

          People

            Unassigned Unassigned
            magnusfa Mike Chan
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: