Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 2.3.1
Fix Version/s: None
Environment: Hive table with Spark 2.3.1 on Azure, using Azure storage as storage layer
Description
I'm working on a case where, when a certain table is exposed to a broadcast join, the query eventually fails with a corrupt remote block error.
First, we set spark.sql.autoBroadcastJoinThreshold to 10MB, i.e. 10485760 bytes.
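For context, this is roughly how the setting is applied; the session-builder details below are illustrative placeholders, not our exact job configuration:

import org.apache.spark.sql.SparkSession

// Illustrative session setup; the app name is a placeholder.
val spark = SparkSession.builder()
  .appName("broadcast-threshold-repro")
  .enableHiveSupport()
  .getOrCreate()

// 10485760 bytes == 10MB; tables whose estimated size falls below this
// threshold are eligible for automatic broadcast in joins.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "10485760")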
Then we ran the query. In the SQL plan, we found that a table about 25MB in size was broadcast as well.
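As a quick way to see what the planner chose, something like the sketch below can be used; the second table and the join column are placeholders, not the real query:

// Hypothetical join shaped like the failing query; default.product is the
// table that ends up broadcast, the other names are placeholders.
val joined = spark.table("default.product")
  .join(spark.table("default.sales"), Seq("product_id"))

// Prints the physical plan; a BroadcastHashJoin / BroadcastExchange node
// shows that the smaller side was selected for broadcast.
joined.explain()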
In desc extended the table is also reported as 24452111 bytes. It is a Hive table. We always run into the error when this table is broadcast. Below is a sample error:
Caused by: java.io.IOException: org.apache.spark.SparkException: corrupt remote block broadcast_477_piece0 of broadcast_477: 298778625 != -992055931
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1350)
    at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207)
    at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
    at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
    at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
I've also attached the physical plan in case you're interested. One thing to note: if I turn spark.sql.autoBroadcastJoinThreshold down to 5MB, this query executes successfully and default.product is NOT broadcast.
However, when I switch to another query that selects even fewer columns than the previous one, this table is still broadcast even at the 5MB threshold and fails with the same error. I even lowered the threshold to 1MB, with the same result.
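If it helps anyone hitting the same failure, one way to rule out the broadcast path entirely is to disable auto broadcast; the snippet below is only a sketch of that workaround and does not address the underlying size estimation:

// Setting the threshold to -1 disables automatic broadcast joins altogether,
// forcing a shuffle-based join (e.g. sort-merge) regardless of the estimated
// table size.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")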