Description
SPARK-33933 reported an issue that, in AQE, a broadcast timeout can occur when resources are limited.
#31269 gives a partial fix by reordering newStages by class type so that BroadcastQueryStageExec precedes the others when materialize() is called. However, this only guarantees the order in which tasks are scheduled under normal circumstances; the guarantee is not strict, because the broadcast job and the shuffle map job are submitted from different threads.
So we need a complete fix to avoid the edge case that triggers the broadcast timeout.
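To illustrate the reordering idea behind the partial fix, here is a minimal sketch in plain Python. It is not Spark code: the `Stage` class, its `kind` field, and `reorder_new_stages` are hypothetical stand-ins for illustration, and the actual change in #31269 may be written differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """Hypothetical stand-in for a query stage; not a Spark class."""
    stage_id: int
    kind: str  # assumed values: "broadcast" or "shuffle"


def reorder_new_stages(stages):
    """Return the stages with broadcast stages first.

    sorted() is stable, so stages of the same kind keep their original
    relative order.
    """
    return sorted(stages, key=lambda s: 0 if s.kind == "broadcast" else 1)


stages = [
    Stage(0, "shuffle"),
    Stage(1, "broadcast"),
    Stage(2, "shuffle"),
    Stage(3, "broadcast"),
]
ordered = reorder_new_stages(stages)
print([s.stage_id for s in ordered])  # prints [1, 3, 0, 2]
```

A reordering like this only controls the order in which materialization is requested. Because the broadcast job and the shuffle map job are submitted from different threads, the scheduler may still receive them in a different order, which is why the fix is only partial.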
Issue Links
- is a clone of
  - SPARK-33933 Broadcast timeout happened unexpectedly in AQE (Resolved)
- relates to
  - SPARK-33933 Broadcast timeout happened unexpectedly in AQE (Resolved)
  - SPARK-36414 Disable timeout for BroadcastQueryStageExec in AQE (Resolved)