[SPARK-36414] Disable timeout for BroadcastQueryStageExec in AQE - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.0.3, 3.1.2, 3.2.0, 3.3.0
Fix Version/s: 3.2.0
Component/s: SQL
Labels: None

Description

This reverts ~~SPARK-31475~~, because AQE mode always runs more concurrent jobs, especially when multiple queries run at the same time. Currently, the broadcast timeout does not measure only the execution time of the BroadcastQueryStageExec; it also includes the time spent waiting to be scheduled. If all resources are occupied materializing other stages, the broadcast times out without ever getting a chance to run.

The default value is 300s, and it is hard to tune the timeout for AQE mode; real-world cases usually need an extremely large value. As the example shows, the timeout we used was 1800s, and it would evidently need to be roughly 3x larger.
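The scheduling effect described above can be illustrated with a small, Spark-free sketch. It is only an analogy, not Spark's implementation: a single-worker thread pool stands in for a cluster whose resources are all occupied, and the names `run_with_timeout`, `other_stage`, and `broadcast_task` are hypothetical. The timeout clock starts at submission, so it also counts the time the task spends queued.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def run_with_timeout(busy_seconds, timeout_seconds):
    """Queue a 'broadcast' task behind a busy slot and wait on it with a timeout.

    Returns (timed_out, had_started), where had_started records whether the
    broadcast task had begun executing by the time the wait ended.
    """
    started = []

    def other_stage():
        # Hypothetical stand-in for another stage occupying all resources.
        time.sleep(busy_seconds)

    def broadcast_task():
        # Hypothetical stand-in for the broadcast job itself (instant here).
        started.append(True)
        return "broadcast done"

    # One worker models a cluster with no free slots for the broadcast.
    with ThreadPoolExecutor(max_workers=1) as pool:
        pool.submit(other_stage)
        future = pool.submit(broadcast_task)  # waits in the queue
        try:
            # The clock starts now, at submission, not when the task runs.
            future.result(timeout=timeout_seconds)
            timed_out = False
        except TimeoutError:
            timed_out = True
        had_started = bool(started)
    return timed_out, had_started


if __name__ == "__main__":
    # Slot is busy longer than the timeout: the wait fails before the task runs.
    print(run_with_timeout(busy_seconds=0.5, timeout_seconds=0.1))
    # Slot frees up in time: the same task completes within the timeout.
    print(run_with_timeout(busy_seconds=0.05, timeout_seconds=1.0))
```

In the first call the task itself takes almost no time, yet the wait times out purely because of queueing, which is the situation the description attributes to BroadcastQueryStageExec under AQE.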

Attachments

image-2021-08-04-18-53-44-879.png
04/Aug/21 10:53
217 kB
Kent Yao 2

Issue Links

is related to

SPARK-35414 Completely fix the broadcast timeout issue in AQE

Resolved

links to

[Github] Pull Request #33636 (yaooqinn)

People

Assignee: Kent Yao 2

Reporter: Kent Yao 2

Votes: 0

Watchers: 5

Dates

Created: 04/Aug/21 10:53

Updated: 10/Dec/21 08:12

Resolved: 05/Aug/21 13:17