Description
Hello Spark team,
I recently found a possible bug in Spark's YarnAllocator.
When I run Spark applications on YARN with a Docker bridge network, the job fails with a bind-address error on the executor side.
I believe this is caused by the YarnAllocator implementation in Spark: the executor tries to bind to the hostname of the NodeManager instead of the hostname of its container. This works fine with host networking, but breaks with a bridge network, where the two hostnames differ. A sketch of the pattern I have in mind follows below.
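For reference, here is a minimal Scala sketch of the pattern I mean: deriving the executor host from the YARN Container record's NodeId. This is only an illustration of the mechanism as I understand it, not the actual Spark source; the helper name is mine.

```scala
import org.apache.hadoop.yarn.api.records.Container

// Hypothetical helper illustrating the suspected pattern: the host comes from
// the container's NodeId, which identifies the NodeManager.
def executorHostFor(container: Container): String = {
  // NodeId.getHost is the NodeManager's hostname, not the Docker container's.
  // With a bridge network those differ, so binding to it fails inside the container.
  container.getNodeId.getHost
}
```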
For more details, please check out RCA - Spark + YARN Docker Bridge Network.
It looks like the YARN Container API does not return any container-hostname information, which means that to solve this issue we may also need changes on the Hadoop YARN side?
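To illustrate the gap, here is a small sketch of the accessors I'm aware of on the standard org.apache.hadoop.yarn.api.records.Container record; as far as I can tell, none of them expose a container-internal (Docker bridge) hostname:

```scala
import org.apache.hadoop.yarn.api.records.Container

// Illustrative only: prints the host-related information available on a
// Container record. Everything here refers to the NodeManager, not the
// Docker container itself.
def describe(c: Container): Unit = {
  println(c.getId)              // ContainerId
  println(c.getNodeId.getHost)  // NodeManager hostname
  println(c.getNodeHttpAddress) // NodeManager web address
  println(c.getResource)        // allocated memory / vcores
}
```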
Please let me know if you have any questions, many thanks!
—
Best Regards,
Jingwei Zhang