Description
Hello Spark team,
I recently found a possible bug in Spark's YarnAllocator.
When I run Spark applications on YARN with a Docker bridge network, the job fails with a bind-address error on the executor side.
I believe this is caused by the YarnAllocator implementation in Spark: the executor tries to bind to the hostname of the NodeManager instead of the hostname of its container. This works fine with host networking, but breaks with a bridge network, where the two hostnames differ. A sketch of the pattern I have in mind follows below.
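For reference, here is a minimal Scala sketch of the pattern I mean: deriving the executor host from the YARN Container record's NodeId. This is only an illustration of the mechanism as I understand it, not the actual Spark source; the helper name is mine.

```scala
import org.apache.hadoop.yarn.api.records.Container

// Hypothetical helper illustrating the suspected pattern: the host comes from
// the container's NodeId, which identifies the NodeManager.
def executorHostFor(container: Container): String = {
  // NodeId.getHost is the NodeManager's hostname, not the Docker container's.
  // With a bridge network those differ, so binding to it fails inside the container.
  container.getNodeId.getHost
}
```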
For more details, please check out RCA - Spark + YARN Docker Bridge Network.
It looks like the YARN Container API does not return any container-hostname information, which means that to solve this issue we may also need changes on the Hadoop YARN side?
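To illustrate the gap, here is a small sketch of the accessors I'm aware of on the standard org.apache.hadoop.yarn.api.records.Container record; as far as I can tell, none of them expose a container-internal (Docker bridge) hostname:

```scala
import org.apache.hadoop.yarn.api.records.Container

// Illustrative only: prints the host-related information available on a
// Container record. Everything here refers to the NodeManager, not the
// Docker container itself.
def describe(c: Container): Unit = {
  println(c.getId)              // ContainerId
  println(c.getNodeId.getHost)  // NodeManager hostname
  println(c.getNodeHttpAddress) // NodeManager web address
  println(c.getResource)        // allocated memory / vcores
}
```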
Please let me know if you have any questions, many thanks!
—
Best Regards,
Jingwei Zhang