Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Incomplete
- Affects Version/s: 2.4.0
- Labels: None
Description
Short description: Spark-on-K8s jobs fail when all of the following conditions are met: the driver runs in client mode on a host external to the K8s cluster, the task result size is greater than spark.task.maxDirectResultSize, and the executor block managers inside the K8s cluster are not routable from the driver.
Background: when the result of a task has to be sent back to the driver, three cases are possible: (1) the task fails if the result size is greater than MAX_RESULT_SIZE (spark.driver.maxResultSize); (2) the result is serialized directly back to the driver if its size is less than or equal to spark.task.maxDirectResultSize; (3) if the result size falls between maxDirectResultSize and MAX_RESULT_SIZE, the result is stored in the executor's block manager and only a pointer to it is sent back to the driver.
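For illustration, here is a minimal, self-contained Scala sketch of that three-way decision. It is not Spark's actual code (the real choice between a DirectTaskResult and an IndirectTaskResult is made in the executor's TaskRunner); the names TaskResultEnvelope and envelopeFor and the block id string are hypothetical.

    sealed trait TaskResultEnvelope
    case class DirectResult(bytes: Array[Byte]) extends TaskResultEnvelope            // case 2: data travels inline
    case class IndirectResult(blockId: String, size: Long) extends TaskResultEnvelope // case 3: pointer only
    case object ResultTooLarge extends TaskResultEnvelope                             // case 1: task fails

    def envelopeFor(result: Array[Byte], maxDirectResultSize: Long, maxResultSize: Long): TaskResultEnvelope = {
      val size = result.length.toLong
      if (maxResultSize > 0 && size > maxResultSize)
        ResultTooLarge                      // (1) above spark.driver.maxResultSize: the task is failed
      else if (size <= maxDirectResultSize)
        DirectResult(result)                // (2) small enough: serialized straight back to the driver
      else
        IndirectResult("taskresult_<taskId>", size) // (3) stored in the block manager; the driver must fetch it
    }

In case 3 the driver later opens a connection to the executor's block manager to fetch the block, which is exactly the step that breaks in the configuration described here.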
For case 3 to complete successfully (i.e. when the driver is sent pointers into the block manager rather than the data itself), the driver needs to be able to reach the block manager address registered by the executor JVM. This currently fails when using a Spark driver outside the K8s cluster, because the block manager addresses are private K8s pod addresses and thus not routable from the driver, at least in our configuration, but we think others may be affected.
How to reproduce the issue:
- Use Spark on K8s with a driver located outside the K8s cluster (client mode)
- Start the Spark session with --conf spark.task.maxDirectResultSize=<testValue>, for example <testValue> = 100, so that the problem appears even for tasks returning small results (see the sketch after this list)
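A minimal sketch of such a session and a job that triggers the failure, assuming hypothetical placeholder values for the API server URL and container image:

    import org.apache.spark.sql.SparkSession

    // Placeholders: substitute your own API server URL and Spark image.
    val spark = SparkSession.builder()
      .master("k8s://https://my-apiserver.example.com:6443")
      .config("spark.kubernetes.container.image", "my-registry/spark:2.4.0")
      .config("spark.task.maxDirectResultSize", "100") // 100 bytes: forces case 3 for almost any task
      .getOrCreate()

    // Each element is well over 100 bytes, so the results go through the
    // executor block managers and the driver's fetch from the pod-private
    // addresses times out.
    spark.sparkContext.parallelize(1 to 1000, 4).map(i => "x" * 200 + i).collect()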
Error stack:
ERROR RetryingBlockFetcher: Exception while beginning fetch of 1 outstanding blocks
java.io.IOException: Connecting to /10.100.112.3:35973 timed out (30000 ms)
Existing workaround:
- Set spark.task.maxDirectResultSize to the same value as spark.driver.maxResultSize when running in the configuration affected by this bug (K8s with the driver external to the cluster), as sketched below.
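A sketch of the workaround at session-construction time (the 1g value is illustrative; any value equal to spark.driver.maxResultSize works):

    import org.apache.spark.sql.SparkSession

    // With the two limits equal, case 3 (block manager pointers) can never
    // be selected: results are either sent inline or the task fails.
    val spark = SparkSession.builder()
      // ... master and K8s image settings as in the reproduction above ...
      .config("spark.driver.maxResultSize", "1g")
      .config("spark.task.maxDirectResultSize", "1g") // same value as maxResultSize
      .getOrCreate()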
Additional notes:
- The executors register with routable public IPs, but the executor block managers register with IPs private to the K8s cluster. That is why task assignment works, yet the driver cannot connect to an executor's block manager to pull the final result (when the result is bigger than maxDirectResultSize).
- The Spark documentation correctly states that "spark executors must be able to connect to the Spark driver over a hostname and a port that is routable from the Spark executors"; however, this case is about the opposite direction: the Spark driver needing to connect to the executors' block managers.