Spark's BlacklistTracker maintains a list of "bad nodes" which it will not use for running tasks (e.g., because of bad hardware). When running on YARN, this blacklist is also used to avoid ever allocating resources on blacklisted nodes: https://github.com/apache/spark/blob/e836c27ce011ca9aef822bef6320b4a7059ec343/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala#L128
I'm just beginning to poke around the Kubernetes code, so apologies if this is incorrect, but I didn't see any references to scheduler.nodeBlacklist() in KubernetesClusterSchedulerBackend, so it seems this is missing. I thought of this while looking at SPARK-19755, a similar issue on Mesos.
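
To make the idea concrete, here is a rough sketch of one way the blacklist could be turned into a scheduling constraint on Kubernetes. It is only an illustration, not a proposed patch: NodeSelectorRequirement and NodeBlacklistAffinity are hypothetical stand-ins defined here (not Spark or Kubernetes client classes), and it assumes the hostnames in the blacklist match the value of the kubernetes.io/hostname node label. In a real backend the input would presumably come from scheduler.nodeBlacklist(), and the result would be attached to the executor pod spec as a required node affinity match expression.

```scala
// Hypothetical sketch: these types are NOT part of Spark or any Kubernetes client.
// Minimal stand-in for one matchExpressions entry of a Kubernetes node selector term.
final case class NodeSelectorRequirement(key: String, operator: String, values: Seq[String])

object NodeBlacklistAffinity {
  // Well-known node label carrying the node's hostname.
  // Assumption: Spark's blacklisted host names match this label's values.
  val HostnameLabel = "kubernetes.io/hostname"

  // Builds a "hostname NotIn <blacklist>" requirement.
  // Returns None when nothing is blacklisted, so callers can skip adding any affinity.
  def excludeNodes(blacklist: Set[String]): Option[NodeSelectorRequirement] =
    if (blacklist.isEmpty) None
    else Some(NodeSelectorRequirement(HostnameLabel, "NotIn", blacklist.toSeq.sorted))
}
```

Since the blacklist changes over time, the requirement would need to be recomputed each time new executor pods are requested; pods that are already running would not be affected.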