This is reproducible with an application that uses fewer cores than are available on the workers:
E.g. with 1 application running 1 executor, when the worker hosting that executor is killed, the application is not assigned a new executor, even though the cluster has enough free resources. This appears to be a regression introduced by https://github.com/apache/spark/commit/51de86baed0776304c6184f2c04b6303ef48df90#diff-ca694acef669f50f9b45ca0d32ab6f5a516270bb26b33c4abb704e2dc00a1a03 .
That commit causes an assertion error on the master, because the master receives an executorStateChange from 'RUNNING' to 'RUNNING' instead of from 'RUNNING' to 'FAILED':