Currently, the YARN shuffle service swallows errors that occur during startup and only logs them:
This has two undesirable consequences:
- because blockHandler remains null when an error occurs, every subsequent request to the shuffle service fails with a NullPointerException (NPE).
- because the NodeManager (NM) itself keeps running, containers may still be assigned to that host, only to fail when they try to register with the shuffle service.
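The following is a minimal, self-contained Java sketch that models the two startup behaviors side by side. It does not use the real YarnShuffleService or any Hadoop classes. ShuffleService, BlockHandler, createBlockHandler and the failFast flag are all hypothetical stand-ins. With failFast off, an init error is only logged, blockHandler stays null, and the first request throws an NPE. With failFast on, the error is rethrown, so the failure surfaces at startup instead. This assumes that an exception escaping the service's initialization would keep the NodeManager from coming up, and so keep containers off that host.

```java
import java.io.IOException;

/** Hypothetical model of shuffle-service startup; not the actual YarnShuffleService code. */
public class ShuffleStartupSketch {

    /** Hypothetical stand-in for the real block handler. */
    static final class BlockHandler {
        String handle(String request) {
            return "ok:" + request;
        }
    }

    /** Simulates handler creation; throws when a startup failure is requested. */
    static BlockHandler createBlockHandler(boolean simulateFailure) throws IOException {
        if (simulateFailure) {
            throw new IOException("simulated startup failure");
        }
        return new BlockHandler();
    }

    static final class ShuffleService {
        private final boolean failFast; // hypothetical switch between the two behaviors
        BlockHandler blockHandler;      // stays null if startup fails and the error is swallowed

        ShuffleService(boolean failFast) {
            this.failFast = failFast;
        }

        void serviceInit(boolean simulateFailure) {
            try {
                blockHandler = createBlockHandler(simulateFailure);
            } catch (IOException e) {
                if (failFast) {
                    // Propagate the error so startup itself fails.
                    throw new IllegalStateException("Shuffle service failed to start", e);
                }
                // Current behavior described above: log and carry on without a handler.
                System.err.println("Failed to initialize shuffle service: " + e.getMessage());
            }
        }

        String handle(String request) {
            // Throws NullPointerException if blockHandler was never initialized.
            return blockHandler.handle(request);
        }
    }

    public static void main(String[] args) {
        ShuffleService lenient = new ShuffleService(false);
        lenient.serviceInit(true);
        try {
            lenient.handle("fetch");
        } catch (NullPointerException e) {
            System.out.println("swallowed startup error: request failed with NPE");
        }

        try {
            new ShuffleService(true).serviceInit(true);
        } catch (IllegalStateException e) {
            System.out.println("fail-fast: " + e.getMessage());
        }
    }
}
```

The fail-fast variant trades a noisy startup failure for the silent, per-request NPEs and the misdirected container assignments described in the two points above.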
Example of the first:
Example of the second: