Description
The existing reportError method in YarnChild.java handles exceptions raised during job execution. When the exception is caused by the cluster storage capacity being exceeded, the method does not log enough detail, especially when the job is not configured to fail fast. As a result, users cannot easily tell why a job did not fail immediately after the storage capacity was exceeded. This enhancement adds logging that tells users which configuration is preventing the fast failure.
Expected Behavior:
When a ClusterStorageCapacityExceededException is encountered, the system should log whether the job is configured to fail fast. If fast fail is disabled, the log should advise users on how to enable it.
How-to-Fix:
We propose to expose the relationship between ClusterStorageCapacityExceededException and the fast-fail configuration by logging it in reportError.
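The expected behavior could be sketched roughly as follows. This is a standalone illustration, not the actual YarnChild code: the class FastFailLoggingSketch, its helper methods, the stand-in exception class, and the plain Map used in place of the job configuration are all hypothetical. The property name is assumed to be the one behind MRJobConfig.JOB_DFS_STORAGE_CAPACITY_KILL_LIMIT_EXCEED and should be verified against the Hadoop version in use.

```java
import java.io.IOException;
import java.util.Map;

public class FastFailLoggingSketch {

    /** Local stand-in for org.apache.hadoop.fs.ClusterStorageCapacityExceededException. */
    public static class ClusterStorageCapacityExceededException extends IOException {
        public ClusterStorageCapacityExceededException(String message) {
            super(message);
        }
    }

    /** ASSUMPTION: believed to be the fast-fail property; verify against MRJobConfig. */
    public static final String KILL_LIMIT_EXCEED_KEY =
        "mapreduce.job.dfs.storage.capacity.kill-limit-exceed";

    /** Returns true if the exception or any of its causes is a storage-capacity exception. */
    public static boolean isStorageCapacityExceeded(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof ClusterStorageCapacityExceededException) {
                return true;
            }
        }
        return false;
    }

    /** Returns true only when the capacity was exceeded AND fast fail is enabled. */
    public static boolean shouldFastFail(Throwable t, Map<String, String> conf) {
        boolean enabled =
            Boolean.parseBoolean(conf.getOrDefault(KILL_LIMIT_EXCEED_KEY, "false"));
        return isStorageCapacityExceeded(t) && enabled;
    }

    /**
     * Builds the message to log, or returns null when the exception is
     * unrelated to storage capacity and no extra logging is needed.
     */
    public static String buildLogMessage(Throwable t, Map<String, String> conf) {
        if (!isStorageCapacityExceeded(t)) {
            return null;
        }
        if (shouldFastFail(t, conf)) {
            return "Ending the job because the cluster storage capacity was exceeded.";
        }
        // Fast fail is disabled: tell the user which setting controls it.
        return "The cluster storage capacity was exceeded, but the job is not "
            + "configured to fail fast. To fail the job immediately in this case, set "
            + KILL_LIMIT_EXCEED_KEY + " to true.";
    }
}
```

In the real reportError method, the returned message would presumably be passed to the existing logger, and the shouldFastFail result would decide whether the job is ended immediately.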