Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Cannot Reproduce
None
None
None
Description
An OutOfMemoryError (OOM) is thrown while a checkpoint is being written. The JobManager (JM) does not fail; instead it appears to be restored, because the log contains the message "No master state to restore".
After that, the JM never triggers another checkpoint. The root cause is not yet clear. This may be a bug, and an OOM or any other exception raised while taking a checkpoint should be handled so that checkpointing continues.
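One possible mechanism for "never makes checkpoints anymore" is sketched below. This is a hypothesis, not a confirmed root cause. It assumes that periodic checkpoints are driven by a `java.util.concurrent.ScheduledExecutorService`. With `scheduleAtFixedRate`, if any run throws, all later runs are suppressed. An `OutOfMemoryError` is an `Error`, not an `Exception`, so a `catch (Exception e)` guard would not stop it. `PeriodicTriggerDemo`, `writeCheckpoint`, and `runTrigger` are hypothetical names used only for illustration, and the OOM is simulated.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PeriodicTriggerDemo {

    // Hypothetical stand-in for "write a checkpoint": the 3rd attempt
    // fails with a simulated OutOfMemoryError (an Error, not an Exception).
    static void writeCheckpoint(int attempt) {
        if (attempt == 3) {
            throw new OutOfMemoryError("simulated OOM while writing checkpoint");
        }
    }

    // Runs a periodic trigger for ~200 ms and returns how many attempts ran.
    // guarded == false: the throwable escapes the task, so the executor
    //                   silently suppresses every later run.
    // guarded == true:  the task catches Throwable, so triggering continues.
    static int runTrigger(boolean guarded) {
        AtomicInteger attempts = new AtomicInteger();
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(() -> {
            int n = attempts.incrementAndGet();
            if (guarded) {
                try {
                    writeCheckpoint(n);
                } catch (Throwable t) { // also catches OutOfMemoryError
                    System.err.println("checkpoint " + n + " failed: " + t);
                }
            } else {
                writeCheckpoint(n);
            }
        }, 0, 10, TimeUnit.MILLISECONDS);
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            timer.shutdownNow();
        }
        return attempts.get();
    }

    public static void main(String[] args) {
        System.out.println("unguarded attempts: " + runTrigger(false));
        System.out.println("guarded attempts: " + runTrigger(true));
    }
}
```

In the unguarded run, triggering stops after the 3rd attempt and nothing is logged, which resembles the reported symptom. The guarded run keeps triggering. Catching `Throwable` here is only for illustration. After a real OOM the JVM may be in an inconsistent state, so failing fast and restarting the process may be a safer way to handle it.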