Details
-
Sub-task
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
-
None
Description
Snapshot chain corruption causing OM snapshot failure.
2024-03-14 04:07:13,615 ERROR [OM StateMachine ApplyTransaction Thread - 0]-org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine: Terminating with exit status 1: OM Ratis Server has received unrecoverable error, to avoid further DB corruption, terminating OM. Error Response received is:cmdType: CreateSnapshot traceID: "" success: false message: "java.io.IOException: Snapshot chain is corrupted.\n\tat org.apache.hadoop.ozone.om.SnapshotChainManager.validateSnapshotChain(SnapshotChainManager.java:550)\n\tat org.apache.hadoop.ozone.om.SnapshotChainManager.getLatestPathSnapshotId(SnapshotChainManager.java:378)\n\tat org.apache.hadoop.ozone.om.request.snapshot.OMSnapshotCreateRequest.addSnapshotInfoToSnapshotChainAndCache(OMSnapshotCreateRequest.java:232)\n\tat org.apache.hadoop.ozone.om.request.snapshot.OMSnapshotCreateRequest.validateAndUpdateCache(OMSnapshotCreateRequest.java:162)\n\tat org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:378)\n\tat org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:568)\n\tat org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:363)\n\tat java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:748)\n" status: INTERNAL_ERRORINTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: java.io.IOException: Snapshot chain is corrupted. at org.apache.hadoop.ozone.om.SnapshotChainManager.validateSnapshotChain(SnapshotChainManager.java:550) at org.apache.hadoop.ozone.om.SnapshotChainManager.getLatestPathSnapshotId(SnapshotChainManager.java:378) at org.apache.hadoop.ozone.om.request.snapshot.OMSnapshotCreateRequest.addSnapshotInfoToSnapshotChainAndCache(OMSnapshotCreateRequest.java:232) at org.apache.hadoop.ozone.om.request.snapshot.OMSnapshotCreateRequest.validateAndUpdateCache(OMSnapshotCreateRequest.java:162) at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:378) at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:568) at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:363) at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.terminate(OzoneManagerStateMachine.java:404) at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$2(OzoneManagerStateMachine.java:379) at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616) at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1609) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)
Attachments
Issue Links
- is duplicated by
-
HDDS-10548 [snapshot-LR] OM is getting shutdown due to Snapshot chain corruption in an LR setup
- Resolved
- links to