  1. Hadoop Distributed Data Store
  2. HDDS-4226

Clean up OM snapshots left after a failed installSnapshot

      Description

      The OzoneManager (OM) tries to install a snapshot from the leader:

      2020-09-09 22:07:14,830 [pool-144-thread-1] INFO  om.OzoneManager (OzoneManager.java:installCheckpoint(3159)) - Installing checkpoint with OMTransactionInfo 2#68754
      2020-09-09 22:07:14,831 [grpc-default-executor-50] INFO  impl.RaftServerImpl (RaftServerImpl.java:installSnapshot(1127)) - omNode-2@group-D62218D261DE: reply installSnapshot: omNode-1<-omNode-2#0:FAIL-t2,IN_PROGRESS
      

      The installation failed with a NullPointerException because of the issue tracked in HDDS-4224:

      2020-09-09 22:07:14,831 [pool-144-thread-1] ERROR om.OzoneManager (OzoneManager.java:installSnapshotFromLeader(3141)) - Failed to install snapshot from Leader OM: {}
      java.lang.NullPointerException
              at org.apache.hadoop.ozone.om.OzoneManager.installCheckpoint(OzoneManager.java:3168)
              at org.apache.hadoop.ozone.om.OzoneManager.installCheckpoint(OzoneManager.java:3162)
              at org.apache.hadoop.ozone.om.OzoneManager.installSnapshotFromLeader(OzoneManager.java:3139)
              at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$notifyInstallSnapshotFromLeader$4(OzoneManagerStateMachine.java:372)
              at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      

       

      The downloaded checkpoint is not removed after the failure, so checkpoint directories accumulate in the snapshot directory:

      ➜  chaos-2020-09-09-22-05-33-IST ls MiniOzoneClusterImpl-71baac34-2321-4756-ba1e-5834c5628047/omNode-2/ratis/snapshot/om.db-omNode-1-1599669
      om.db-omNode-1-1599669432684/  om.db-omNode-1-1599669451421/  om.db-omNode-1-1599669478149/  om.db-omNode-1-1599669504818/  om.db-omNode-1-1599669533577/  om.db-omNode-1-1599669566509/
      om.db-omNode-1-1599669433775/  om.db-omNode-1-1599669453030/  om.db-omNode-1-1599669480273/  om.db-omNode-1-1599669507385/  om.db-omNode-1-1599669535603/  om.db-omNode-1-1599669568325/
      om.db-omNode-1-1599669434867/  om.db-omNode-1-1599669454688/  om.db-omNode-1-1599669482206/  om.db-omNode-1-1599669509373/  om.db-omNode-1-1599669537716/  om.db-omNode-1-1599669570186/
      om.db-omNode-1-1599669435886/  om.db-omNode-1-1599669456346/  om.db-omNode-1-1599669484256/  om.db-omNode-1-1599669511241/  om.db-omNode-1-1599669540574/  om.db-omNode-1-1599669572150/
      om.db-omNode-1-1599669437199/  om.db-omNode-1-1599669458194/  om.db-omNode-1-1599669486200/  om.db-omNode-1-1599669513051/  om.db-omNode-1-1599669543136/  om.db-omNode-1-1599669574811/
      om.db-omNode-1-1599669438519/  om.db-omNode-1-1599669459992/  om.db-omNode-1-1599669487968/  om.db-omNode-1-1599669515343/  om.db-omNode-1-1599669546272/  om.db-omNode-1-1599669576833/
      om.db-omNode-1-1599669439819/  om.db-omNode-1-1599669461897/  om.db-omNode-1-1599669490218/  om.db-omNode-1-1599669517332/  om.db-omNode-1-1599669548363/  om.db-omNode-1-1599669578680/
      om.db-omNode-1-1599669441209/  om.db-omNode-1-1599669463871/  om.db-omNode-1-1599669492005/  om.db-omNode-1-1599669519320/  om.db-omNode-1-1599669551596/  om.db-omNode-1-1599669580427/
      om.db-omNode-1-1599669442606/  om.db-omNode-1-1599669465810/  om.db-omNode-1-1599669493727/  om.db-omNode-1-1599669521491/  om.db-omNode-1-1599669554153/  om.db-omNode-1-1599669582124/
      om.db-omNode-1-1599669443967/  om.db-omNode-1-1599669467909/  om.db-omNode-1-1599669495587/  om.db-omNode-1-1599669523436/  om.db-omNode-1-1599669556370/  om.db-omNode-1-1599669583768/
      om.db-omNode-1-1599669445468/  om.db-omNode-1-1599669470054/  om.db-omNode-1-1599669497445/  om.db-omNode-1-1599669525567/  om.db-omNode-1-1599669558461/  om.db-omNode-1-1599669585501/
      om.db-omNode-1-1599669446937/  om.db-omNode-1-1599669472125/  om.db-omNode-1-1599669499362/  om.db-omNode-1-1599669527648/  om.db-omNode-1-1599669560578/
      om.db-omNode-1-1599669448360/  om.db-omNode-1-1599669474051/  om.db-omNode-1-1599669501269/  om.db-omNode-1-1599669529648/  om.db-omNode-1-1599669562666/
      om.db-omNode-1-1599669449867/  om.db-omNode-1-1599669476078/  om.db-omNode-1-1599669503036/  om.db-omNode-1-1599669531573/  om.db-omNode-1-1599669564620/ 
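
One possible cleanup is sketched below, as an illustration rather than the actual fix. It deletes every directory under a snapshot directory whose name starts with a given prefix, such as the `om.db-` prefix seen in the listing above. The class `CheckpointCleanup` and its methods are hypothetical names introduced here; they are not part of the Ozone or Ratis APIs, and the sketch assumes that no leftover checkpoint is still in use when it runs.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical helper for illustration; not part of the Ozone code base. */
final class CheckpointCleanup {

  private CheckpointCleanup() { }

  /** Deletes a directory tree, removing children before their parents. */
  static void deleteRecursively(Path root) throws IOException {
    if (!Files.exists(root)) {
      return;
    }
    List<Path> paths = new ArrayList<>();
    try (Stream<Path> walk = Files.walk(root)) {
      // Reverse order puts every file before the directory that contains it.
      walk.sorted(Comparator.reverseOrder()).forEach(paths::add);
    }
    for (Path p : paths) {
      Files.delete(p);
    }
  }

  /**
   * Removes every directory directly under snapshotDir whose name starts
   * with prefix and returns the number of directories removed.
   * Assumption: none of these directories is still being installed.
   */
  static int purgeStaleCheckpoints(Path snapshotDir, String prefix)
      throws IOException {
    // Collect first, then delete, so the listing is not modified mid-iteration.
    List<Path> stale = new ArrayList<>();
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(
        snapshotDir,
        p -> Files.isDirectory(p)
            && p.getFileName().toString().startsWith(prefix))) {
      for (Path entry : entries) {
        stale.add(entry);
      }
    }
    for (Path dir : stale) {
      deleteRecursively(dir);
    }
    return stale.size();
  }
}
```

A caller might invoke `CheckpointCleanup.purgeStaleCheckpoints(snapshotDir, "om.db-")` from the error path of the install routine or once at OM startup; where such a call belongs in the OzoneManager code is left open here.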

            People

            • Assignee: Unassigned
            • Reporter: Mukul Kumar Singh (msingh)