  1. Apache Ozone
  2. HDDS-8530

[snapshot] OM crash on restart due to Snapshot Chain corruption


Details

    Description

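      snapshotTable is sorted lexicographically by key, not by snapshot creation order, so the assumption that a snapshot's previous snapshot always exists in the chain by the time the snapshot is loaded is wrong. On restart, the chain is rebuilt from the table (SnapshotChainManager.loadFromSnapshotInfoTable), and a snapshot can be read before the previous snapshot it refers to, which makes OM startup fail.

      OM error stack trace: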

      2023-05-03 18:59:48,889 [main] INFO org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer: Can't find SST '032458'
      2023-05-03 18:59:48,889 [main] INFO org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer: Can't find SST '032459'
      2023-05-03 18:59:48,890 [main] INFO org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer: Can't find SST '032466'
      2023-05-03 18:59:48,890 [main] INFO org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer: Can't find SST '032475'
      2023-05-03 18:59:48,890 [main] INFO org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer: Can't find SST '032480'
      2023-05-03 18:59:48,890 [main] INFO org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer: Can't find SST '032482'
      2023-05-03 18:59:48,891 [main] INFO org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer: Can't find SST '032491'
      2023-05-03 18:59:48,891 [main] INFO org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer: Can't find SST '032495'
      2023-05-03 18:59:48,891 [main] INFO org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer: Can't find SST '032497'
      2023-05-03 18:59:48,891 [main] INFO org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer: Can't find SST '032501'
      2023-05-03 18:59:48,891 [main] INFO org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer: Can't find SST '032506'
      2023-05-03 18:59:48,892 [main] INFO org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer: Can't find SST '032517'
      2023-05-03 18:59:48,892 [main] INFO org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer: Can't find SST '032525'
      2023-05-03 18:59:49,270 [main] ERROR org.apache.hadoop.ozone.om.OzoneManagerStarter: OM start failed with exception
      java.io.IOException: Snapshot Chain corruption: previous snapshotID given but no associated snapshot found in snapshot chain: SnapshotID 9384de9d-3e6e-4f18-b4dd-64e69a58f31e
          at org.apache.hadoop.ozone.om.SnapshotChainManager.addSnapshotGlobal(SnapshotChainManager.java:86)
          at org.apache.hadoop.ozone.om.SnapshotChainManager.addSnapshot(SnapshotChainManager.java:289)
          at org.apache.hadoop.ozone.om.SnapshotChainManager.loadFromSnapshotInfoTable(SnapshotChainManager.java:279)
          at org.apache.hadoop.ozone.om.SnapshotChainManager.<init>(SnapshotChainManager.java:63)
          at org.apache.hadoop.ozone.om.OmMetadataManagerImpl.start(OmMetadataManagerImpl.java:517)
          at org.apache.hadoop.ozone.om.OmMetadataManagerImpl.<init>(OmMetadataManagerImpl.java:321)
          at org.apache.hadoop.ozone.om.OzoneManager.instantiateServices(OzoneManager.java:762)
          at org.apache.hadoop.ozone.om.OzoneManager.<init>(OzoneManager.java:642)
          at org.apache.hadoop.ozone.om.OzoneManager.createOm(OzoneManager.java:727)
          at org.apache.hadoop.ozone.om.OzoneManagerStarter$OMStarterHelper.start(OzoneManagerStarter.java:189)
          at org.apache.hadoop.ozone.om.OzoneManagerStarter.startOm(OzoneManagerStarter.java:86)
          at org.apache.hadoop.ozone.om.OzoneManagerStarter.call(OzoneManagerStarter.java:74)
          at org.apache.hadoop.hdds.cli.GenericCli.call(GenericCli.java:38)
          at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
          at picocli.CommandLine.access$1300(CommandLine.java:145)
          at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
          at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
          at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
          at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
          at picocli.CommandLine.execute(CommandLine.java:2078)
          at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:100)
          at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:91)
          at org.apache.hadoop.ozone.om.OzoneManagerStarter.main(OzoneManagerStarter.java:58)
      2023-05-03 18:59:49,273 [shutdown-hook-0] INFO org.apache.hadoop.ozone.om.OzoneManagerStarter: SHUTDOWN_MSG:  
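
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the actual Ozone code and not necessarily the fix that was applied for this issue: the class `SnapshotChainLoadSketch`, the record type `Snap`, and both loader methods are hypothetical names, and a `TreeMap` stands in for the lexicographically sorted snapshotTable. The first loader makes the same assumption the description calls wrong (the previous snapshot was already loaded); the second is one possible order-independent alternative that indexes entries by their previous ID and then walks the links from the head.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical, simplified model of the snapshot chain load; not Ozone code.
class SnapshotChainLoadSketch {

    // Simplified stand-in for a snapshot table value.
    static final class Snap {
        final String id;
        final String previousId; // null for the first snapshot in the chain

        Snap(String id, String previousId) {
            this.id = id;
            this.previousId = previousId;
        }
    }

    // Naive load: iterates in key order and assumes the previous snapshot
    // has already been added. Fails when key order differs from creation order.
    static List<String> loadInTableOrder(TreeMap<String, Snap> table) {
        List<String> chain = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (Snap s : table.values()) {
            if (s.previousId != null && !seen.contains(s.previousId)) {
                throw new IllegalStateException(
                    "previous snapshot not found in chain: " + s.previousId);
            }
            seen.add(s.id);
            chain.add(s.id);
        }
        return chain;
    }

    // Order-independent load: index every entry by its previous ID first,
    // then follow the links starting from the entry with no previous ID.
    static List<String> loadByFollowingLinks(TreeMap<String, Snap> table) {
        Map<String, Snap> byPrevious = new HashMap<>();
        Snap head = null;
        for (Snap s : table.values()) {
            if (s.previousId == null) {
                head = s;
            } else {
                byPrevious.put(s.previousId, s);
            }
        }
        List<String> chain = new ArrayList<>();
        for (Snap cur = head; cur != null; cur = byPrevious.get(cur.id)) {
            chain.add(cur.id);
        }
        // Entries left unreachable here indicate a genuinely broken chain.
        if (chain.size() != table.size()) {
            throw new IllegalStateException("snapshot chain is not fully linked");
        }
        return chain;
    }

    public static void main(String[] args) {
        TreeMap<String, Snap> table = new TreeMap<>();
        // "snap-b" was created first, but "snap-a" sorts before it by key.
        table.put("/vol/bucket/snap-b", new Snap("id-1", null));
        table.put("/vol/bucket/snap-a", new Snap("id-2", "id-1"));

        try {
            loadInTableOrder(table);
        } catch (IllegalStateException e) {
            System.out.println("naive load failed: " + e.getMessage());
        }
        System.out.println("linked load: " + loadByFollowingLinks(table));
    }
}
```

In this sketch a single linear chain is assumed for brevity. Sorting the entries by creation time before loading would be another way to remove the dependency on key order.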

      Attachments

        1. Screenshot 2023-05-05 at 5.20.22 PM.png
          635 kB
          Hemant Kumar
        2. Screenshot 2023-05-05 at 5.51.26 PM.png
          681 kB
          Hemant Kumar

            People

              hemantk Hemant Kumar
              jyosin Jyotirmoy Sinha