  1. Apache Ozone
  2. HDDS-5626 Track and Address Flaky tests
  3. HDDS-11352

Intermittent Raft Log Corruption in TestOzoneManagerHAWithStoppedNodes

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Component/s: Ozone Manager

    Description

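      The failure was observed in TestOzoneManagerHAWithStoppedNodes#testListVolumes during the test run reported below, but it may not be specific to that test: in the same run, twoOMDown failed while initializing the Raft log with a ChecksumException.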

      -------------------------------------------------------------------------------
      Test set: org.apache.hadoop.ozone.om.TestOzoneManagerHAWithStoppedNodes
      -------------------------------------------------------------------------------
      Tests run: 12, Failures: 0, Errors: 5, Skipped: 0, Time elapsed: 621.712 s <<< FAILURE! - in org.apache.hadoop.ozone.om.TestOzoneManagerHAWithStoppedNodes
      org.apache.hadoop.ozone.om.TestOzoneManagerHAWithStoppedNodes.twoOMDown  Time elapsed: 18.461 s  <<< ERROR!
      java.util.concurrent.CompletionException: java.lang.IllegalStateException: omNode-1@group-523986131536: Failed to initRaftLog.
      	at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332)
      	at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347)
      	at java.base/java.util.concurrent.CompletableFuture$BiRelay.tryFire(CompletableFuture.java:1498)
      	at java.base/java.util.concurrent.CompletableFuture$CoCompletion.tryFire(CompletableFuture.java:1219)
      	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
      	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
      	at org.apache.ratis.util.ConcurrentUtils.accept(ConcurrentUtils.java:206)
      	at org.apache.ratis.util.ConcurrentUtils.lambda$null$4(ConcurrentUtils.java:182)
      	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
      	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
      	at java.base/java.lang.Thread.run(Thread.java:840)
      Caused by: java.lang.IllegalStateException: omNode-1@group-523986131536: Failed to initRaftLog.
      	at org.apache.ratis.server.impl.ServerState.initRaftLog(ServerState.java:171)
      	at org.apache.ratis.server.impl.ServerState.lambda$new$6(ServerState.java:131)
      	at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:63)
      	at org.apache.ratis.server.impl.ServerState.initialize(ServerState.java:148)
      	at org.apache.ratis.server.impl.RaftServerImpl.start(RaftServerImpl.java:385)
      	at org.apache.ratis.util.ConcurrentUtils.accept(ConcurrentUtils.java:203)
      	... 4 more
      Caused by: org.apache.ratis.protocol.exceptions.ChecksumException: Log entry corrupted: Calculated checksum is 3AB532B2 but read checksum is 31120F6C.
      	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.decodeEntry(SegmentedRaftLogReader.java:319)
      	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.readEntry(SegmentedRaftLogReader.java:204)
      	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogInputStream.nextEntry(SegmentedRaftLogInputStream.java:131)
      	at org.apache.ratis.server.raftlog.segmented.LogSegment.readSegmentFile(LogSegment.java:138)
      	at org.apache.ratis.server.raftlog.segmented.LogSegment.loadSegment(LogSegment.java:172)
      	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogCache.loadSegment(SegmentedRaftLogCache.java:428)
      	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.loadLogSegments(SegmentedRaftLog.java:258)
      	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.openImpl(SegmentedRaftLog.java:231)
      	at org.apache.ratis.server.raftlog.RaftLogBase.open(RaftLogBase.java:273)
      	at org.apache.ratis.server.impl.ServerState.initRaftLog(ServerState.java:194)
      	at org.apache.ratis.server.impl.ServerState.initRaftLog(ServerState.java:168)
      	... 9 more
      
      org.apache.hadoop.ozone.om.TestOzoneManagerHAWithStoppedNodes.testListVolumes  Time elapsed: 121.075 s  <<< ERROR!
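
For anyone triaging the ChecksumException above, the following is a minimal sketch of how a per-entry checksum detects a corrupted log record on read. It is not the Ratis implementation: the class name `ChecksumSketch`, the record layout (payload followed by a 4-byte checksum), and the use of the JDK's `java.util.zip.CRC32C` are all assumptions made for illustration, and the message text only mirrors the one in the trace.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32C;

// Hypothetical illustration only; Ratis's real on-disk format is not shown in this report.
public class ChecksumSketch {

    /** Computes a 32-bit CRC32C checksum over the payload bytes. */
    static int checksum(byte[] payload) {
        CRC32C crc = new CRC32C();
        crc.update(payload, 0, payload.length);
        return (int) crc.getValue();
    }

    /** Assumed record layout: payload bytes followed by a 4-byte checksum. */
    static byte[] encode(byte[] payload) {
        return ByteBuffer.allocate(payload.length + 4)
            .put(payload)
            .putInt(checksum(payload))
            .array();
    }

    /** Recomputes the checksum on read and rejects the record on mismatch. */
    static byte[] decode(byte[] record) throws IOException {
        if (record.length < 4) {
            throw new IOException("Record too short: " + record.length + " bytes");
        }
        byte[] payload = Arrays.copyOf(record, record.length - 4);
        int stored = ByteBuffer.wrap(record, record.length - 4, 4).getInt();
        int calculated = checksum(payload);
        if (calculated != stored) {
            throw new IOException(String.format(
                "Log entry corrupted: Calculated checksum is %08X but read checksum is %08X.",
                calculated, stored));
        }
        return payload;
    }

    public static void main(String[] args) throws IOException {
        byte[] record = encode("createVolume vol1".getBytes(StandardCharsets.UTF_8));
        decode(record);        // intact record decodes without error

        record[3] ^= 0x01;     // flip one payload bit to simulate on-disk corruption
        try {
            decode(record);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A mismatch of this kind only shows that the bytes read back differ from the bytes that were checksummed. It does not by itself say whether the cause was a partial write, a concurrent writer on the same segment file, or something else, so the attached logs are still needed to find the root cause.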
      

      Attachments

        1. it-om.zip (1.14 MB, attached by Ethan Rose)

            People

              Assignee: Unassigned
              Reporter: Ethan Rose (erose)