HDDS-1031 (Apache Ozone): Update Ratis version to fix a DN restart bug


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.4.0
    • Component/s: None
    • Labels: None

    Description

      This is related to RATIS-460.

      When a datanode is restarted after Ratis has taken a snapshot, the datanode fails to start and logs the stack trace below. For more information, see RATIS-460.

       

      java.io.IOException: java.lang.IllegalStateException: lastEntry = 72856=72856: [77969640-aad9-4678-813b-8fb35bd5f568:172.27.37.0:9858, 7c6ae4fe-7db5-4e97-a407-0a9edff70c2c:172.27.35.192:9858, add14303-ecdf-4aed-84b7-abc3152177f6:172.27.37.128:9858], old=null, lastEntry.index >= logIndex = 0
              at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54)
              at org.apache.ratis.util.IOUtils.toIOException(IOUtils.java:61)
              at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:70)
              at org.apache.ratis.server.impl.RaftServerProxy.getImpls(RaftServerProxy.java:283)
              at org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:295)
              at org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:427)
              at org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:149)
              at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.start(DatanodeStateMachine.java:165)
              at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$startDaemon$0(DatanodeStateMachine.java:334)
              at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.IllegalStateException: lastEntry = 72856=72856: [77969640-aad9-4678-813b-8fb35bd5f568:172.27.37.0:9858, 7c6ae4fe-7db5-4e97-a407-0a9edff70c2c:172.27.35.192:9858, add14303-ecdf-4aed-84b7-abc3152177f6:172.27.37.128:9858], old=null, lastEntry.index >= logIndex = 0
              at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:72)
              at org.apache.ratis.server.impl.ConfigurationManager.addConfiguration(ConfigurationManager.java:54)
              at org.apache.ratis.server.impl.ServerState.setRaftConf(ServerState.java:352)
              at org.apache.ratis.server.impl.ServerState.setRaftConf(ServerState.java:347)
              at org.apache.ratis.server.storage.RaftLog.lambda$open$6(RaftLog.java:237)
              at org.apache.ratis.server.storage.LogSegment.lambda$loadSegment$0(LogSegment.java:140)
              at org.apache.ratis.server.storage.LogSegment.readSegmentFile(LogSegment.java:121)
              at org.apache.ratis.server.storage.LogSegment.loadSegment(LogSegment.java:137)
              at org.apache.ratis.server.storage.RaftLogCache.loadSegment(RaftLogCache.java:272)
              at org.apache.ratis.server.storage.SegmentedRaftLog.loadLogSegments(SegmentedRaftLog.java:159)
              at org.apache.ratis.server.storage.SegmentedRaftLog.openImpl(SegmentedRaftLog.java:129)
              at org.apache.ratis.server.storage.RaftLog.open(RaftLog.java:233)
              at org.apache.ratis.server.impl.ServerState.initLog(ServerState.java:191)
              at org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:114)
              at org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:103)
              at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:207)
              at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
              at java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1582)
              at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
              at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
              at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
              at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
      2019-01-29 01:43:41,137 [main] ERROR      - Exception in HddsDatanodeService.
      java.lang.NullPointerException
              at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.join(DatanodeStateMachine.java:363)
              at org.apache.hadoop.ozone.HddsDatanodeService.join(HddsDatanodeService.java:270)
              at org.apache.hadoop.ozone.HddsDatanodeService.main(HddsDatanodeService.java:127)
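
The `IllegalStateException` message above ("lastEntry.index >= logIndex = 0") suggests a check that configuration entries must be added in strictly increasing log-index order. The following is a minimal sketch of that kind of invariant, inferred only from the message text. It is not the Ratis implementation: the class name, the `String` stand-in for a configuration, and the idea that index 72856 was recorded from the snapshot before log replay offered index 0 are all assumptions made for illustration.

```java
import java.util.TreeMap;

// Hypothetical model of a configuration history keyed by log index.
// This is NOT org.apache.ratis.server.impl.ConfigurationManager.
final class ConfigurationIndexSketch {
    private final TreeMap<Long, String> configurations = new TreeMap<>();

    // Accepts a configuration only if its index is greater than every recorded index.
    void addConfiguration(long logIndex, String conf) {
        if (!configurations.isEmpty() && configurations.lastKey() >= logIndex) {
            throw new IllegalStateException("lastEntry.index = "
                + configurations.lastKey() + " >= logIndex = " + logIndex);
        }
        configurations.put(logIndex, conf);
    }

    int size() {
        return configurations.size();
    }

    public static void main(String[] args) {
        ConfigurationIndexSketch history = new ConfigurationIndexSketch();
        // Assumption: a configuration at index 72856 is already recorded at startup.
        history.addConfiguration(72856L, "conf-already-recorded");
        try {
            // Assumption: replaying an older log segment offers an entry at index 0.
            history.addConfiguration(0L, "conf-from-log-replay");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Under this model, any replay that revisits indices at or below the last recorded one trips the check, which matches the shape of the failure in the trace. The actual root cause and fix are tracked in RATIS-460.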
      

       

      Attachments

        1. Screen Shot 2019-01-30 at 11.22.41 AM.png
          500 kB
          Bharat Viswanadham
        2. HDDS-1031.00.patch
          1 kB
          Bharat Viswanadham

          People

            Assignee: Bharat Viswanadham
            Reporter: Bharat Viswanadham
            Votes: 0
            Watchers: 3