  1. Apache Ozone
  2. HDDS-722

Ozone datanodes failed to start on a few nodes


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 0.3.0
    • Fix Version/s: None
    • Component/s: Ozone Datanode
    • Labels: None

    Description

      Steps taken:

      ------------------

      1. Put a few keys using ozonefs.
      2. Stopped all services of the cluster.
      3. Started om and scm.
      4. After some time, started the datanodes.

      Out of 12 datanodes, 4 failed to start.

       

      Here is the datanode log snippet:

      ------------------------------------------------

       

      2018-10-24 04:49:30,594 ERROR org.apache.ratis.server.impl.StateMachineUpdater: Terminating with exit status 2: StateMachineUpdater-9524f4e2-9031-4852-ab7c-11c2da3460db: the StateMachineUpdater hits Throwable
      org.apache.ratis.server.storage.RaftLogIOException: java.io.IOException: Premature EOF from inputStream
       at org.apache.ratis.server.storage.LogSegment.loadCache(LogSegment.java:299)
       at org.apache.ratis.server.storage.SegmentedRaftLog.get(SegmentedRaftLog.java:192)
       at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:142)
       at java.lang.Thread.run(Thread.java:745)
      Caused by: java.io.IOException: Premature EOF from inputStream
       at org.apache.ratis.util.IOUtils.readFully(IOUtils.java:100)
       at org.apache.ratis.server.storage.LogReader.decodeEntry(LogReader.java:250)
       at org.apache.ratis.server.storage.LogReader.readEntry(LogReader.java:155)
       at org.apache.ratis.server.storage.LogInputStream.nextEntry(LogInputStream.java:128)
       at org.apache.ratis.server.storage.LogSegment.readSegmentFile(LogSegment.java:110)
       at org.apache.ratis.server.storage.LogSegment.access$400(LogSegment.java:43)
       at org.apache.ratis.server.storage.LogSegment$LogEntryLoader.load(LogSegment.java:167)
       at org.apache.ratis.server.storage.LogSegment$LogEntryLoader.load(LogSegment.java:161)
       at org.apache.ratis.server.storage.LogSegment.loadCache(LogSegment.java:295)
       ... 3 more
      2018-10-24 04:49:30,598 INFO org.apache.hadoop.ozone.HddsDatanodeService: SHUTDOWN_MSG:
      /************************************************************
      SHUTDOWN_MSG: Shutting down HddsDatanodeService at ctr-e138-1518143905142-541661-01-000003.hwx.site/172.27.57.0
      ************************************************************/
      2018-10-24 04:49:30,598 WARN org.apache.hadoop.fs.CachingGetSpaceUsed: Thread Interrupted waiting to refresh disk information: sleep interrupted
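
      The "Premature EOF from inputStream" message in the trace above is raised from a read-fully call while decoding a log entry, which suggests the reader expected more bytes than the segment file contained. The following is a minimal sketch of that failure mode, not the Ratis implementation: the class name, the helper methods, and the 4-byte length-prefixed entry format are all assumptions made for illustration, and the sketch does not establish why the segment on the affected datanodes was short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; it does not use or mirror any Ratis API.
class PrematureEofDemo {

  // Assumed entry format: 4-byte big-endian length, then the payload bytes.
  static byte[] encodeEntry(byte[] payload) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    out.writeInt(payload.length);
    out.write(payload);
    out.flush();
    return bytes.toByteArray();
  }

  // Fills buf completely, or fails if the stream ends before buf is full.
  static void readFully(InputStream in, byte[] buf) throws IOException {
    int off = 0;
    while (off < buf.length) {
      int n = in.read(buf, off, buf.length - off);
      if (n < 0) {
        throw new IOException("Premature EOF from inputStream");
      }
      off += n;
    }
  }

  // Reads the length prefix, then requires exactly that many payload bytes.
  static byte[] decodeEntry(InputStream in) throws IOException {
    DataInputStream data = new DataInputStream(in);
    int length = data.readInt();
    byte[] payload = new byte[length];
    readFully(data, payload);
    return payload;
  }

  public static void main(String[] args) throws IOException {
    byte[] entry = encodeEntry("put key1".getBytes(StandardCharsets.UTF_8));

    // A complete entry decodes normally.
    byte[] decoded = decodeEntry(new ByteArrayInputStream(entry));
    System.out.println(new String(decoded, StandardCharsets.UTF_8));

    // Dropping the last 3 bytes simulates an entry that was only partly written.
    byte[] truncated = Arrays.copyOf(entry, entry.length - 3);
    try {
      decodeEntry(new ByteArrayInputStream(truncated));
    } catch (IOException e) {
      System.out.println("truncated entry: " + e.getMessage());
    }
  }
}
```

      In this sketch the length prefix promises 8 payload bytes but only 5 remain, so the reader fails instead of returning a partial entry. A reader that treats a short final entry as fatal, as the trace shows here, stops the whole process; whether a trailing partial entry should instead be tolerated is a design decision for the log implementation.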
       
      

       


          People

            Assignee: Unassigned
            Reporter: Nilotpal Nandi (nilotpalnandi)