Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Duplicate
0.3.0
None
None
Description
Steps taken:
------------------
- Put a few keys using ozonefs.
- Stopped all services of the cluster.
- Started OM and SCM.
- After some time, started the datanodes.
Some datanodes failed to start: out of 12 datanodes, 4 failed to start.
Here is the datanode log snippet:
------------------------------------------------
2018-10-24 04:49:30,594 ERROR org.apache.ratis.server.impl.StateMachineUpdater: Terminating with exit status 2: StateMachineUpdater-9524f4e2-9031-4852-ab7c-11c2da3460db: the StateMachineUpdater hits Throwable
org.apache.ratis.server.storage.RaftLogIOException: java.io.IOException: Premature EOF from inputStream
    at org.apache.ratis.server.storage.LogSegment.loadCache(LogSegment.java:299)
    at org.apache.ratis.server.storage.SegmentedRaftLog.get(SegmentedRaftLog.java:192)
    at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:142)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Premature EOF from inputStream
    at org.apache.ratis.util.IOUtils.readFully(IOUtils.java:100)
    at org.apache.ratis.server.storage.LogReader.decodeEntry(LogReader.java:250)
    at org.apache.ratis.server.storage.LogReader.readEntry(LogReader.java:155)
    at org.apache.ratis.server.storage.LogInputStream.nextEntry(LogInputStream.java:128)
    at org.apache.ratis.server.storage.LogSegment.readSegmentFile(LogSegment.java:110)
    at org.apache.ratis.server.storage.LogSegment.access$400(LogSegment.java:43)
    at org.apache.ratis.server.storage.LogSegment$LogEntryLoader.load(LogSegment.java:167)
    at org.apache.ratis.server.storage.LogSegment$LogEntryLoader.load(LogSegment.java:161)
    at org.apache.ratis.server.storage.LogSegment.loadCache(LogSegment.java:295)
    ... 3 more
2018-10-24 04:49:30,598 INFO org.apache.hadoop.ozone.HddsDatanodeService: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down HddsDatanodeService at ctr-e138-1518143905142-541661-01-000003.hwx.site/172.27.57.0
************************************************************/
2018-10-24 04:49:30,598 WARN org.apache.hadoop.fs.CachingGetSpaceUsed: Thread Interrupted waiting to refresh disk information: sleep interrupted
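
For illustration only: the "Premature EOF from inputStream" message in the trace is the kind of error a read-fully loop raises when a file ends in the middle of an entry. The sketch below is not Ratis code and does not claim to be the root cause here. The class TruncatedEntryDemo, its methods, and the 4-byte length-prefixed record layout are all hypothetical and chosen only to show the mechanism.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo class; not part of Ratis or Ozone.
public class TruncatedEntryDemo {

    // Read exactly len bytes into buf, or fail if the stream ends first.
    static void readFully(InputStream in, byte[] buf, int off, int len) throws IOException {
        int done = 0;
        while (done < len) {
            int n = in.read(buf, off + done, len - done);
            if (n < 0) {
                // Stream ended before the full entry was read.
                throw new IOException("Premature EOF from inputStream");
            }
            done += n;
        }
    }

    // Assumed record layout: 4-byte big-endian length, then that many payload bytes.
    static byte[] readEntry(InputStream in) throws IOException {
        DataInputStream din = new DataInputStream(in);
        int length = din.readInt();
        byte[] payload = new byte[length];
        readFully(din, payload, 0, length);
        return payload;
    }

    // Returns "ok:<payload length>" on success or "error:<message>" on failure.
    static String tryRead(byte[] file) {
        try {
            return "ok:" + readEntry(new ByteArrayInputStream(file)).length;
        } catch (IOException e) {
            return "error:" + e.getMessage();
        }
    }

    public static void main(String[] args) {
        byte[] complete = {0, 0, 0, 4, 'a', 'b', 'c', 'd'};
        // Same header, but the payload stops after 2 of the 4 declared bytes.
        byte[] truncated = {0, 0, 0, 4, 'a', 'b'};
        System.out.println(tryRead(complete));
        System.out.println(tryRead(truncated));
    }
}
```

Under these assumptions, the complete record is read successfully and the truncated one fails with the same message text as in the log, which is what a partially written entry at the end of a segment file would look like to a reader.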