Details
Type: Sub-task
Status: Resolved
Priority: Blocker
Resolution: Fixed
HA branch (HDFS-1623)
None
None
Reviewed
Description
The new code in HDFS-1580 causes a problem with selectInputStreams in the HA context. While the active NameNode is writing to the shared edits, selectInputStreams is called on the standby. This ends up calling journalSet.getInputStream without passing inProgressOk=false, so getInputStream reads and validates the in-progress stream unnecessarily. Because the validation results are no longer properly cached, findMaxTransaction re-validates the in-progress stream as well, which in turn breaks the corruption check in this code. The end result is a large number of errors like:
2011-12-30 16:45:02,521 ERROR namenode.FileJournalManager (FileJournalManager.java:getNumberOfTransactions(266)) - Gap in transactions, max txnid is 579, 0 txns from 578
2011-12-30 16:45:02,521 INFO ha.EditLogTailer (EditLogTailer.java:run(163)) - Got error, will try again.
java.io.IOException: No non-corrupt logs for txid 578
    at org.apache.hadoop.hdfs.server.namenode.JournalSet.getInputStream(JournalSet.java:229)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1081)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:115)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.access$0(EditLogTailer.java:100)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:154)
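
To make the role of the inProgressOk flag concrete, here is a minimal, self-contained sketch of segment selection that honors it. This is not the Hadoop code and not the actual patch: the class EditSegmentSelection, the Segment type, and the selectSegments method are hypothetical names invented for illustration, and the transaction ids are illustrative values. The point is only that when the caller says in-progress segments are not acceptable, such a segment is skipped before it is ever opened or validated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; these types do not exist in Hadoop.
class EditSegmentSelection {

    /** One edit log segment. lastTxId is only meaningful once the segment is finalized. */
    static final class Segment {
        final long firstTxId;
        final long lastTxId;
        final boolean inProgress;

        Segment(long firstTxId, long lastTxId, boolean inProgress) {
            this.firstTxId = firstTxId;
            this.lastTxId = lastTxId;
            this.inProgress = inProgress;
        }
    }

    /**
     * Returns the segments a reader should open, assuming `all` is sorted by firstTxId.
     * In-progress segments are skipped entirely unless inProgressOk is true, so they
     * are never read or validated on behalf of a caller that cannot use them.
     */
    static List<Segment> selectSegments(List<Segment> all, long fromTxId, boolean inProgressOk) {
        List<Segment> selected = new ArrayList<>();
        for (Segment s : all) {
            if (s.inProgress && !inProgressOk) {
                continue; // still being written by the active; do not touch it
            }
            if (s.inProgress || s.lastTxId >= fromTxId) {
                selected.add(s); // skip finalized segments that end before fromTxId
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Illustrative layout: one finalized segment and one still being written.
        List<Segment> segments = Arrays.asList(
                new Segment(1, 577, false),
                new Segment(578, -1, true));

        // A standby-style caller that must not read in-progress edits.
        System.out.println(selectSegments(segments, 578, false).size());
        // A caller that accepts in-progress edits.
        System.out.println(selectSegments(segments, 578, true).size());
    }
}
```

In this sketch, a caller passing inProgressOk=false simply gets an empty selection while the only remaining segment is still being written, rather than triggering validation of that segment.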