- Type: Sub-task
- Status: Resolved
- Priority: Critical
- Resolution: Fixed
- Affects Version/s: QuorumJournalManager (HDFS-3077)
- Fix Version/s: QuorumJournalManager (HDFS-3077)
- Component/s: None
- Labels: None
- Hadoop Flags: Reviewed

While running some stress tests, I ran into a failover issue that shows up when the edit log segment currently being written by the old active is large. With a 327MB log segment containing 6.4M transactions, the JN took ~11 seconds to read and validate it during the recovery step. That exceeded the 10-second timeout for createNewEpoch, so the recovery failed.
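
To put rough numbers on this, here is a back-of-the-envelope sketch using only the figures reported above (327MB, ~11 seconds, 10 second timeout). It assumes validation time scales linearly with segment size, which is a simplification and not something measured here. The class and method names are made up for illustration and are not part of the HDFS code base.

```java
/**
 * Rough estimate of when segment validation would exceed a recovery timeout.
 * Hypothetical helper for illustration only; not part of Hadoop.
 */
public class RecoveryTimeoutEstimate {
    // Figures taken from the stress test described in this issue.
    static final double SEGMENT_MB = 327.0;
    static final double VALIDATE_SECONDS = 11.0;
    static final double TIMEOUT_SECONDS = 10.0;

    /** Observed read-and-validate throughput in MB per second. */
    static double throughputMbPerSec(double segmentMb, double seconds) {
        return segmentMb / seconds;
    }

    /** Largest segment (MB) that fits in the timeout, assuming linear scaling. */
    static double maxSegmentMb(double throughputMbPerSec, double timeoutSeconds) {
        return throughputMbPerSec * timeoutSeconds;
    }

    /** True if validating a segment of this size would take longer than the timeout. */
    static boolean exceedsTimeout(double segmentMb, double throughputMbPerSec,
                                  double timeoutSeconds) {
        return segmentMb / throughputMbPerSec > timeoutSeconds;
    }

    public static void main(String[] args) {
        double throughput = throughputMbPerSec(SEGMENT_MB, VALIDATE_SECONDS);
        double limit = maxSegmentMb(throughput, TIMEOUT_SECONDS);
        System.out.printf("validate throughput: %.1f MB/s%n", throughput);
        System.out.printf("largest segment within timeout: %.0f MB%n", limit);
        System.out.println("327MB segment exceeds timeout: "
                + exceedsTimeout(SEGMENT_MB, throughput, TIMEOUT_SECONDS));
    }
}
```

Under that linear assumption, any in-progress segment much beyond roughly 300MB on comparable hardware would be expected to hit the same failure, so the margin between normal segment sizes and the timeout is quite thin.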