Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
3.0.0-alpha1
-
None
-
Reviewed
Description
Currently when QJM is used, running BootstrapStandby while the existing NN is active can get the following exception:
FATAL ha.BootstrapStandby: Unable to read transaction ids 6175397-6175405 from the configured shared edits storage. Please copy these logs into the shared edits storage or call saveNamespace on the active node. Error: Gap in transactions. Expected to be able to read up until at least txid 6175405 but unable to find any edit logs containing txid 6175405 java.io.IOException: Gap in transactions. Expected to be able to read up until at least txid 6175405 but unable to find any edit logs containing txid 6175405 at org.apache.hadoop.hdfs.server.namenode.FSEditLog.checkForGaps(FSEditLog.java:1300) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1258) at org.apache.hadoop.hdfs.server.namenode.ha.BootstrapStandby.checkLogsAvailableForRead(BootstrapStandby.java:229)
Looks like the cause of the exception is that, when the active NN is queries by BootstrapStandby about the last written transaction ID, the in-progress edit log segment is included. However, when journal nodes are asked about the last written transaction ID, in-progress edit log is excluded. This causes BootstrapStandby#checkLogsAvailableForRead to complain gaps.
To fix this, we can either let journal nodes take into account the in-progress editlog, or let active NN exclude the in-progress edit log segment.