Details
Type: Bug
Status: Open
Priority: Critical
Resolution: Unresolved
Affects Version/s: 2.0.0
Fix Version/s: None
Component/s: None
Environment: HDP 2.2.0.0 <= rollback <= 2.2.4.0
Description
After a failed stack upgrade of HDP 2.2.0 => 2.2.4 (AMBARI-10519) and subsequent rollback, Ambari 2.0 leaves one of the HDFS HA NameNodes in an inconsistent state:
2015-04-16 11:45:38,231 INFO namenode.FSImage (FSEditLogLoader.java:loadFSEdits(138)) - Start loading edits file http://<custom_scrubbed>:8480/getJournal?jid=nameservice1&segmentTxId=54367965&storageInfo=-60%3A1459025177%3A1418910715375%3ACID-8055996a-b5ce-4b07-9b32-f2dbe9123edd, http://<custom_scrubbed>:8480/getJournal?jid=nameservice1&segmentTxId=54367965&storageInfo=-60%3A1459025177%3A1418910715375%3ACID-8055996a-b5ce-4b07-9b32-f2dbe9123edd
2015-04-16 11:45:38,232 INFO namenode.EditLogInputStream (RedundantEditLogInputStream.java:nextOp(176)) - Fast-forwarding stream 'http://<custom_scrubbed>:8480/getJournal?jid=nameservice1&segmentTxId=54367965&storageInfo=-60%3A1459025177%3A1418910715375%3ACID-8055996a-b5ce-4b07-9b32-f2dbe9123edd, http://<custom_scrubbed>:8480/getJournal?jid=nameservice1&segmentTxId=54367965&storageInfo=-60%3A1459025177%3A1418910715375%3ACID-8055996a-b5ce-4b07-9b32-f2dbe9123edd' to transaction ID 54367965
2015-04-16 11:45:38,232 INFO namenode.EditLogInputStream (RedundantEditLogInputStream.java:nextOp(176)) - Fast-forwarding stream 'http://<custom_scrubbed>:8480/getJournal?jid=nameservice1&segmentTxId=54367965&storageInfo=-60%3A1459025177%3A1418910715375%3ACID-8055996a-b5ce-4b07-9b32-f2dbe9123edd' to transaction ID 54367965
2015-04-16 11:45:38,284 ERROR namenode.FSEditLogLoader (FSEditLogLoader.java:loadEditRecords(238)) - Encountered exception on operation RollingUpgradeOp [START, time=1429181084342]
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /data1/nn is in an inconsistent state: previous fs state should not exist during upgrade. Finalize or rollback first.
    at org.apache.hadoop.hdfs.server.namenode.FSImage.checkUpgrade(FSImage.java:348)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startRollingUpgradeInternal(FSNamesystem.java:8322)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:750)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:230)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:139)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:824)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:805)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:230)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:324)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:356)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1608)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:410)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
2015-04-16 11:45:39,111 FATAL ha.EditLogTailer (EditLogTailer.java:doWork(331)) - Unknown error encountered while tailing edits. Shutting down standby NN.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /data1/nn is in an inconsistent state: previous fs state should not exist during upgrade. Finalize or rollback first.
    at org.apache.hadoop.hdfs.server.namenode.FSImage.checkUpgrade(FSImage.java:348)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startRollingUpgradeInternal(FSNamesystem.java:8322)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:750)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:230)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:139)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:824)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:805)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:230)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:324)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:356)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1608)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:410)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
2015-04-16 11:45:39,114 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2015-04-16 11:45:39,115 INFO namenode.NameNode (StringUtils.java:run(659)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at <custom_scrubbed>/<custom_scrubbed>
************************************************************/
The standby NameNode shut down as a result. After restarting it, it still doesn't work properly: haadmin failover commands return similar exceptions complaining about this inconsistent state, which should be visible in the NameNode logs I've uploaded.
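For anyone triaging the same symptom, here is a minimal read-only sketch of a check for the condition the exception message describes. It assumes that "previous fs state" refers to a leftover `previous` subdirectory inside the NameNode storage directory (my reading of the message and of `FSImage.checkUpgrade`, not something confirmed in this report). The helper name `find_previous_dirs` is hypothetical, and `/data1/nn` is simply the directory named in the log above. It only reports what it finds and changes nothing on disk.

```python
import os


def find_previous_dirs(name_dirs):
    """Return the storage directories that still contain a 'previous' subdirectory.

    Assumption: a leftover 'previous' directory is what triggers
    "previous fs state should not exist during upgrade".
    """
    return [d for d in name_dirs if os.path.isdir(os.path.join(d, "previous"))]


if __name__ == "__main__":
    # /data1/nn comes from the exception in the log; list every directory
    # configured in dfs.namenode.name.dir on the affected NameNode host.
    candidate_dirs = ["/data1/nn"]
    leftovers = find_previous_dirs(candidate_dirs)
    if leftovers:
        for d in leftovers:
            print("leftover previous state found in:", d)
    else:
        print("no 'previous' directory found in:", ", ".join(candidate_dirs))
```

Running this on both NameNode hosts and comparing the output may help show which NameNode the rollback left behind; how to clean up that state safely is a separate question this sketch does not address.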
Hari Sekhon
http://www.linkedin.com/in/harisekhon