Hadoop Common
HADOOP-5342

DataNodes do not start up because InconsistentFSStateException on just part of the disks in use

    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 0.18.2
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      After restarting a cluster (including rebooting the machines), the DFS became corrupted because many DataNodes did not start up, failing with the following exception:

      2009-02-26 22:33:53,774 ERROR org.apache.hadoop.dfs.DataNode: org.apache.hadoop.dfs.InconsistentFSStateException: Directory xxx is in an inconsistent state: version file in current directory is missing.
      at org.apache.hadoop.dfs.Storage$StorageDirectory.analyzeStorage(Storage.java:326)
      at org.apache.hadoop.dfs.DataStorage.recoverTransitionRead(DataStorage.java:105)
      at org.apache.hadoop.dfs.DataNode.startDataNode(DataNode.java:306)
      at org.apache.hadoop.dfs.DataNode.<init>(DataNode.java:223)
      at org.apache.hadoop.dfs.DataNode.makeInstance(DataNode.java:3030)
      at org.apache.hadoop.dfs.DataNode.instantiateDataNode(DataNode.java:2985)
      at org.apache.hadoop.dfs.DataNode.createDataNode(DataNode.java:2993)
      at org.apache.hadoop.dfs.DataNode.main(DataNode.java:3115)

      This happens when using multiple disks where at least one was previously mounted read-only, so its storage version became out-dated; after the reboot that disk was mounted read-write again, and the DataNode refused to start because of the out-dated version.

      This is a big headache. If a DataNode has multiple disks and at least one of them has the correct storage version, the out-dated ones should not bring down the whole DataNode.
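      A minimal sketch of the behaviour being asked for, using hypothetical helper names rather than the actual Storage/DataStorage API: check every configured storage directory, skip the ones whose VERSION file is missing or out-dated, and fail only when no directory at all is usable.

      import java.io.File;
      import java.io.IOException;
      import java.util.ArrayList;
      import java.util.List;

      // Hypothetical sketch only; checkStorageVersion() stands in for the
      // analyzeStorage()-style check that the real DataStorage performs.
      public class TolerantStorageStartup {

        static boolean checkStorageVersion(File dir) {
          // Placeholder: a real check would read and validate the VERSION
          // file under dir/current, not merely test for its existence.
          return new File(new File(dir, "current"), "VERSION").exists();
        }

        static List<File> selectUsableDirs(List<File> configuredDirs) throws IOException {
          List<File> usable = new ArrayList<File>();
          for (File dir : configuredDirs) {
            if (checkStorageVersion(dir)) {
              usable.add(dir);
            } else {
              // Log and skip the inconsistent directory instead of aborting.
              System.err.println("Skipping inconsistent storage directory: " + dir);
            }
          }
          if (usable.isEmpty()) {
            // Give up only when no configured directory is in a consistent state.
            throw new IOException("No usable storage directories found");
          }
          return usable;
        }
      }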

        Activity

        Christian Kunz created issue.
        Christian Kunz made changes:
        • Summary: "DataNodes do not start up when a previous version has not been cleaned up" → "DataNodes do not start up because InconsistentFSStateException on just part of the disks in use"
        Sameer Paranjpye made changes:
        • Assignee: Hairong Kuang
        • Priority: Blocker → Critical
        Grady Laksmono added a comment:

        I'm also experiencing this issue; is there a quick solution for now? This happened to me once before, and reformatting the HDFS made it work again, but now it has happened again and this time I have files on my HDFS.

        10/08/08 21:33:01 INFO common.Storage: Storage directory /tmp/hadoop-grady/dfs/name does not exist.
        10/08/08 21:33:01 ERROR namenode.FSNamesystem: FSNamesystem initialization failed.
        org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /tmp/hadoop-grady/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:290)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
        10/08/08 21:33:01 INFO ipc.Server: Stopping server on 9000
        10/08/08 21:33:01 ERROR namenode.NameNode: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /tmp/hadoop-grady/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:290)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
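        The failing directory in this log, /tmp/hadoop-grady/dfs/name, is the default NameNode storage location derived from hadoop.tmp.dir, and /tmp is often cleared on reboot, which would explain the missing storage directory. A minimal hdfs-site.xml sketch that keeps NameNode and DataNode storage off /tmp (0.20-era property names; the paths are placeholders, not a recommendation):

        <!-- Sketch only: dfs.name.dir / dfs.data.dir as in 0.20-era Hadoop;
             the paths below are placeholders. -->
        <configuration>
          <property>
            <name>dfs.name.dir</name>
            <value>/var/lib/hadoop/dfs/name</value>
          </property>
          <property>
            <name>dfs.data.dir</name>
            <value>/data/1/dfs/data,/data/2/dfs/data</value>
          </property>
        </configuration>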

        ramkrishna.s.vasudevan added a comment:

        I would like to suggest the following; please correct me if I am wrong.

        The namespace ID is being updated immediately after one of the dfs.data.dir disks is updated. Instead, update the namespace ID only after parsing all of the dfs.data.dir storage directories, as in the sketch below.
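        A minimal sketch of that two-pass idea, with hypothetical helpers rather than the actual DataStorage code: analyze every dfs.data.dir entry first, and only commit the new namespace ID once all directories have been parsed.

        import java.io.File;
        import java.io.IOException;
        import java.util.List;

        // Hypothetical two-pass sketch; analyzeDir() and writeNamespaceId()
        // are placeholders, not the real Hadoop storage API.
        public class DeferredNamespaceIdUpdate {

          static void analyzeDir(File dir) throws IOException {
            // Placeholder: read and validate the VERSION file in dir/current.
          }

          static void writeNamespaceId(File dir, int namespaceId) throws IOException {
            // Placeholder: rewrite the VERSION file with the new namespace ID.
          }

          static void transitionAll(List<File> dataDirs, int newNamespaceId) throws IOException {
            // Pass 1: parse every storage directory; any inconsistency is
            // detected here, before anything has been modified.
            for (File dir : dataDirs) {
              analyzeDir(dir);
            }
            // Pass 2: only now write the new namespace ID to all directories.
            for (File dir : dataDirs) {
              writeNamespaceId(dir, newNamespaceId);
            }
          }
        }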


          People

          • Assignee: Hairong Kuang
          • Reporter: Christian Kunz
          • Votes: 2
          • Watchers: 6
