HDFS-107: Data-nodes should be formatted when the name-node is formatted.

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.23.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      The upgrade feature HADOOP-702 requires data-nodes to store persistently the namespaceID
      in their version files and verify during startup that it matches the one stored on the name-node.
      When the name-node is reformatted it generates a new namespaceID.
      Now if the cluster is started with the reformatted name-node but with data-nodes that were
      not reformatted, the data-nodes will fail with
      java.io.IOException: Incompatible namespaceIDs ...

      Data-nodes should be reformatted whenever the name-node is. I see 2 approaches here:
      1) In order to reformat the cluster we call "start-dfs -format" or provide a special script "format-dfs".
      This would format all the cluster components together. The open question is whether it should
      also start the cluster after formatting.
      2) Format the name-node only. When data-nodes connect to the name-node it will tell them to
      format their storage directories if it sees that the namespace is empty and its cTime=0.
      The drawback of this approach is that we can lose blocks of a data-node from another cluster
      if it connects by mistake to the empty name-node.
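
      For illustration only, a minimal sketch of what an option (1) "format-dfs" wrapper could look
      like. Nothing below exists in Hadoop: the script itself, the reliance on conf/slaves for the
      data-node host list, and the /data/hdfs/dfs/data storage path are all assumptions.

      #!/usr/bin/env bash
      # format-dfs (hypothetical): reformat the whole cluster in one step.
      # Assumes HADOOP_HOME is set, data-node hosts are listed in conf/slaves,
      # and every data-node stores blocks under /data/hdfs/dfs/data.

      "$HADOOP_HOME"/bin/stop-dfs.sh                    # make sure no daemons are running

      "$HADOOP_HOME"/bin/hadoop namenode -format        # generates a new namespaceID

      # Wipe data-node storage so the stale namespaceID cannot conflict on startup.
      for host in $(cat "$HADOOP_HOME"/conf/slaves); do
        ssh "$host" rm -rf /data/hdfs/dfs/data/current
      done

      # Whether the cluster should be started right away is the open question above:
      # "$HADOOP_HOME"/bin/start-dfs.sh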

      Attachments

      1. HDFS-107-1.patch (12 kB, ramkrishna.s.vasudevan)

        Activity

        Stu Hood added a comment - edited

        Does anyone have any thoughts on this issue? I've been getting "Incompatible namespaceID" errors on my datanodes after formatting with `bin/hadoop namenode format`. My current solution is to remove the hadoop-*-data directory on each datanode, but there ought to be a better way.

        Thanks.

        Jared Stehler added a comment -

        I have a more elegant work-around which doesn't involve deleting the data folders: edit the <hadoop-data-root>/dfs/data/current/VERSION file, changing the namespaceID to match the current namenode:

        [jstehler@server19 ~]$ cat /lv_main/hadoop/dfs/data/current/VERSION
        #Fri Aug 01 18:40:43 UTC 2008
        namespaceID=292609117
        storageID=DS-1525930547-66.135.42.149-50010-1217002151282
        cTime=0
        storageType=DATA_NODE
        layoutVersion=-11

        This allowed me to bring up the slave datanode and have it recognized by the namenode in the DFS UI.
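
        To script that manual edit, a rough sketch (the paths follow the listing above; reading the
        new namespaceID from /lv_main/hadoop/dfs/name/current/VERSION on the name-node is an
        assumption about where dfs.name.dir points):

        $ # On the namenode: read the namespaceID written by the last format.
        $ grep '^namespaceID=' /lv_main/hadoop/dfs/name/current/VERSION
        namespaceID=292609117
        $ # On each datanode: rewrite only that line, leaving storageID etc. untouched.
        $ sed -i 's/^namespaceID=.*/namespaceID=292609117/' /lv_main/hadoop/dfs/data/current/VERSION

        Rewriting just the one line, rather than copying the whole VERSION file between machines,
        keeps each data-node's unique storageID intact.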

        Roman Valls added a comment -

        I can confirm the bug: upgrading from 0.17 to 0.18.1 did not work. I even deleted the HDFS data on the nodes and reformatted (I'm running a small 9-node cluster):

        $ bin/hadoop/stop-all && cluster-fork rm -rf /state/partition1/hdfs/hadoop/*
        $ hadoop namenode -format

        In addition, I tried a rough variant on Jared's solution that did not work either:

        $ cp /state/partition1/hdfs/hadoop/dfs/data/current/VERSION /shared/apps/VERSION
        $ cluster-fork cp -a /shared/apps/VERSION /state1/partition1/hdfs/hadoop/dfs/data/current/VERSION

        Is there a reliable way to make it work right away? Can this VERSION file (or namespaceID) be forced to be equal on every node?

        Andrii Vozniuk added a comment - edited

        I had the same problem with version 0.19.0. Initially I solved it by deleting the dfs.data.dir folders on the problematic datanodes and reformatting the namenode.

        Ashutosh Chauhan added a comment -

        I saw this issue on our small 6-node cluster too. It took a while to identify the root cause of the problem; the symptoms were the same as described here. In our case we have both 18 and 20 installed on the cluster, but we only run 20. A user saw the HDFS exception for their job, so they stopped 20, tried to go back to 18 and start it, and then switched back to 20 again. In doing all this, the VERSION files of the datanodes and the namenode got messed up, and the DNs and NN ended up with different information in their VERSION files.

        Apart from this peculiar use case, as things currently stand in HDFS, I think even one small misstep in upgrading the cluster can result in this bug, as reported in previous comments. I think that at cluster startup the namenode and datanodes should also exchange the information contained in the VERSION file and, in case of a mismatch, reconcile the differences, potentially asking for user input when the choice is not safe to make automatically.

        There are a few workarounds suggested in the previous comments. Which one of these is the recommended one?

        Gokul added a comment -

        The second approach looks fine to me.

        I feel the datanode losing blocks when it mistakenly connects to an empty namenode is not a drawback at all.
        In the current scenario, even if a datanode mistakenly connects to another namenode, the probability of that namenode having the same blocks (of this datanode) in its blocksMap is very low. Most of the time the namenode will simply invalidate the blocks.

        > 2) Format the name-node only. When data-nodes connect to the name-node it will tell them to
        > format their storage directories if it sees that the namespace is empty and its cTime=0.
        > The drawback of this approach is that we can lose blocks of a data-node from another cluster
        > if it connects by mistake to the empty name-node.

        When the datanode starts (after the namenode is formatted and started), can we override the namespaceID of the datanode with the new namespaceID of the namenode instead of throwing an exception?

        Koji Noguchi added a comment -

        > The second approach looks fine to me.

        The second approach is way too scary for me. -1.

        Konstantin Shvachko added a comment -

        In the current scenario data-nodes cannot mistakenly connect to another name-node, as they will have different namespaceIDs, and therefore blocks cannot be invalidated. Approach (2) will break this, but only in the case of cTime=0.

        ramkrishna.s.vasudevan added a comment -

        Can we provide a config parameter, say
        'datanode.format.required'?
        If this value is set to true, then whenever the DN starts we can update the DN namespaceID
        with the NN namespaceID.

        If the value is set to false then we can continue with the existing behaviour.

        Kindly provide your comments.
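
        If such a flag existed, it would presumably be set in hdfs-site.xml on the data-nodes; a
        sketch of how that could look ('datanode.format.required' is only the name proposed above
        and is not a real property in any release):

        <!-- hdfs-site.xml on each data-node; proposed, not an existing property -->
        <property>
          <name>datanode.format.required</name>
          <value>true</value>
          <description>If true, a starting DataNode whose namespaceID does not match
          the NameNode's adopts the new namespaceID instead of failing with
          "Incompatible namespaceIDs". Default would be false (current behaviour).</description>
        </property>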

        Koji Noguchi added a comment -

        > If the value is set to false then we can continue with the existing behaviour.

        If it's configurable, I take back my -1.

        However, please understand my worry. It's an ops/support nightmare when datanodes report to the incorrect namenode and lose millions of blocks at once. We had one case like that when one of our ops followed Jared's 'elegant approach' comment...

        Tsz Wo Nicholas Sze added a comment -

        Why not the first approach if the second approach may cause data loss?

        ramkrishna.s.vasudevan added a comment -

        Though the second approach has a drawback, the user has the option to configure whether he wants the datanodes to be formatted when the namenode is formatted.
        If the property is not configured, the behaviour stays the same as today.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12482053/HDFS-107-1.patch
        against trunk revision 1134170.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these core unit tests:
        org.apache.hadoop.cli.TestHDFSCLI
        org.apache.hadoop.hdfs.TestDFSStartupVersions

        +1 contrib tests. The patch passed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/763//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/763//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/763//console

        This message is automatically generated.

        Todd Lipcon added a comment -

        I think adding another config here is unnecessary. What's the downside of adding a "-format" flag to the datanode, and having "start-dfs -format" pass it along?
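
        A sketch of that idea, mirroring the way start-dfs.sh already forwards a start option such
        as -rollback to the data-nodes; the datanode "-format" flag itself is hypothetical and does
        not exist:

        #!/usr/bin/env bash
        # Sketch of a "start-dfs.sh -format" (hypothetical); assumes the usual bin/ layout.
        bin=$(cd "$(dirname "$0")" && pwd)

        dataStartOpt=""
        if [ "$1" = "-format" ]; then
          "$bin"/hadoop namenode -format   # existing command: reformat the namenode first
          dataStartOpt="-format"           # hypothetical flag telling datanodes to reformat
        fi

        "$bin"/hadoop-daemon.sh  --config "$HADOOP_CONF_DIR" start namenode
        "$bin"/hadoop-daemons.sh --config "$HADOOP_CONF_DIR" start datanode $dataStartOpt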

        Konstantin Shvachko added a comment -

        I agree with Todd and others. Option (1) seems to be the way to go.
        If you add the config parameter, you will need to distribute a new hdfs-site.xml to all data-nodes before formatting. Instead you could have just removed the storage directories.

        Uma Maheswara Rao G added a comment -

        I agree with you, but removing the complete storage directories will take a good amount of time when a huge number of blocks is present. Instead we can just sync the namespaceIDs at DataNode startup based on the flag passed, and let the blocks be deleted asynchronously.

        Todd Lipcon added a comment -

        I think it would be better to move them to the new storage dir's "toBeDeleted" directory (see HDFS-611).

        Konstantin Shvachko added a comment -

        Uma, your approach doesn't work, if I understand it correctly. Block IDs are unique only within one cluster. If you change the namespaceID on a DataNode, the NN will treat those blocks as belonging to this cluster and can mix them up with the ones that were really created under that namespaceID.
        Why would you optimize the format operation anyway? People don't actually format large clusters; I've never heard of such a thing. Data is too important. So the format operation is mostly useful for small test clusters.
        Option (1) gives an appropriate automation of the manual removal of storage directories.

        Harsh J added a comment -

        Reformatting a namenode comes with responsibilities, right (how often do you format anyway, and why)? Why can't we just leave it to the ops/users to clean up their dirs themselves, or switch to new dirnames?

        Adding such a thing, I feel, is potentially dangerous.

        Uma Maheswara Rao G added a comment -

        I agree with Konstantin. Allowing auto-format may be dangerous.
        I am OK with option (1) as part of this JIRA; it gives an appropriate automation of the manual removal of storage directories.


          People

          • Assignee: Unassigned
          • Reporter: Konstantin Shvachko
          • Votes: 13
          • Watchers: 24
