Hadoop HDFS / HDFS-1362

Provide volume management functionality for DataNode

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.23.0
    • Fix Version/s: None
    • Component/s: datanode
    • Labels: None
    • Release Note:
      Based on the reconfiguration framework provided by HADOOP-7001, enables reconfiguring dfs.datanode.data.dir to add new volumes into service.
    • Tags:
      datanode, volume, reconfigure

      Description

      The current management unit in Hadoop is a node: if a node fails, it is kicked out of the cluster and all the data on that node is re-replicated.

      As almost all SATA controllers support hotplug, we add a new command-line interface to the DataNode so that it can list, add, or remove a volume online, which means a disk can be changed without decommissioning the node. Moreover, if a failed disk is still readable and the node has enough space, the DataNode can migrate the data on that disk to other disks in the same node.

      A more detailed design document will be attached.

      The original version in our lab was implemented directly against the 0.20 DataNode. Would it be better to implement it in contrib? Or is there any other suggestion?

      1. Provide_volume_management_for_DN_v1.pdf
        45 kB
        Wang Xu
      2. HDFS-1362.txt
        46 kB
        Wang Xu
      3. HDFS-1362.4_w7001.txt
        13 kB
        Wang Xu
      4. DataNode Volume Refreshment in HDFS-1362.pdf
        33 kB
        Wang Xu
      5. HDFS-1362.5.patch
        18 kB
        Wang Xu
      6. HDFS-1362.6.patch
        19 kB
        Wang Xu
      7. HDFS-1362.7.patch
        18 kB
        Wang Xu
      8. HDFS-1362.8.patch
        20 kB
        Wang Xu

        Issue Links

          Activity

           Wang Xu added a comment -

           A design document is attached.

           Wang Xu added a comment -

           The initial patch for this function.

           Todd Lipcon added a comment -

           We had a brief meeting this morning to discuss this JIRA. To summarize for the community:

           • Having the ability to add/remove volumes via RPC has the issue that the changes are not reflected in the config file, so we risk that an admin may add a volume but forget to modify the config. The next time the cluster is restarted, the volume will be missing and cause problems.
           • We discussed that the primary use case for this feature is restoring a volume after it has failed. The other use case (adding a new volume to a DN that has not suffered any issues) is rather rare.
           • So, rather than providing add/list/remove APIs, we decided to simply add a "refresh" API. There were two options suggested here:
             1. Make use of the new HADOOP-7001 interface for reconfiguring daemons. In this case an admin could modify the config file to add new volumes, and then refresh the config to have the DN pick up new volumes, or re-add failed volumes. The potential issue here is that, even if the configuration has not changed, we still want the "refresh" to do something, so maybe this is not the right place.
             2. Add a new RPC and command line tool, something like "dfsadmin -restoreDNStorage <datanode IP:port>". This would not re-read the conf file, but rather just re-check any failed volumes to see if they are newly available. This could alternatively be triggered by a new DN servlet or something if it's simpler.
           • We also discussed pluggability (HDFS-1405). Tom and I were of the opinion that this feature is generally useful and don't see any compelling reason to make it a plugin. We should just improve FSDataset directly instead of extending it into a new Java class.
           • Regarding the new feature of copying blocks from volume to volume in the case that one volume has gone read-only, we decided that we should defer it to a separate JIRA to be implemented after this one is complete. That will make this one smaller and easier to review.

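           For illustration, option 1 might look roughly like the following on the DataNode side. This is a minimal sketch assuming the ReconfigurableBase/ReconfigurationException API from HADOOP-7001; the class name and the refreshVolumes() helper are hypothetical, not the actual patch.

               import java.io.IOException;
               import java.util.Arrays;
               import java.util.Collection;

               import org.apache.hadoop.conf.ReconfigurableBase;
               import org.apache.hadoop.conf.ReconfigurationException;

               public class ReconfigurableDataNode extends ReconfigurableBase {
                 static final String DATA_DIR_KEY = "dfs.datanode.data.dir";

                 @Override
                 public Collection<String> getReconfigurableProperties() {
                   // Only the data-dir list is reconfigurable in this sketch.
                   return Arrays.asList(DATA_DIR_KEY);
                 }

                 @Override
                 protected void reconfigurePropertyImpl(String property, String newVal)
                     throws ReconfigurationException {
                   if (!DATA_DIR_KEY.equals(property)) {
                     throw new ReconfigurationException(property, newVal,
                         getConf().get(property));
                   }
                   try {
                     refreshVolumes(newVal);
                   } catch (IOException ioe) {
                     throw new ReconfigurationException(property, newVal,
                         getConf().get(property), ioe);
                   }
                 }

                 // Hypothetical helper: compare the new dir list against the volumes
                 // currently in service and bring new or repaired ones online.
                 private void refreshVolumes(String newDataDirs) throws IOException {
                   // details elided
                 }
               }
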
           Allen Wittenauer added a comment -

           Why datanode IP? If one supplies a hostname, will it resolve it anyway?

           Todd Lipcon added a comment -

           sorry, yes, IP or hostname is fine

           Wang Xu added a comment -

           Hi Todd,

           If I update the patch based on HADOOP-7001, would ReconfigurableBase update the on-disk configuration file?

           Todd Lipcon added a comment -

           Hi Wang. The current version of HADOOP-7001 just lets the daemon re-read the on-disk configuration file. It doesn't re-write the configuration on disk. The assumption is that the admin would edit the configuration and then trigger reconfiguration using the servlet provided by that JIRA.

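           Concretely, the admin-side edit is just the usual comma-separated dfs.datanode.data.dir list in hdfs-site.xml. In a made-up example (the mount points below are illustrative only), re-adding a replaced disk before triggering the refresh would look like:

               <property>
                 <name>dfs.datanode.data.dir</name>
                 <!-- /data/3/dfs/dn was just replaced and remounted; appending it
                      here and then triggering reconfiguration brings it back -->
                 <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn</value>
               </property>
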
           Wang Xu added a comment -

           Hi folks,

           I am working on a new version of this patch, and I want to add a method to FSDataset that verifies whether a dir is in service.

           Do you think I could add the method to FSDatasetInterface, or just to FSDataset?

           Does the interface mean that FSDataset is only one implementation based on dirs, and other implementations may be based on other storage subsystems, so that I should limit my modification to the scope of FSDataset?

           Jerry Tang added a comment -

           I would suggest you go ahead and modify FSDataset only. If there is enough interest to make it part of the interface, it would be trivial to add the method signature to FSDatasetInterface.

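           In that spirit, the addition could be as small as the following hypothetical sketch (the method name, field, and signature are illustrative, not the committed API):

               import java.io.File;
               import java.util.HashSet;
               import java.util.Set;

               class FSDatasetSketch {
                 // Stand-in for however FSDataset tracks its active volumes; in the
                 // real class this would be derived from the existing volume set.
                 private final Set<File> inServiceDirs = new HashSet<File>();

                 /** Hypothetical method: true iff dir is backed by an in-service volume. */
                 synchronized boolean isDirInService(File dir) {
                   return inServiceDirs.contains(dir);
                 }
               }

               // If there is enough interest later, lifting it into FSDatasetInterface
               // is just adding the signature:  boolean isDirInService(File dir);
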
           Wang Xu added a comment -

           Hi Jerry,

           This is what I am doing now.

           Wang Xu added a comment -

           I updated the patch based on the reconfiguration mechanism provided by HADOOP-7001. It now only tries to add the new dirs listed in the configuration file, and is triggered by a configuration update.

           Wang Xu added a comment -

           The refresh method is based on the reconfigurable mechanism from HADOOP-7001.

           Wang Xu added a comment -

           The refresh method is based on the reconfigurable mechanism from HADOOP-7001.

           Wang Xu added a comment -

           Hi folks,

           Do you think it should continue to use the startup option when adding a new disk, or should we simply load or format it?

           Moreover, is it a common case for you to load blocks from a new disk, or may I format all newly inserted disks?

           Eli Collins added a comment -

           Hey Wang,

           Is the high-level use case replacing a failed disk with a new one without restarting the datanode (so you don't need to fail existing operations on that datanode)? Is this a feature that's needed if you've configured the datanode to tolerate multiple volume failures (HDFS-457)?

           Thanks,
           Eli

           Wang Xu added a comment -

           Hi Eli,

           This issue is something of a follow-up operation to HDFS-457: after failed disks have been replaced or new disks added, this patch can be used to bring them back into service without restarting the DataNode.

           Thanks,
           Wang Xu

           Wang Xu added a comment -

           An updated document for this issue.

           Wang Xu added a comment -

           Updated the patch against current trunk code and included a test case.

           Wang Xu added a comment -

           As HDFS-457 may remove failed volumes, this patch makes it possible to add the volumes back after the disks have been replaced.

           Hadoop QA added a comment -

           -1 overall. Here are the results of testing the latest attachment
           http://issues.apache.org/jira/secure/attachment/12471154/HDFS-1362.5.patch
           against trunk revision 1071023.

           +1 @author. The patch does not contain any @author tags.

           +1 tests included. The patch appears to include 4 new or modified tests.

           +1 javadoc. The javadoc tool did not generate any warning messages.

           -1 javac. The applied patch generated 22 javac compiler warnings (more than the trunk's current 21 warnings).

           -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

           +1 release audit. The applied patch does not increase the total number of release audit warnings.

           -1 core tests. The patch failed these core unit tests:
           org.apache.hadoop.hdfs.TestFileConcurrentReader

           -1 contrib tests. The patch failed contrib unit tests.

           +1 system test framework. The patch passed system test framework compile.

           Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/164//testReport/
           Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/164//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
           Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/164//console

           This message is automatically generated.

           Wang Xu added a comment -

           for new patch upload

           Wang Xu added a comment -

           Fix sync warning from findbugs

           Wang Xu added a comment -

           Fixed the concurrency warning reported by findbugs.

           The core test failures also exist in trunk builds, and look like a server configuration problem.

           I do not quite understand the hdfsproxy test failures, and I will continue investigating them. Any hints are much appreciated.

           Hadoop QA added a comment -

           -1 overall. Here are the results of testing the latest attachment
           http://issues.apache.org/jira/secure/attachment/12471161/HDFS-1362.6.patch
           against trunk revision 1071023.

           +1 @author. The patch does not contain any @author tags.

           +1 tests included. The patch appears to include 4 new or modified tests.

           +1 javadoc. The javadoc tool did not generate any warning messages.

           -1 javac. The applied patch generated 22 javac compiler warnings (more than the trunk's current 21 warnings).

           +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

           +1 release audit. The applied patch does not increase the total number of release audit warnings.

           -1 core tests. The patch failed these core unit tests:
           org.apache.hadoop.hdfs.TestFileConcurrentReader

           -1 contrib tests. The patch failed contrib unit tests.

           +1 system test framework. The patch passed system test framework compile.

           Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/165//testReport/
           Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/165//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
           Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/165//console

           This message is automatically generated.

           Wang Xu added a comment -

           The failed core test also failed in earlier test runs and in the trunk nightly tests.

           And I am still investigating the hdfsproxy failure.

           Wang Xu added a comment -

           for new patch update

           Wang Xu added a comment -

           Fix javac warning

           Wang Xu added a comment -

           Reattach to check the "ASF Grant"

           Wang Xu added a comment -

           Fixed the javac warning. The test-core and test-contrib failures do not seem to be triggered by this patch; PreCommit-HDFS-Build/166 for HDFS-1629 also failed on these 4 cases.

           Hadoop QA added a comment -

           -1 overall. Here are the results of testing the latest attachment
           http://issues.apache.org/jira/secure/attachment/12471190/HDFS-1362.7.patch
           against trunk revision 1071023.

           +1 @author. The patch does not contain any @author tags.

           +1 tests included. The patch appears to include 4 new or modified tests.

           +1 javadoc. The javadoc tool did not generate any warning messages.

           +1 javac. The applied patch does not increase the total number of javac compiler warnings.

           +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

           +1 release audit. The applied patch does not increase the total number of release audit warnings.

           -1 core tests. The patch failed these core unit tests:
           org.apache.hadoop.hdfs.TestFileConcurrentReader

           -1 contrib tests. The patch failed contrib unit tests.

           +1 system test framework. The patch passed system test framework compile.

           Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/167//testReport/
           Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/167//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
           Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/167//console

           This message is automatically generated.

           Wang Xu added a comment -

           The test case failures seem unrelated to this patch.

           Hadoop QA added a comment -

           -1 overall. Here are the results of testing the latest attachment
           http://issues.apache.org/jira/secure/attachment/12471190/HDFS-1362.7.patch
           against trunk revision 1072023.

           +1 @author. The patch does not contain any @author tags.

           +1 tests included. The patch appears to include 4 new or modified tests.

           +1 javadoc. The javadoc tool did not generate any warning messages.

           +1 javac. The applied patch does not increase the total number of javac compiler warnings.

           +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

           +1 release audit. The applied patch does not increase the total number of release audit warnings.

           -1 core tests. The patch failed these core unit tests:
           org.apache.hadoop.hdfs.TestFileConcurrentReader
           org.apache.hadoop.hdfs.TestLargeBlock

           -1 contrib tests. The patch failed contrib unit tests.

           +1 system test framework. The patch passed system test framework compile.

           Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/188//testReport/
           Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/188//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
           Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/188//console

           This message is automatically generated.

           Sanjay Radia added a comment -

           @Todd> Having the ability to add/remove volumes via RPC has the issue that the changes are not reflected in the config file ..... So, rather than providing add/list/remove APIs, we decided to simply add a "refresh" API.

           I agree with Todd's comment that we don't want to change the config or do things that differ from the config.
           There is a related JIRA on "removing" a failed disk.
           There is also the additional complexity of formatting the drives and mounting them. Are you assuming that this has been done prior to the refresh?
           Does Red Hat deal with hot-pluggable disks?

           Sanjay Radia added a comment -

           I would like to better understand the use case for this:
           So the main use case is that one wants to hot-plug a new drive without restarting the DN daemon. Correct?

           I just talked to my operations team and they told me that they will not hot-replace individual drives - it's too risky, as an operator may replace the wrong drive, and they further have doubts about how well the OS will deal with this.
           Further, they point out that one has to format and mount the volume, and hence log in on the machine.
           The mode our ops are planning for our 12-disk nodes is to wait until about 3 disks have failed, then decommission the DN and replace the drives.

           Allen, your thoughts on the use case for this feature?

           Bharath Mundlapudi added a comment -

           Adding to Sanjay's comments: this functionality is Hadoop-deployment specific, meaning some deployments use hot plug and some don't. Also, if any existing/future Hadoop deployments use disk striping (RAID-0), will this feature work?

           Todd Lipcon added a comment -

           I used to manage a fairly small cluster of storage-heavy non-Hadoop nodes in a previous life. We used SATA hotswap pretty extensively there and it worked "fairly well". About one out of ten times "scsiadd" wouldn't find the new drives and we'd have to reboot the box.

           This was 4 years ago or so, so things have likely improved by now.

           Allen Wittenauer added a comment -

           I look at it like this:

           Is there a downside to supporting this functionality? Just because it is there doesn't mean one has to use it. It is trivial to come up with a practical use case (front-side serving HBase machines). [I'm avoiding the temptation to make a snide comment about Federation here. ;) ]

           It is also worth pointing out that other operating systems have better IO subsystems for hot swapping. So while it may not work on one particular config, that doesn't mean all of them are cursed.

           FWIW, RAID-0, mounting, newfs'ing, etc. is irrelevant. If HDFS talked to raw disks this would matter; Hadoop talks to file systems, so any hardware/software OS config would already need to be in place anyway. For small grids where you can do replacements, this functionality makes sense.

           Bharath Mundlapudi added a comment -

           I have made some changes to the Hadoop 0.20 version to make the datanode more reliable w.r.t. disk failures. Please refer to the umbrella JIRA HADOOP-7123 and specifically HADOOP-7125 for the datanode. This particular patch supplements the HADOOP-7125 JIRA. I will be porting these changes to trunk soon.

           I have a couple of comments regarding this patch:
           1. When we add a new volume, should we do the appropriate math on the validVolsRequired member in FSDataset?
           2. Typo: revoverTransitionAdditionalRead instead of recoverTransitionAdditionalRead?
           3. Is there a way to separate out the common code from recoverTransitionRead and recoverTransitionAdditionalRead? It seems like most of the code in these two methods is common.

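           On point 1, the kind of adjustment meant might look like the sketch below. The class and method are hypothetical; validVolsRequired follows the comment above, and volFailuresTolerated mirrors dfs.datanode.failed.volumes.tolerated from HDFS-457.

               class VolumeAccounting {
                 // Mirrors dfs.datanode.failed.volumes.tolerated (HDFS-457).
                 private final int volFailuresTolerated;
                 // FSDataset's minimum number of valid volumes (point 1 above).
                 private int validVolsRequired;

                 VolumeAccounting(int configuredVolumes, int volFailuresTolerated) {
                   this.volFailuresTolerated = volFailuresTolerated;
                   this.validVolsRequired = configuredVolumes - volFailuresTolerated;
                 }

                 // When a volume is hot-added, the threshold tracks the new total,
                 // so the node still tolerates the same number of failures.
                 void onVolumeAdded(int activeVolumeCount) {
                   validVolsRequired = activeVolumeCount - volFailuresTolerated;
                 }
               }
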
           Wang Xu added a comment -

           @Sanjay,

           Most SATA controllers support hotswap, and all SATA devices support it. (See the libata wiki at https://ata.wiki.kernel.org )

           As for the operational issue: many servers have a per-disk status LED, and some of them can be programmed, so the management system can identify a failed disk by it. Without a status indication, it is indeed hard for maintainers to find the right disks.

           My assumption is:

           1. Manually change the disk.
           2. Find the new device and enable it, then make a local fs on it, mount it, and make the essential dirs. This step could be done by an external management system or manually.
           3. Re-enable the disk in Hadoop.

           @Bharath,

           Thanks for the code review. recoverTransitionRead and recoverTransitionAdditionalRead are almost the same except for the writeAll() at the end; when we add additional disks, we should not call writeAll(). Should we split recoverTransitionRead into different parts and re-use them?

           Sanjay Radia added a comment -

           Steps 1 and 2 are done by the operator, and step 3 is:

           • invoke a refresh-volumes command on the DNs that re-examines the existing config.
             This JIRA will add this new refresh-volumes command.

           Correct?
           That sounds fine.

           Wang Xu added a comment -

           update patch

           Wang Xu added a comment -

           1. Fix the typo.
           2. Extract the common code in recoverTransitionRead and recoverTransitionAdditionalRead into a method: analyzeStorageDirs (see the sketch below).

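           A rough shape of that refactoring, as a hedged sketch with simplified signatures (illustration only, not the patch itself):

               import java.io.File;
               import java.io.IOException;
               import java.util.Collection;

               abstract class DataStorageSketch {
                 void recoverTransitionRead(Collection<File> dataDirs)
                     throws IOException {
                   analyzeStorageDirs(dataDirs); // shared per-directory analysis
                   writeAll();                   // startup path persists storage files
                 }

                 void recoverTransitionAdditionalRead(Collection<File> dataDirs)
                     throws IOException {
                   analyzeStorageDirs(dataDirs); // same analysis for hot-added volumes
                   // deliberately no writeAll(): hot-added volumes must not rewrite
                   // the storage state of volumes already in service
                 }

                 private void analyzeStorageDirs(Collection<File> dataDirs)
                     throws IOException {
                   // common recover/analyze logic for each storage dir (elided)
                 }

                 abstract void writeAll() throws IOException;
               }
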
           Wang Xu added a comment -

           Fixed the typo indicated by Bharath, and eliminated the duplicated code.

           Wang Xu added a comment -

           @Sanjay, Yes, that's exactly what this Jira did.

           Hadoop QA added a comment -

           -1 overall. Here are the results of testing the latest attachment
           http://issues.apache.org/jira/secure/attachment/12474302/HDFS-1362.8.patch
           against trunk revision 1083958.

           +1 @author. The patch does not contain any @author tags.

           +1 tests included. The patch appears to include 4 new or modified tests.

           +1 javadoc. The javadoc tool did not generate any warning messages.

           +1 javac. The applied patch does not increase the total number of javac compiler warnings.

           +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

           +1 release audit. The applied patch does not increase the total number of release audit warnings.

           -1 core tests. The patch failed these core unit tests:
           org.apache.hadoop.hdfs.TestFileConcurrentReader

           -1 contrib tests. The patch failed contrib unit tests.

           +1 system test framework. The patch passed system test framework compile.

           Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/279//testReport/
           Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/279//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
           Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/279//console

           This message is automatically generated.

           Wang Xu added a comment -

           The failed unit test was not introduced by this patch, and the "Too many open files" exception seems to be a configuration problem, doesn't it?

           Harsh J added a comment -

           @All - What was the consensus on this patch? Reading through the comments, it looks like everyone was OK with it?

           Uma Maheswara Rao G added a comment -

           Hi Wang Xu,
           Are you planning to re-base the patch on trunk?

           Canceling the patch as it no longer applies to trunk.

           Wang Xu added a comment -

           Hi Uma,

           I will try that in the coming days.

           Uma Maheswara Rao G added a comment -

           Thanks a lot Wang.

           Uma Maheswara Rao G added a comment -

           Any update on this? - Thanks

           Wang Xu added a comment -

           Hi Uma,

           I am on this, but I need a bit more time to catch up with the updates to trunk over the past year.

           I am sorry for being out of the office last week.

           I will try to finish it as soon as possible.

           jiwan@taobao.com added a comment -

           Hi, Wang Xu,
           Have you done any implementation of "if the failed disk is still readable and the node has enough space, it can migrate data on the disks to other disks in the same node"?
           Expecting your reply. Thanks.

           jiwan@taobao.com added a comment -

           Hi all, does anyone know the progress of this issue?

           Wang Xu added a comment -

           Hi jiwan,

           I know; I am still keeping an eye on this issue. But as my current job is not on HDFS, I cannot keep track of the updates in time. Sorry for that.

           As for "if the failed disk is still readable and the node has enough space, it can migrate data on the disks to other disks in the same node": it was done in the original version, but was removed from the posted version. Having discussed it with some committers, we all thought it better to make the patch smaller and cleaner.

           Thanks for your attention.

           jiwan@taobao.com added a comment -

           Wang Xu,
           I see, thank you.

           Xiaobogu added a comment -

           Hi, I would like to know when you will merge this feature into the main version. Thanks.

  People

  • Assignee: Wang Xu
  • Reporter: Wang Xu
  • Votes: 0
  • Watchers: 35
