Hadoop HDFS / HDFS-4239

Means of telling the datanode to stop using a sick disk

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      If a disk has been deemed 'sick' – i.e. not dead but wounded, failing occasionally, or just exhibiting high latency – your choices are:

      1. Decommission the whole datanode. If the datanode is carrying 6 or 12 disks of data, especially on a smallish cluster – 5 to 20 nodes – the re-replication of the downed datanode's data can be pretty disruptive, especially if the cluster is doing low-latency serving: e.g. hosting an HBase cluster.

      2. Stop the datanode, unmount the bad disk, and restart the datanode (you can't unmount the disk while it is in use). The latter is better in that only the bad disk's data is re-replicated, not all of the datanode's data.

      Is it possible to do better? Say, send the datanode a signal telling it to stop using a disk an operator has designated 'bad'. This would be like option #2 above, minus the need to stop and restart the datanode. Ideally the disk would become unmountable after a while.

      A nice-to-have would be the ability to tell the datanode to start using a disk again after it's been replaced.

      Attachments

      1. hdfs-4239_v2.patch
        41 kB
        Jimmy Xiang
      2. hdfs-4239_v3.patch
        42 kB
        Jimmy Xiang
      3. hdfs-4239_v4.patch
        48 kB
        Jimmy Xiang
      4. hdfs-4239_v5.patch
        48 kB
        Jimmy Xiang
      5. hdfs-4239.patch
        19 kB
        Jimmy Xiang

        Issue Links

          Activity

          tlipcon Todd Lipcon added a comment -

          Would chmod 000 /path/to/data/dir do the trick? That should cause it to start getting IOExceptions, which would then get it to eject that disk from its list.

          stack stack added a comment -

          Let me try.

          qwertymaniac Harsh J added a comment -

          Todd's proposal would still require the DN to be started up with a > 1 toleration value via dfs.datanode.failed.volumes.tolerated.

          qwertymaniac Harsh J added a comment -

          Typo, I meant >= 1

          stevel@apache.org Steve Loughran added a comment -

          Would a umount -f let you force the unmount?

          The big volume management JIRA is HDFS-1362 – that's the one that really needs finishing off.

          adi2 Andy Isaacson added a comment -

          Would a umount -f let you force the unmount?

          Unfortunately not while a running process still has a file descriptor open on the volume. It would require revoke() support in the kernel, and that effort foundered many years ago. http://lwn.net/Articles/262528/

          adi2 Andy Isaacson added a comment -

          Would chmod 000 /path/to/data/dir do the trick? That should cause it to start getting IOExceptions, which would then get it to eject that disk from its list.

          Unfortunately the DN keeps the in_use.lock open even after the volume is marked failed.

          adi2 Andy Isaacson added a comment -

          To expand on my previous comment:

          I tested on trunk, on a DN with dfs.datanode.failed.volumes.tolerated=1. Running chmod 0 /data/5/datadir/current caused the DN to eject the volume and continue operating. I then used lsof -p to verify which file descriptors remained open and observed that /data/5/datadir/in_use.lock was still open.
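
          A minimal Java sketch of how such a storage-directory lock is typically held (simplified, and not the actual DataStorage code): the point is that once the descriptor is open and locked, a chmod of the parent directory does not revoke it, so the fd shows up in lsof until the process closes it.

            import java.io.File;
            import java.io.RandomAccessFile;
            import java.nio.channels.FileLock;

            public class InUseLockDemo {
              public static void main(String[] args) throws Exception {
                // Open and lock <dir>/in_use.lock the way a storage-directory lock is
                // commonly held: an open RandomAccessFile plus an exclusive file lock.
                File lockFile = new File(args[0], "in_use.lock");
                RandomAccessFile raf = new RandomAccessFile(lockFile, "rws");
                FileLock lock = raf.getChannel().tryLock();
                if (lock == null) {
                  System.err.println("Already locked by another process");
                  raf.close();
                  return;
                }
                // The descriptor stays open (and visible to lsof) even if the parent
                // directory is chmod'ed to 0 after this point.
                System.out.println("Holding " + lockFile + "; check with lsof -p <pid>");
                Thread.sleep(60000);
                lock.release();
                raf.close();
              }
            }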

          tiborvass Tibor Vass added a comment -

          What about stopping the datanode, chmod 0-ing, and restarting the datanode?

          adi2 Andy Isaacson added a comment -

          What about stopping the datanode, chmod 0-ing, and restarting the datanode?

          That should work just fine, if the HDFS config is compatible with the new set of available directories. That means either ensuring that the number of inaccessible datadirs does not exceed the dfs.datanode.failed.volumes.tolerated value, or removing the inaccessible datadir from dfs.data.dir.
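
          As a rough illustration of the check an operator is effectively doing by hand before the restart – a small Java sketch using the Hadoop Configuration API (this assumes hdfs-site.xml is on the classpath and uses the property names from this thread):

            import java.io.File;
            import org.apache.hadoop.conf.Configuration;

            public class VolumeConfigCheck {
              public static void main(String[] args) {
                Configuration conf = new Configuration();
                conf.addResource("hdfs-site.xml");
                String[] dataDirs = conf.getTrimmedStrings("dfs.data.dir");
                int tolerated = conf.getInt("dfs.datanode.failed.volumes.tolerated", 0);
                int inaccessible = 0;
                for (String dir : dataDirs) {
                  File f = new File(dir);
                  // Treat a dir we cannot fully access as failed, mirroring what the DN would see.
                  if (!f.canRead() || !f.canWrite() || !f.canExecute()) {
                    inaccessible++;
                  }
                }
                System.out.println(inaccessible + " of " + dataDirs.length
                    + " data dirs inaccessible; tolerated = " + tolerated);
                if (inaccessible > tolerated) {
                  System.out.println("DN would refuse to start: remove the bad dirs from"
                      + " dfs.data.dir or raise dfs.datanode.failed.volumes.tolerated");
                }
              }
            }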

          jdcryans Jean-Daniel Cryans added a comment -

          That should work just fine, if the HDFS config is compatible with the new set of available directories.

          We tried it today, and it worked fine. We did encounter an interesting problem, though: the region server on the same node continued to use that disk directly, since it's configured with local reads.

          To rephrase that, a long-running BlockReaderLocal will ride over local DN restarts and disk "ejections". We had to drain the RS of all its regions in order to stop it from using the bad disk.

          adi2 Andy Isaacson added a comment -

          I created HDFS-4284 to track the BlockReaderLocal issue.

          qwertymaniac Harsh J added a comment -

          HDFS-1362 matches this JIRA's needs I think.

          qwertymaniac Harsh J added a comment -

          I just noticed Steve's comment referring to the same – should've gone through the thread properly before spending Google cycles. I feel HDFS-1362, implemented, would solve half of this – and the other half would be to make the removals automatic. Right now checkDiskError does not eject a disk if it's slow, as long as it succeeds; that part would have to be done via this JIRA I think. The re-add would be possible via HDFS-1362.

          jxiang Jimmy Xiang added a comment -

          Attached a patch for trunk. It's good for branch-2 too.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12623271/hdfs-4239.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5891//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5891//console

          This message is automatically generated.

          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12623271/hdfs-4239.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5913//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5913//console

          This message is automatically generated.

          stack stack added a comment -

          This is lovely, Jimmy Xiang. The addition of the "+ System.err.println(" [-markDownVolume datanode-host:port location");" usage line is particularly so. In your testing, did you see whether the volume's 'in_use.lock' was cleaned up after removal of the volume? (See the comment by Andy above – https://issues.apache.org/jira/browse/HDFS-4239?focusedCommentId=13506791&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506791) Thanks.

          jxiang Jimmy Xiang added a comment -

          File 'in_use.lock' is still there after the volume is marked down. Let me take another look.

          jxiang Jimmy Xiang added a comment -

          We can release the lock after the volume is marked down. No new blocks will be allocated to this volume. What about blocks on this volume that are still being written? The write could take forever – for example, a rarely updated HLog file. I was thinking of failing the write pipeline so that the client can set up another pipeline. Any problem with that?

          stack stack added a comment -

          I think throwing an exception is the right thing to do. The volume is going away at the operator's volition.

          jxiang Jimmy Xiang added a comment -

          Cool. I agree. Attached v2, which releases all references to the volume marked down. In my test, I don't see any open file descriptors pointing to the marked-down volume.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12625766/hdfs-4239_v2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.TestInjectionForSimulatedStorage
          org.apache.hadoop.hdfs.TestPread
          org.apache.hadoop.hdfs.TestReplication
          org.apache.hadoop.hdfs.TestSmallBlock
          org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup
          org.apache.hadoop.hdfs.server.datanode.TestDataNodeMetrics
          org.apache.hadoop.hdfs.server.balancer.TestBalancerWithEncryptedTransfer
          org.apache.hadoop.hdfs.TestFileCreation
          org.apache.hadoop.hdfs.TestSetrepIncreasing
          org.apache.hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes
          org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes
          org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS
          org.apache.hadoop.hdfs.server.namenode.TestFileLimit
          org.apache.hadoop.hdfs.server.balancer.TestBalancer

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5983//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5983//console

          This message is automatically generated.

          jxiang Jimmy Xiang added a comment -

          Attached v3 that fixed the test failures.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12626054/hdfs-4239_v3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.namenode.TestAuditLogs

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5989//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5989//console

          This message is automatically generated.

          jxiang Jimmy Xiang added a comment -

          This test failure is not related.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12626054/hdfs-4239_v3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.TestPersistBlocks

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6000//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6000//console

          This message is automatically generated.

          tlipcon Todd Lipcon added a comment -

          Couple quick high-level comments:

          • what's the authorization requirement here? The patch doesn't seem to do any access control, but I wouldn't want a non-admin to make these changes.
          • it seems odd that the "mark this volume dead" is non-persistent across restarts. If a disk is "dying", I'm nervous that someone would mark it bad, and then a later rolling restart of the service would revive it. Something like a config file of "blacklisted volume IDs" and a 'refresh' RPC might be more resistant to this type of issue – or a marker file like "disallow_this_volume" in the storage directory?
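
          A hypothetical sketch of the marker-file idea above – the marker name and this check are illustrative only and do not appear in any of the attached patches:

            import java.io.File;

            public class VolumeBlacklistCheck {
              // Hypothetical: a volume is skipped at startup if its storage directory
              // carries a "disallow_this_volume" marker file.
              static boolean isDisallowed(File storageDir) {
                return new File(storageDir, "disallow_this_volume").exists();
              }

              public static void main(String[] args) {
                for (String dir : args) {
                  File d = new File(dir);
                  System.out.println(d + (isDisallowed(d) ? " -> skip (marked disallowed)" : " -> use"));
                }
              }
            }
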
          jxiang Jimmy Xiang added a comment -

          Good point. Let me handle the access control in the next patch. As to "blacklisted volume IDs", can we handle it in a separate issue?

          jxiang Jimmy Xiang added a comment -

          Attached v4, which adds access control when security is enabled.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12626799/hdfs-4239_v4.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 javadoc. The javadoc tool appears to have generated 2 warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.TestReplaceDatanodeOnFailure
          org.apache.hadoop.hdfs.server.namenode.TestAuditLogs

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6020//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6020//console

          This message is automatically generated.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12626985/hdfs-4239_v5.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 javadoc. The javadoc tool appears to have generated 2 warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6027//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6027//console

          This message is automatically generated.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12626985/hdfs-4239_v5.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 javadoc. The javadoc tool appears to have generated 2 warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6028//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6028//console

          This message is automatically generated.

          jxiang Jimmy Xiang added a comment -

          Ping. Can anyone take a look at patch v4? Thanks.

          yzhangal Yongjun Zhang added a comment -

          Hi Jimmy,

          Thanks for the good work. I went through patch v4 and it looks good to me. I only have a few comments, mostly cosmetic things, and I may be wrong myself.

          1. In DataNode.java:

             private void checkSuperuserPrivilege(String method) throws IOException {
               if (checkKerberosAuthMethod(method)) {
                 ...
               }
             }

          The above function checks superuser privilege only when Kerberos authentication is enabled. This seems not restrictive enough to me. However, I saw existing code in the same file that also does that, such as:

             private void checkBlockLocalPathAccess() throws IOException {
               checkKerberosAuthMethod("getBlockLocalPathInfo()");
               ...
             }

          So I'm actually not sure. Please correct me if I'm wrong. For instance, I found other existing code that checks superuser privilege, like

             ./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
             public void checkSuperuserPrivilege()

          which seems to do things differently.

          2. In DataNode.java:

             /** Ensure that authentication method is kerberos */
             boolean checkKerberosAuthMethod(String msg) throws IOException {

          I suggest changing both the comment and the method name to something like:

             /** Check whether the authentication method is Kerberos; return true if so and false otherwise. */
             boolean isKerberosAuthMethodEnabled(...)

          3. In BlockPoolSliceScanner.java:

             private static final String VERIFICATION_PREFIX = "dncp_block_verification.log";

          You removed "private" from this declaration; I wonder if that's what you intended. It seems it should stay private.

          4. In DataBlockScanner.java:

             void volumeMarkedDown(FsVolumeSpi vol) throws IOException {

          I wonder whether we can change it to

             /** Relocate verification logs for a volume that's marked down. */
             void relocateVerificationLogs(FsVolumeSpi volMarkedDown) ...

          to make it clearer?

          5. In BlockPoolSliceScanner.java:

             void relocateVerificationLogs(FsVolumeSpi vol) throws IOException {
               if (verificationLog != null) {
                 // block of code
               }
               // no code here
             }

          If the block of code is large, it would be helpful to change it to

             void relocateVerificationLogs(FsVolumeSpi vol) throws IOException {
               if (verificationLog == null) {
                 return;
               }
               // block of code
             }

          This removes one level of indentation and makes it easier to read.

          Thanks.

          cmccabe Colin P. McCabe added a comment -

          Thanks for picking this up again, Jimmy Xiang. hdfs-4239_v5.patch did not apply cleanly to trunk for me; can you re-generate this patch?

          DataNode#checkSuperuserPrivilege: As Yongjun commented, it's kind of unfortunate that you are skipping this check when Kerberos is disabled. This will make unit testing the "permission denied" case harder than it has to be. I suggest using FSPermissionChecker. The constructor takes three arguments: the current UGI, the superuser, and supergroup, and by calling FSPermissionChecker#checkSuperuserPrivilege, you can figure out if you have superuser. I realize you were following the pattern in checkBlockLocalPathAccess, but that code is legacy now (only used in the legacy local block reader).
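
          As a rough sketch of what the check needs to accomplish, independent of how Kerberos is configured – this ignores how the caller UGI is obtained on the RPC path, and the superuser/supergroup values would come from the DN's configuration; the FSPermissionChecker approach Colin mentions wraps essentially this kind of logic on the NameNode side:

            import java.io.IOException;
            import org.apache.hadoop.security.AccessControlException;
            import org.apache.hadoop.security.UserGroupInformation;

            class AdminCheck {
              static void checkSuperuserPrivilege(String superUser, String superGroup)
                  throws IOException {
                UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
                if (ugi.getShortUserName().equals(superUser)) {
                  return;  // the configured superuser is always allowed
                }
                for (String group : ugi.getGroupNames()) {
                  if (group.equals(superGroup)) {
                    return;  // members of the supergroup are allowed
                  }
                }
                throw new AccessControlException("Access denied for user "
                    + ugi.getShortUserName() + ": superuser privilege is required");
              }
            }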

          VERIFICATION_PREFIX: as Yongjun commented, I think you meant to keep the "private" on here

          DataStorage.java: the only change here is a whitespace change. Seems like you meant to cut this out of the diff.

          BlockPoolSliceScanner: I don't think the file reopening belongs in the slice scanner. LogFileHandler is actually a pretty abstract interface which is unconcerned with the details of where things are in files. We get it out of FsDataSetSpi#createRollingLogs. Since the normal file rolling stuff is handled in RollingLogsImpl, that's where the re-opening of verification logs should be handled as well. It's the same kind of thing.

          Another way of thinking about it is: while it's true that currently the verification logs reside in files somewhere in directories on disk, this is an implementation detail. You can easily imagine a different implementation of FsDatasetSpi where there are no on-disk directories, or where the verification log is kept in memory.

          Also, I noticed you were trying to copy the old verification log from the failed disk in your relocateVerificationLogs function. This is not a good idea, since reading from a failed disk may hang or misbehave, causing issues with the DataNode's threads. We want to treat a failed disk as radioactive and not do any more reads or writes from there if we can help it.

          @@ -930,12 +940,15 @@ synchronized Packet waitForAckHead(long seqno) throws InterruptedException {
                */
               @Override
               public synchronized void close() {
          -      while (isRunning() && ackQueue.size() != 0) {
          -        try {
          -          wait();
          -        } catch (InterruptedException e) {
          -          running = false;
          -          Thread.currentThread().interrupt();
          +      try {
          +        while (isRunning() && ackQueue.size() != 0) {
          +          wait(runningCheckInterval);
          +        }
          +      } catch (InterruptedException e) {
          +        Thread.currentThread().interrupt();
          +      } catch (IOException ioe) {
          +        if(LOG.isDebugEnabled()) {
          +          LOG.debug(myString, ioe);
          

          Why are we catching and swallowing IOException here?

          +  @Override //FsDatasetSpi
          +  public FsVolumeImpl markDownVolume(File location
          

          I don't like the assumption here that our volumes are backed by files. This may not be true for all FsDatasetSpi implementations. How about changing this to take a URI instead?

          Similarly, let's change the user API (in DistributedFileSystem, etc.) to take a URI as well, all the way up to the user level. So the user can ask us to mark down file:///data/1 or something like that. That way, when we later implement volumes which aren't backed by files, we can easily refer to them.

          Also, how do you feel about disableVolume instead of markDownVolume? "markdown" just makes me think of the markup language (maybe that's just me?)

          I think we need a way of listing all the volume URIs on a particular DN, and a way of listing all the currently disabled volume URIs on a DN. Otherwise it makes life hard for sysadmins.
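
          A hypothetical shape for such an API – the interface name, method names, and placement are illustrative only and not part of any attached patch:

            import java.io.IOException;
            import java.net.URI;

            interface VolumeAdmin {
              /** Stop using the volume identified by the given URI, e.g. file:///data/1. */
              void disableVolume(URI volume) throws IOException;

              /** List the URIs of all volumes configured on this DataNode. */
              URI[] listVolumes() throws IOException;

              /** List the URIs of the volumes currently disabled on this DataNode. */
              URI[] listDisabledVolumes() throws IOException;
            }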

          Another comment: we need to notify outstanding BlockReaderLocal instances that the volume is disabled. We can do this by using the shared memory segment. This should avoid the problem that JD pointed out here:

          We tried it today, it worked fine. We did encounter an interesting problem tho, the region server on the same node continued to use that disk directly since it's configured with local reads.

          If you want to do this in a follow-up change then file a JIRA for that?

          yzhangal Yongjun Zhang added a comment -

          Hi Jimmy Xiang, thanks for your earlier work on this issue. I wonder if you will have time to work on this? If not, do you mind if I take it over? Thanks.

          jxiang Jimmy Xiang added a comment -

          Sure. Assigned it to you.

          yzhangal Yongjun Zhang added a comment -

          Thanks Jimmy.

          yzhangal Yongjun Zhang added a comment -

          Hi Stack,

          This issue turned out to be a duplicate of HDFS-1362, which is resolved now.

          I'm closing this JIRA as a duplicate. Please re-open if you think there are additional issues to be addressed.

          Thanks.

          yzhangal Yongjun Zhang added a comment -

          HI Harsh J,

          My bad that I did not notice your earlier comment

          I just noticed Steve's comment referring to the same – should've gone through the thread properly before spending Google cycles. I feel HDFS-1362, implemented, would solve half of this – and the other half would be to make the removals automatic. Right now checkDiskError does not eject a disk if it's slow, as long as it succeeds; that part would have to be done via this JIRA I think. The re-add would be possible via HDFS-1362.

          until now. So we need to use the functionality provided by HDFS-1362 to automatically remove a sick disk. It seems the original goal of HDFS-4239 is the same as HDFS-1362 (right?), and we can create a new JIRA for automatically removing a sick disk?

          Thanks.


            People

            • Assignee:
              yzhangal Yongjun Zhang
              Reporter:
              stack stack
            • Votes:
              0
              Watchers:
              25
