  Hadoop HDFS / HDFS-7489

Incorrect locking in FsVolumeList#checkDirs can hang datanodes

      Description

      After upgrading to 2.5.0 (CDH 5.2.1), we started to see datanodes hang their heartbeats and stop responding to client requests. After some digging, I identified the culprit as the checkDiskError() call that is triggered when IOExceptions are caught (in our case, SocketExceptions triggered on one datanode by ReplicaAlreadyExistsExceptions on another datanode).

      Thread dumps reveal that the checkDiskError() thread is holding a lock on the FsVolumeList:

      "Thread-409" daemon prio=10 tid=0x00007f4e50200800 nid=0x5b8e runnable [0x00007f4e2f855000]
         java.lang.Thread.State: RUNNABLE
              at java.io.UnixFileSystem.list(Native Method)
              at java.io.File.list(File.java:973)
              at java.io.File.listFiles(File.java:1051)
              at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:89)
              at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91)
              at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91)
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.checkDirs(BlockPoolSlice.java:257)
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.checkDirs(FsVolumeImpl.java:210)
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:180)
              - locked <0x000000063b182ea0> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList)
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:1396)
              at org.apache.hadoop.hdfs.server.datanode.DataNode$5.run(DataNode.java:2832)
              at java.lang.Thread.run(Thread.java:662)
      

      Other threads would then lock the FsDatasetImpl while waiting for the FsVolumeList, e.g.:

      "DataXceiver for client  at /10.10.0.52:46643 [Receiving block BP-1573746465-127.0.1.1-1352244533715:blk_1073770670_1099996962574]" daemon prio=10 tid=0x00007f4e55561000 nid=0x406d waiting for monitor entry [0x00007f4e3106d000]
         java.lang.Thread.State: BLOCKED (on object monitor)
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.getNextVolume(FsVolumeList.java:64)
              - waiting to lock <0x000000063b182ea0> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList)
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:927)
              - locked <0x000000063b1f9a48> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:101)
              at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:167)
              at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:604)
              at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:126)
              at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:72)
              at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225)
              at java.lang.Thread.run(Thread.java:662)
      

      That lock on the FsDatasetImpl then causes other threads to block:

      "Thread-127" daemon prio=10 tid=0x00007f4e4c67d800 nid=0x2e02 waiting for monitor entry [0x00007f4e33390000]
         java.lang.Thread.State: BLOCKED (on object monitor)
              at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:228)
              - waiting to lock <0x000000063b1f9a48> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
              at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.verifyBlock(BlockPoolSliceScanner.java:436)
              at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.verifyFirstBlock(BlockPoolSliceScanner.java:523)
              at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scan(BlockPoolSliceScanner.java:684)
              at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scanBlockPoolSlice(BlockPoolSliceScanner.java:650)
              at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:101)
      
      "DataNode: [[[DISK]file:/data/0/dfs/dn/, [DISK]file:/data/1/dfs/dn/, [DISK]file:/data/2/dfs/dn/, [DISK]file:/data/3/dfs/dn/, [DISK]file:/data/4/dfs/dn/, [DISK]file:/data/5/dfs/dn/, [DISK]file:/data/6/dfs/dn/, [DISK]file:/data/7/dfs/dn/, [DISK]file:/data/8/dfs/dn/, [DISK]file:/data/9/dfs/dn/, [DISK]file:/data/10/dfs/dn/, [DISK]file:/data/11/dfs/dn/, [DISK]file:/data/12/dfs/dn/, [DISK]file:/data/13/dfs/dn/, [DISK]file:/data/14/dfs/dn/, [DISK]file:/data/15/dfs/dn/, [DISK]file:/data/16/dfs/dn/, [DISK]file:/data/17/dfs/dn/, [DISK]file:/data/18/dfs/dn/, [DISK]file:/data/19/dfs/dn/, [DISK]file:/data/20/dfs/dn/, [DISK]file:/data/21/dfs/dn/, [DISK]file:/data/22/dfs/dn/, [DISK]file:/data/23/dfs/dn/, [DISK]file:/data/24/dfs/dn/, [DISK]file:/data/25/dfs/dn/, [DISK]file:/data/26/dfs/dn/, [DISK]file:/data/27/dfs/dn/, [DISK]file:/data/28/dfs/dn/, [DISK]file:/data/29/dfs/dn/, [DISK]file:/data/30/dfs/dn/, [DISK]file:/data/31/dfs/dn/, [DISK]file:/data/32/dfs/dn/, [DISK]file:/data/33/dfs/dn/, [DISK]file:/data/34/dfs/dn/, [DISK]file:/data/35/dfs/dn/]]  heartbeating to bigdata-01.sc-chi-int.37signals.com/10.10.0.211:8020" daemon prio=10 tid=0x00007f4e553a5800 nid=0x2d66 waiting for monitor entry [0x00007f4e361be000]
         java.lang.Thread.State: BLOCKED (on object monitor)
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:116)
              - waiting to lock <0x000000063b1f9a48> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:132)
              - locked <0x000000063b182d80> (a java.lang.Object)
              at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:572)
              at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:677)
              at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:847)
              at java.lang.Thread.run(Thread.java:662)
      
      "DataXceiver for client DFSClient_NONMAPREDUCE_-1948416574_103 at /10.10.0.169:44229 [Receiving block BP-1573746465-127.0.1.1-1352244533715:blk_1073776794_1099996963072]" daemon prio=10 tid=0x00007f4e55431000 nid=0x4ab8 waiting for monitor entry [0x00007f4e2a448000]
         java.lang.Thread.State: BLOCKED (on object monitor)
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:782)
              - waiting to lock <0x000000063b1f9a48> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:101)
              at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:171)
              at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:604)
              at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:126)
              at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:72)
              at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225)
              at java.lang.Thread.run(Thread.java:662)
      
      

      This occurs, I believe, because FsVolumeList#checkDirs() locks the entire FsVolumeList to prevent multiple simultaneous checkDirs() calls. On a slow system, under heavy I/O, or with many disks, the checkDirs() call can empirically take dozens of seconds or longer.

      I think this can be improved by holding a separate mutex limited to checkDirs (much like DataNode#checkDiskError() does) and only locking the full FsVolumeList when the volume list needs to be updated because disks were removed. I'll attach a patch that does this.
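
      A minimal sketch of the idea (illustrative only; names here, like checkDirsMutex, are assumptions rather than the exact patch): checkDirs() runs are serialized on a private mutex, and the FsVolumeList itself is only locked for the brief moment the volume list is swapped.

      // Fragment assumed to live in FsVolumeList, where "volumes" is the current
      // unmodifiable List<FsVolumeImpl>; uses java.util.* and
      // org.apache.hadoop.util.DiskChecker.DiskErrorException.
      private final Object checkDirsMutex = new Object();

      void checkDirs() {
        synchronized (checkDirsMutex) {            // serializes checkDirs() runs only
          List<FsVolumeImpl> removedVols = new ArrayList<FsVolumeImpl>();
          for (FsVolumeImpl fsv : volumes) {       // slow per-disk I/O, done without the list lock
            try {
              fsv.checkDirs();
            } catch (DiskErrorException e) {
              FsDatasetImpl.LOG.warn("Removing failed volume " + fsv, e);
              removedVols.add(fsv);
            }
          }
          if (!removedVols.isEmpty()) {
            synchronized (this) {                  // brief lock, only to publish the new volume list
              List<FsVolumeImpl> volumeList = new ArrayList<FsVolumeImpl>(volumes);
              volumeList.removeAll(removedVols);
              volumes = Collections.unmodifiableList(volumeList);
            }
          }
        }
      }

      With this shape, callers like getNextVolume() never block behind the slow disk scan; they only contend for the short swap at the end.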

      We're running this patch in production and it's working as expected – a lock is held on a private mutex object instead of on the entire FsVolumeList while checkDirs() is running:

      "Thread-614" daemon prio=10 tid=0x000000004037b000 nid=0x7331 runnable [0x00007f4d45391000]
         java.lang.Thread.State: RUNNABLE
              at java.io.UnixFileSystem.createDirectory(Native Method)
              at java.io.File.mkdir(File.java:1157)
              at org.apache.hadoop.util.DiskChecker.mkdirsWithExistsCheck(DiskChecker.java:67)
              at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:104)
              at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:88)
              at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91)
              at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91)
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.checkDirs(BlockPoolSlice.java:257)
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.checkDirs(FsVolumeImpl.java:210)
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:182)
              - locked <0x000000063b24f540> (a java.lang.Object)
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:1396)
              at org.apache.hadoop.hdfs.server.datanode.DataNode$5.run(DataNode.java:2832)
              at java.lang.Thread.run(Thread.java:662)
      

      Feedback would be appreciated!

      1. HDFS-7489-v1.patch
        4 kB
        Noah Lorang
      2. HDFS-7489-v2.patch
        3 kB
        Noah Lorang
      3. HDFS-7489-v2.patch.1
        3 kB
        Colin P. McCabe


          Activity

          andrew.wang Andrew Wang added a comment -

          FYI for git log greppers, this commit is missing the JIRA ID.

          cmccabe Colin P. McCabe added a comment -

          Thanks for the patch, Noah Lorang.

          nlorang Noah Lorang added a comment -

          My pleasure. Thanks for getting this upstream so quickly.

          cmccabe Colin P. McCabe added a comment -

          Ah. The findbugs version was bumped recently in HADOOP-10476. My uninformed guess is that we are using the old findbugs for "previous findbugs warnings" and the new one for patch findbugs, which of course makes every patch look bad?

          cmccabe Colin P. McCabe added a comment -

          Test failures are bogus. I ran TestOfflineEditsViewer, TestRbwSpaceReservation, TestRenameWithSnapshots, etc. and got no failures. The findbugs results are very puzzling... a bunch of "null pointer dereference" warnings for System.out and System.err (and not in code modified by this patch). I think we can safely assume that System.out and System.err are not null-- no idea what went wrong with findbugs here. Perhaps this is another JDK version upgrade issue.

          +1. Will commit in a few minutes. Thanks, Noah.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12685901/HDFS-7489-v2.patch.1
          against trunk revision ddffcd8.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 287 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots

          The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
          org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestRbwSpaceReservation
          org.apache.hadoop.hdfs.server.datanode.fsdataset.TestTests

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8963//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/8963//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8963//console

          This message is automatically generated.

          cmccabe Colin P. McCabe added a comment -

          It looks like HADOOP-10530 broke the HDFS precommit job. Hopefully it should be fixed now...

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12685885/HDFS-7489-v2.patch-with-minjdk6-2
          against trunk revision ddffcd8.

          -1 patch. Trunk compilation may be broken.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8959//console

          This message is automatically generated.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12685861/HDFS-7489-v2.patch-with-minjdk6
          against trunk revision ddffcd8.

          -1 patch. Trunk compilation may be broken.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8958//console

          This message is automatically generated.

          cmccabe Colin P. McCabe added a comment -

          It seems like we are still hitting executors with jdk6 installed. I have attached a version of the patch that sets minJdk=6; hopefully this will allow us to get a Jenkins run.

          In view of how critical this patch is, I'd like to get it in today if possible.

          thanks guys

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12685850/HDFS-7489-v2.patch
          against trunk revision ddffcd8.

          -1 patch. Trunk compilation may be broken.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8956//console

          This message is automatically generated.

          cmccabe Colin P. McCabe added a comment -

          Noah, this fix solves the current (major) problem. I agree that calling removeVolume from checkDirs should avoid concurrent modification issues. I still would prefer to see some refactoring, but it's critical to fix this due to its severity... so we should do that in a follow-on. So I am +1 on the latest version, pending Jenkins.

          The error message from the jenkins run is:

          Detected JDK Version: 1.6.0-45 is not in the allowed range [1.7,).
          

          This is related to the JDK6->JDK7 conversion of executors discussed upstream, not to the patch. I will re-trigger Jenkins.

          I will file follow-up issues for two remaining problems:

          • If multiple threads call checkDirs at the same time, we should only do checkDirs once and give the results to all threads. This might be easiest to accomplish with a Guava cache.
          • We probably need to start reference-counting volumes so that they aren't removed while some code has a reference to them (a toy sketch of this idea follows).
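
          A toy, self-contained illustration of that reference-counting idea (every name below is hypothetical; nothing here is from the patch under review):

          import java.util.concurrent.atomic.AtomicInteger;

          // Callers obtain a reference before using a volume and close it when done;
          // removal only tears the volume down once the outstanding count reaches zero.
          class CountedVolume {
            private final AtomicInteger refCount = new AtomicInteger(0);
            private volatile boolean removing = false;

            /** Returns a handle to close when done, or null if the volume is being removed. */
            AutoCloseable obtainReference() {
              refCount.incrementAndGet();
              if (removing) {                  // lost the race with removal; back out
                refCount.decrementAndGet();
                return null;
              }
              return new AutoCloseable() {
                @Override public void close() { refCount.decrementAndGet(); }
              };
            }

            void markRemoving() { removing = true; }

            /** Removal code checks this before discarding in-memory state for the volume. */
            boolean safeToTearDown() { return removing && refCount.get() == 0; }
          }

          A caller would wrap each use in try (AutoCloseable ref = vol.obtainReference()) { ... } and skip the volume when obtainReference() returns null.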
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12685850/HDFS-7489-v2.patch
          against trunk revision 6c5bbd7.

          -1 patch. Trunk compilation may be broken.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8955//console

          This message is automatically generated.

          nlorang Noah Lorang added a comment -

          Thanks Lei (Eddy) Xu and Colin P. McCabe.

          If I'm not mistaken, using removeVolume() should resolve the conflict with a volume being added or removed while checkDirs() is running – each failed volume is removed one at a time while holding the FsVolumeList lock. That doesn't change the checking of already-removed volumes (they'll still be checked), and it won't catch newly-added volumes in that run of checkDirs(). I'll attach a reworked patch that does this.

          I've left the checkDirsMutex in place to maintain the prior behavior of limiting checkDirs to one run at a time. This might be unnecessary – as far as I can tell, the only pathway that uses it is DataNode#checkDiskError(), which only runs from a single checker thread at a time.
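
          Roughly, reusing removeVolume() means each failed disk triggers one short critical section of this shape (an approximation for illustration, not the literal patch):

          // Assumed to live in FsVolumeList, where "volumes" is the unmodifiable volume list.
          void removeVolume(FsVolumeImpl target) {
            synchronized (this) {              // FsVolumeList lock held only for this quick swap
              List<FsVolumeImpl> volumeList = new ArrayList<FsVolumeImpl>(volumes);
              if (volumeList.remove(target)) {
                volumes = Collections.unmodifiableList(volumeList);
                FsDatasetImpl.LOG.info("Removed volume: " + target);
              }
            }
          }

          checkDirs() itself, still serialized on checkDirsMutex, just calls removeVolume(fsv) for each volume whose checkDirs() throws, so the slow disk scan never holds the FsVolumeList lock.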

          cmccabe Colin P. McCabe added a comment -

          Thanks for looking at this, Noah Lorang. I think this is a very important fix to make.

          As Lei (Eddy) Xu commented, I don't think adding checkDirsMutex is going to work here. The volumeList can be changed by things other than checkDirs... such as the DataNode hard drive hot swap feature.

          I think that what we ought to do is create an AtomicReference<FsVolumeImpl[]>. Then, FsVolumeList#checkDirs can get a copy of the current array of volumes at the start locklessly, and simply call checkDirs on all of them. This will fix the current problem. I don't think there is a big disadvantage to calling checkDirs on removed volumes-- it's extra work, but shouldn't cause correctness issues.
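
          For example (a hedged sketch of the AtomicReference idea; the field and method names are assumptions, not committed code):

          // Replaces the volume list with an atomically swappable array
          // (java.util.concurrent.atomic.AtomicReference).
          private final AtomicReference<FsVolumeImpl[]> volumes =
              new AtomicReference<FsVolumeImpl[]>(new FsVolumeImpl[0]);

          void checkDirs() {
            FsVolumeImpl[] snapshot = volumes.get();   // lock-free copy of the current volume set
            for (FsVolumeImpl v : snapshot) {
              try {
                v.checkDirs();                         // harmless extra work if v was removed concurrently
              } catch (DiskErrorException e) {
                removeVolume(v);                       // removal swaps in a new array via compareAndSet
              }
            }
          }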

          I think there are still some correctness issues here we'll need to work through. For example, right now we could be calling FsVolumeList#getNextVolume right before removing a volume, and then trying to use it right after. This is a problem even right now... the FsVolumeList lock protects you, but only during the getNextVolume call. After the call ends, the volume can be removed at any time. We should file a follow-up JIRA for that, though. It is critical to fix this performance issue now, and the race is an existing issue.

          eddyxu Lei (Eddy) Xu added a comment -

          Hi, Noah Lorang. This is a very good observation and the patch looks good in general.

          We recently introduced the DataNode hot swap feature (https://issues.apache.org/jira/browse/HDFS-1362). Since it allows data volumes to be added / removed at runtime, would it conflict with the following code?

           synchronized (this) {
             // Replace volume list
             volumes = Collections.unmodifiableList(volumeList);
             FsDatasetImpl.LOG.warn("Completed checkDirs. Removed " + removedVols.size()
                 + " volumes. Current volumes: " + this);
           }
          

          Also, it would be great to address the race condition where an FsVolumeImpl that is being checked is removed and then re-added. Moreover, could you re-use FsVolumeList#removeVolume?

          I would give a non-binding +1 once the above comments are addressed. Thanks again for working on this, Noah Lorang!

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12685778/HDFS-7489-v1.patch
          against trunk revision ffe942b.

          -1 patch. Trunk compilation may be broken.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8952//console

          This message is automatically generated.


            People

            • Assignee: nlorang Noah Lorang
            • Reporter: nlorang Noah Lorang
            • Votes: 0
            • Watchers: 17
