HDFS-7489: Incorrect locking in FsVolumeList#checkDirs can hang datanodes


    Description

      After upgrading to 2.5.0 (CDH 5.2.1), we started to see datanodes hang their heartbeats and stop servicing client requests. After some digging, I identified the culprit as the checkDiskError() call triggered by catching IOExceptions (in our case, SocketExceptions raised on one datanode by ReplicaAlreadyExistsExceptions on another datanode).

      Thread dumps reveal that the checkDiskError() thread is holding a lock on the FsVolumeList:

      "Thread-409" daemon prio=10 tid=0x00007f4e50200800 nid=0x5b8e runnable [0x00007f4e2f855000]
         java.lang.Thread.State: RUNNABLE
              at java.io.UnixFileSystem.list(Native Method)
              at java.io.File.list(File.java:973)
              at java.io.File.listFiles(File.java:1051)
              at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:89)
              at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91)
              at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91)
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.checkDirs(BlockPoolSlice.java:257)
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.checkDirs(FsVolumeImpl.java:210)
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:180)
              - locked <0x000000063b182ea0> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList)
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:1396)
              at org.apache.hadoop.hdfs.server.datanode.DataNode$5.run(DataNode.java:2832)
              at java.lang.Thread.run(Thread.java:662)
      

      Other threads would then lock the FsDatasetImpl while waiting for the FsVolumeList, e.g.:

      "DataXceiver for client  at /10.10.0.52:46643 [Receiving block BP-1573746465-127.0.1.1-1352244533715:blk_1073770670_1099996962574]" daemon prio=10 tid=0x00007f4e55561000 nid=0x406d waiting for monitor entry [0x00007f4e3106d000]
         java.lang.Thread.State: BLOCKED (on object monitor)
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.getNextVolume(FsVolumeList.java:64)
              - waiting to lock <0x000000063b182ea0> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList)
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:927)
              - locked <0x000000063b1f9a48> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:101)
              at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:167)
              at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:604)
              at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:126)
              at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:72)
              at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225)
              at java.lang.Thread.run(Thread.java:662)
      

      That lock on the FsDatasetImpl then causes other threads to block:

      "Thread-127" daemon prio=10 tid=0x00007f4e4c67d800 nid=0x2e02 waiting for monitor entry [0x00007f4e33390000]
         java.lang.Thread.State: BLOCKED (on object monitor)
              at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:228)
              - waiting to lock <0x000000063b1f9a48> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
              at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.verifyBlock(BlockPoolSliceScanner.java:436)
              at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.verifyFirstBlock(BlockPoolSliceScanner.java:523)
              at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scan(BlockPoolSliceScanner.java:684)
              at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scanBlockPoolSlice(BlockPoolSliceScanner.java:650)
              at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:101)
      
      "DataNode: [[[DISK]file:/data/0/dfs/dn/, [DISK]file:/data/1/dfs/dn/, [DISK]file:/data/2/dfs/dn/, [DISK]file:/data/3/dfs/dn/, [DISK]file:/data/4/dfs/dn/, [DISK]file:/data/5/dfs/dn/, [DISK]file:/data/6/dfs/dn/, [DISK]file:/data/7/dfs/dn/, [DISK]file:/data/8/dfs/dn/, [DISK]file:/data/9/dfs/dn/, [DISK]file:/data/10/dfs/dn/, [DISK]file:/data/11/dfs/dn/, [DISK]file:/data/12/dfs/dn/, [DISK]file:/data/13/dfs/dn/, [DISK]file:/data/14/dfs/dn/, [DISK]file:/data/15/dfs/dn/, [DISK]file:/data/16/dfs/dn/, [DISK]file:/data/17/dfs/dn/, [DISK]file:/data/18/dfs/dn/, [DISK]file:/data/19/dfs/dn/, [DISK]file:/data/20/dfs/dn/, [DISK]file:/data/21/dfs/dn/, [DISK]file:/data/22/dfs/dn/, [DISK]file:/data/23/dfs/dn/, [DISK]file:/data/24/dfs/dn/, [DISK]file:/data/25/dfs/dn/, [DISK]file:/data/26/dfs/dn/, [DISK]file:/data/27/dfs/dn/, [DISK]file:/data/28/dfs/dn/, [DISK]file:/data/29/dfs/dn/, [DISK]file:/data/30/dfs/dn/, [DISK]file:/data/31/dfs/dn/, [DISK]file:/data/32/dfs/dn/, [DISK]file:/data/33/dfs/dn/, [DISK]file:/data/34/dfs/dn/, [DISK]file:/data/35/dfs/dn/]]  heartbeating to bigdata-01.sc-chi-int.37signals.com/10.10.0.211:8020" daemon prio=10 tid=0x00007f4e553a5800 nid=0x2d66 waiting for monitor entry [0x00007f4e361be000]
         java.lang.Thread.State: BLOCKED (on object monitor)
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:116)
              - waiting to lock <0x000000063b1f9a48> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:132)
              - locked <0x000000063b182d80> (a java.lang.Object)
              at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:572)
              at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:677)
              at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:847)
              at java.lang.Thread.run(Thread.java:662)
      
      "DataXceiver for client DFSClient_NONMAPREDUCE_-1948416574_103 at /10.10.0.169:44229 [Receiving block BP-1573746465-127.0.1.1-1352244533715:blk_1073776794_1099996963072]" daemon prio=10 tid=0x00007f4e55431000 nid=0x4ab8 waiting for monitor entry [0x00007f4e2a448000]
         java.lang.Thread.State: BLOCKED (on object monitor)
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:782)
              - waiting to lock <0x000000063b1f9a48> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:101)
              at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:171)
              at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:604)
              at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:126)
              at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:72)
              at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225)
              at java.lang.Thread.run(Thread.java:662)
      
      

      This occurs, I believe, because FsVolumeList#checkDirs() locks the entire FsVolumeList to prevent multiple simultaneous checkDirs() calls. Empirically, on a slow system under heavy I/O load or with many disks, a single checkDirs() call can take tens of seconds or longer.
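      For context, the 2.5.0 checkDirs() is shaped roughly like the sketch below (simplified, not the exact Hadoop source; field and method names approximate). Because the whole method is synchronized on the FsVolumeList instance, callers such as getNextVolume() block until every volume has been scanned on disk:

          // Simplified sketch of the pre-patch behavior: the FsVolumeList monitor
          // is held for the duration of the entire per-volume directory scan.
          synchronized List<FsVolumeImpl> checkDirs() {
            List<FsVolumeImpl> removedVols = new ArrayList<FsVolumeImpl>();
            for (Iterator<FsVolumeImpl> it = volumes.iterator(); it.hasNext(); ) {
              FsVolumeImpl fsv = it.next();
              try {
                fsv.checkDirs();        // walks the block pool directories on disk (slow)
              } catch (DiskErrorException e) {
                removedVols.add(fsv);   // volume failed; drop it from the list
                it.remove();
              }
            }
            return removedVols;
          }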

      I think this can be improved by holding a separate mutex limited to checkDirs() (much like DataNode's checkDiskError() does) and only locking the full FsVolumeList when the volume list actually needs to be updated because a disk was removed. I'll attach a patch that does this.
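      Roughly, the patch does the following (a simplified sketch of the idea, not the patch itself; checkDirsMutex is an illustrative name):

          // Serialize concurrent checkDirs() calls on a dedicated lock object instead of
          // the FsVolumeList monitor, so other FsVolumeList users are not blocked by the
          // slow per-volume disk scans.
          private final Object checkDirsMutex = new Object();

          List<FsVolumeImpl> checkDirs() {
            synchronized (checkDirsMutex) {         // only other checkDirs() callers wait here
              List<FsVolumeImpl> removedVols = new ArrayList<FsVolumeImpl>();
              for (FsVolumeImpl fsv : new ArrayList<FsVolumeImpl>(volumes)) {  // scan a snapshot
                try {
                  fsv.checkDirs();                  // slow disk I/O, done without the list lock
                } catch (DiskErrorException e) {
                  removedVols.add(fsv);
                }
              }
              if (!removedVols.isEmpty()) {
                synchronized (this) {               // brief lock only to update the shared list
                  volumes.removeAll(removedVols);
                }
              }
              return removedVols;
            }
          }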

      We're running this patch in production and it's working as expected. While checkDirs() is running, the lock is held on a private mutex object rather than on the entire FsVolumeList:

      "Thread-614" daemon prio=10 tid=0x000000004037b000 nid=0x7331 runnable [0x00007f4d45391000]
         java.lang.Thread.State: RUNNABLE
              at java.io.UnixFileSystem.createDirectory(Native Method)
              at java.io.File.mkdir(File.java:1157)
              at org.apache.hadoop.util.DiskChecker.mkdirsWithExistsCheck(DiskChecker.java:67)
              at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:104)
              at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:88)
              at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91)
              at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91)
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.checkDirs(BlockPoolSlice.java:257)
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.checkDirs(FsVolumeImpl.java:210)
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:182)
              - locked <0x000000063b24f540> (a java.lang.Object)
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:1396)
              at org.apache.hadoop.hdfs.server.datanode.DataNode$5.run(DataNode.java:2832)
              at java.lang.Thread.run(Thread.java:662)
      

      Feedback would be appreciated!

      Attachments

        1. HDFS-7489-v1.patch (4 kB, Noah Lorang)
        2. HDFS-7489-v2.patch (3 kB, Noah Lorang)
        3. HDFS-7489-v2.patch.1 (3 kB, Colin McCabe)
