  1. Hadoop HDFS
  2. HDFS-9668

Optimize the locking in FsDatasetImpl


    Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: datanode
    • Labels:
      None

      Description

      During an HBase test on HDFS tiered storage (the WAL is stored on SSD/RAMDISK and all other files on HDD), we observed many threads in the DataNode that stayed BLOCKED on FsDatasetImpl for a long time. The following is part of the jstack output:

      "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at /192.168.50.16:48521 [Receiving block BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread t@93336
         java.lang.Thread.State: BLOCKED
      	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1111)
      	- waiting to lock <18324c9> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at /192.168.50.16:48520 [Receiving block BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
      	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
      	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:183)
      	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
      	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
      	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
      	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
      	at java.lang.Thread.run(Thread.java:745)
      
         Locked ownable synchronizers:
      	- None
      	
      "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at /192.168.50.16:48520 [Receiving block BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread t@93335
         java.lang.Thread.State: RUNNABLE
      	at java.io.UnixFileSystem.createFileExclusively(Native Method)
      	at java.io.File.createNewFile(File.java:1012)
      	at org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
      	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
      	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
      	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
      	- locked <18324c9> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
      	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
      	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:183)
      	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
      	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
      	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
      	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
      	at java.lang.Thread.run(Thread.java:745)
      
         Locked ownable synchronizers:
      	- None
      

      We measured the execution time of some operations in FsDatasetImpl during the test. The result is shown in the attachment execution_time.png.

      Under heavy load, the finalizeBlock, addBlock and createRbw operations on HDD take a very long time.
      This means that a single slow finalizeBlock, addBlock or createRbw operation on slow storage can block all other such operations in the same DataNode, because they all wait for the same FsDatasetImpl lock. The effect is strongest with HBase when many WALs, flushers and compactors are configured.
      We need a finer-grained locking mechanism in a new FsDatasetImpl implementation, which users can select by configuring "dfs.datanode.fsdataset.factory" in the DataNode.
      The lock can be implemented at either the storage level or the block level.
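
      The storage-level option can be illustrated with a minimal sketch. It is not taken from any of the attached patches: the class name StorageLevelLocks, its methods, and the storage IDs "hdd-0" and "ssd-0" are hypothetical and chosen only for illustration. The idea is that each storage has its own lock, so a slow operation on an HDD volume does not hold up operations on an SSD volume.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of storage-level locking. This is NOT the
 * HDFS-9668 patch and does not use any HDFS classes.
 */
final class StorageLevelLocks {
  // One lock per storage ID, created lazily on first use.
  private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

  /** Runs op while holding only the lock of the given storage. */
  <T> T withStorageLock(String storageId, Supplier<T> op) {
    ReentrantLock lock =
        locks.computeIfAbsent(storageId, id -> new ReentrantLock());
    lock.lock();
    try {
      return op.get();
    } finally {
      // Always release, even if op throws.
      lock.unlock();
    }
  }

  /** Returns true if some thread currently holds the lock of this storage. */
  boolean isLocked(String storageId) {
    ReentrantLock lock = locks.get(storageId);
    return lock != null && lock.isLocked();
  }
}
```

      With this shape, a call such as locks.withStorageLock("hdd-0", ...) serializes only against other operations on "hdd-0". A block-level variant could follow the same pattern with the block ID as the map key, at the cost of more lock objects and of having to remove locks for deleted blocks.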

        Attachments

        1. HDFS-9668-9.patch
          106 kB
          Jingcheng Du
        2. HDFS-9668-8.patch
          106 kB
          Jingcheng Du
        3. HDFS-9668-7.patch
          107 kB
          Jingcheng Du
        4. HDFS-9668-6.patch
          90 kB
          Jingcheng Du
        5. HDFS-9668-5.patch
          90 kB
          Jingcheng Du
        6. HDFS-9668-4.patch
          90 kB
          Jingcheng Du
        7. HDFS-9668-3.patch
          10 kB
          Jingcheng Du
        8. HDFS-9668-26.patch
          55 kB
          Jingcheng Du
        9. HDFS-9668-25.patch
          51 kB
          Jingcheng Du
        10. HDFS-9668-24.patch
          51 kB
          Jingcheng Du
        11. HDFS-9668-23.patch
          114 kB
          Jingcheng Du
        12. HDFS-9668-23.patch
          114 kB
          Jingcheng Du
        13. HDFS-9668-22.patch
          114 kB
          Jingcheng Du
        14. HDFS-9668-21.patch
          113 kB
          Jingcheng Du
        15. HDFS-9668-20.patch
          107 kB
          Jingcheng Du
        16. HDFS-9668-2.patch
          86 kB
          Jingcheng Du
        17. HDFS-9668-19.patch
          107 kB
          Jingcheng Du
        18. HDFS-9668-19.patch
          107 kB
          Jingcheng Du
        19. HDFS-9668-18.patch
          107 kB
          Jingcheng Du
        20. HDFS-9668-17.patch
          124 kB
          Jingcheng Du
        21. HDFS-9668-16.patch
          123 kB
          Jingcheng Du
        22. HDFS-9668-15.patch
          121 kB
          Jingcheng Du
        23. HDFS-9668-14.patch
          121 kB
          Jingcheng Du
        24. HDFS-9668-14.patch
          121 kB
          Jingcheng Du
        25. HDFS-9668-13.patch
          120 kB
          Jingcheng Du
        26. HDFS-9668-12.patch
          106 kB
          Jingcheng Du
        27. HDFS-9668-11.patch
          106 kB
          Jingcheng Du
        28. HDFS-9668-10.patch
          106 kB
          Jingcheng Du
        29. HDFS-9668-1.patch
          64 kB
          Jingcheng Du
        30. execution_time.png
          15 kB
          Jingcheng Du


            People

            • Assignee:
              jingcheng.du@intel.com Jingcheng Du
            • Reporter:
              jingcheng.du@intel.com Jingcheng Du
