  1. Hadoop HDFS
  2. HDFS-14657

Refine NameSystem lock usage during processing FBR


Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved

    Description

      Disks with 12 TB capacity are common today, so full block reports (FBRs) are much larger than before. The NameNode holds the NameSystem lock while processing the block report for each storage, which can take quite a long time.

      In our production environment, processing a large FBR usually causes longer RPC queue times, which hurts client latency. We did some simple work to refine the lock usage, and it improved p99 latency significantly.

      In our solution, BlockManager releases the NameSystem write lock and re-acquires it after every 5000 blocks (by default) while processing an FBR. Because the lock is fair, RPC requests that are already queued can be processed before BlockManager re-acquires the write lock.
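
The batching idea above can be illustrated with a minimal, standalone sketch. This is not the code from the attached patches: the class `BatchedReportProcessor`, its methods, and the `int[]` stand-in for a block report are all hypothetical, and only `java.util.concurrent` is used rather than any Hadoop class. It shows a fair `ReentrantReadWriteLock` being released and re-acquired after each batch, so that threads already waiting on the lock get a turn between batches.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.IntConsumer;

// Hypothetical illustration only; not the BlockManager code from the patch.
final class BatchedReportProcessor {
    // Fair mode (true): waiting threads are granted the lock in roughly
    // arrival order, so a releasing writer queues behind earlier waiters.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
    private final int batchSize;
    private int lockAcquisitions = 0;

    BatchedReportProcessor(int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
    }

    // Processes every block, holding the write lock for at most batchSize
    // blocks at a time. Returns the number of blocks processed.
    int process(int[] blocks, IntConsumer handler) {
        int processed = 0;
        while (processed < blocks.length) {
            lock.writeLock().lock();
            lockAcquisitions++;
            try {
                int end = Math.min(processed + batchSize, blocks.length);
                for (; processed < end; processed++) {
                    handler.accept(blocks[processed]);
                }
            } finally {
                // Release between batches so queued readers and writers can run.
                lock.writeLock().unlock();
            }
        }
        return processed;
    }

    // Number of times the write lock was taken during processing.
    int getLockAcquisitions() {
        return lockAcquisitions;
    }
}
```

With a batch size of 5000, a report of 12001 blocks would take the write lock three times (5000, 5000, and 2001 blocks). One design consequence worth noting: once the lock is released mid-report, other threads may change shared state between batches, so the per-batch work has to tolerate that.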

      Attachments

        1. HDFS-14657.002.patch
          12 kB
          Chen Zhang
        2. HDFS-14657-001.patch
          9 kB
          Chen Zhang


          People

            zhangchen Chen Zhang
            zhangchen Chen Zhang
