Hadoop HDFS / HDFS-9434

Recommission a datanode with 500k blocks may pause NN for 30 seconds

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 2.7.2, 2.6.3, 3.0.0-alpha1
    • Component/s: namenode
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      In BlockManager, processOverReplicatedBlocksOnReCommission is called while the namespace lock is held. processOverReplicatedBlock prints a (not very useful) log message for each block. When a storage holds a large number of blocks, printing that message for every block can keep the NN from processing any other operations. We have seen it pause the NN for 30 seconds for a storage with 500k blocks.

      I suggest changing the log message to trace level as a quick fix.
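
      To illustrate the idea, here is a minimal, self-contained sketch of why lowering the level helps. It is not the attached patch and does not use Hadoop's logging setup: it assumes java.util.logging, with Level.FINEST standing in for trace level, and the class name, lock, and message text are all hypothetical. When the configured level is above FINEST, the per-block message is neither built nor published inside the locked loop.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class TraceLevelSketch {
    // Hypothetical stand-in for the namespace lock; not Hadoop's actual lock.
    private static final ReentrantLock NAMESPACE_LOCK = new ReentrantLock();
    private static final Logger LOG = Logger.getLogger("TraceLevelSketch");

    /** Counts the log records that actually reach the handler. */
    static final class CountingHandler extends Handler {
        int count = 0;

        @Override
        public void publish(LogRecord record) {
            if (isLoggable(record)) {
                count++;
            }
        }

        @Override
        public void flush() { }

        @Override
        public void close() { }
    }

    /**
     * Simulates per-block processing under a lock and returns how many
     * log records were published at the given configured level.
     */
    static int processBlocks(int numBlocks, Level configured) {
        CountingHandler handler = new CountingHandler();
        handler.setLevel(Level.ALL);
        LOG.setUseParentHandlers(false);
        LOG.setLevel(configured);
        LOG.addHandler(handler);
        NAMESPACE_LOCK.lock();
        try {
            for (int i = 0; i < numBlocks; i++) {
                final long blockId = i;
                // Supplier form: the string is only built if FINEST is enabled.
                LOG.finest(() -> "processing over-replicated block " + blockId);
            }
        } finally {
            NAMESPACE_LOCK.unlock();
            LOG.removeHandler(handler);
        }
        return handler.count;
    }

    public static void main(String[] args) {
        // At INFO, no per-block records are published while the lock is held.
        System.out.println(processBlocks(500_000, Level.INFO));
        // At FINEST, every block still produces one record.
        System.out.println(processBlocks(5, Level.FINEST));
    }
}
```

      The message remains available for debugging when the finest level is switched on; at the default level the loop does no logging work per block.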

      Attachments

        1. h9434_20151116.patch
          1 kB
          Tsz-wo Sze
        2. h9434_20151116_branch-2.6.patch
          1 kB
          Tsz-wo Sze

            People

              Assignee: Tsz-wo Sze (szetszwo)
              Reporter: Tsz-wo Sze (szetszwo)
              Votes: 0
              Watchers: 16
