Uploaded image for project: 'Hadoop HDFS'
  1. Hadoop HDFS
  2. HDFS-9434

Recommission a datanode with 500k blocks may pause NN for 30 seconds

Log workAgile BoardRank to TopRank to BottomAttach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskConvert to sub-taskMoveLinkCloneLabelsUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 2.7.2, 2.6.3, 3.0.0-alpha1
    • Component/s: namenode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      In BlockManager, processOverReplicatedBlocksOnReCommission is called within the namespace lock. There is a (not very useful) log message printed in processOverReplicatedBlock. When there is a large number of blocks stored in a storage, printing the log message for each block can pause NN to process any other operations. We did see that it could pause NN for 30 seconds for a storage with 500k blocks.

      I suggest to change the log message to trace level as a quick fix.

        Attachments

        Issue Links

          Activity

          $i18n.getText('security.level.explanation', $currentSelection) Viewable by All Users
          Cancel

            People

              Dates

              • Created:
                Updated:
                Resolved:

                Issue deployment