Hadoop HDFS / HDFS-9901

Move disk IO out of the heartbeat thread


Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: datanode
    • Labels: None

    Description

      During heavy disk IO, we noticed that the heartbeat thread hangs in the checkBlock method, which checks the existence and length of a block before spinning off a thread to do the actual transfer. In extreme cases, the heartbeat thread hung for more than 10 minutes, so the NameNode marked the DataNode as dead and started re-replicating its blocks. This caused more disk IO on other nodes and could potentially bring them down as well.
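The 10-minute figure lines up with how the NameNode decides that a DataNode is dead. The sketch below assumes the default settings `dfs.namenode.heartbeat.recheck-interval` = 300000 ms and `dfs.heartbeat.interval` = 3 s, and reproduces (to our understanding) the expiry formula the NameNode applies: 2 * recheck interval + 10 * heartbeat interval. The class and method names are hypothetical and not part of Hadoop.

```java
// Illustrative arithmetic only, not Hadoop code. Assumes default HDFS settings:
//   dfs.namenode.heartbeat.recheck-interval = 300000 ms
//   dfs.heartbeat.interval                  = 3 s
public final class HeartbeatExpiry {

    /** Time without a heartbeat after which the NameNode treats a DataNode as dead. */
    static long expiryMillis(long recheckIntervalMillis, long heartbeatIntervalSeconds) {
        return 2 * recheckIntervalMillis + 10 * 1000 * heartbeatIntervalSeconds;
    }

    public static void main(String[] args) {
        long expiry = expiryMillis(300_000L, 3L);
        // 2 * 300000 + 10 * 1000 * 3 = 630000 ms, i.e. 10.5 minutes
        System.out.printf("DataNode declared dead after %d ms (%.1f min)%n",
                expiry, expiry / 60_000.0);
    }
}
```

With the defaults this prints `DataNode declared dead after 630000 ms (10.5 min)`, so a heartbeat thread stuck on disk IO for slightly more than 10 minutes is enough to trigger re-replication.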

      The patch contains two changes:
      1. Makes DF asynchronous when monitoring the disk, by creating a thread that checks the disk and updates the disk status periodically. When the heartbeat thread generates a storage report, it reads the disk usage information from memory, so it does not block during heavy disk IO.
      2. Moves the checks in DataNode's transferBlock() that require disk access into a separate thread, so the heartbeat thread does not have to wait for them while heartbeating.
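Change 1 follows a common pattern: a background thread periodically samples the disk and publishes an immutable snapshot, and the heartbeat thread only reads the latest snapshot from memory. The sketch below is hypothetical. It uses `File#getTotalSpace`/`File#getUsableSpace` in place of Hadoop's `DF` class, and the class name, refresh interval, and snapshot fields are assumptions, not details taken from the patch. It requires Java 16+ for the record.

```java
import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch: disk usage is refreshed on a background thread so the
 * heartbeat thread only reads a cached value. Uses java.io.File instead of DF.
 */
public final class CachedDiskUsage implements AutoCloseable {

    /** Immutable snapshot, published atomically through a volatile field. */
    record Snapshot(long capacityBytes, long availableBytes, long takenAtMillis) {}

    private final File dir;
    private final ScheduledExecutorService refresher;
    private volatile Snapshot latest;

    CachedDiskUsage(File dir, long refreshMillis) {
        this.dir = dir;
        this.latest = probe(); // one blocking probe at startup, before heartbeats begin
        this.refresher = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "disk-usage-refresher");
            t.setDaemon(true); // do not keep the JVM alive just for refreshing
            return t;
        });
        refresher.scheduleWithFixedDelay(() -> latest = probe(),
                refreshMillis, refreshMillis, TimeUnit.MILLISECONDS);
    }

    /** Touches the disk; after construction it runs only on the refresher thread. */
    private Snapshot probe() {
        return new Snapshot(dir.getTotalSpace(), dir.getUsableSpace(), System.currentTimeMillis());
    }

    /** Called from the heartbeat thread: a plain memory read, no disk access. */
    Snapshot current() {
        return latest;
    }

    @Override
    public void close() {
        refresher.shutdownNow();
    }

    public static void main(String[] args) {
        try (CachedDiskUsage usage = new CachedDiskUsage(new File("."), 1_000)) {
            Snapshot s = usage.current();
            System.out.println("capacity=" + s.capacityBytes() + " available=" + s.availableBytes());
        }
    }
}
```

The trade-off is staleness: a storage report may be up to one refresh interval old, which is usually acceptable for capacity figures that change slowly compared with the heartbeat interval.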
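Change 2 can be sketched in the same spirit. In the hypothetical `AsyncBlockTransfer` class below, the call made on the heartbeat thread only submits a task to a worker pool; the existence and length checks (standing in for checkBlock, whose real signature is not reproduced here) run on a worker, so a slow disk delays the transfer but not the heartbeat. The pool size of 4 and the use of `CompletableFuture` are assumptions, not details from the patch.

```java
import java.io.File;
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Hypothetical sketch: the heartbeat thread only enqueues a transfer; the
 * disk-touching validation and the transfer itself run on a worker pool.
 */
public final class AsyncBlockTransfer implements AutoCloseable {

    private final ExecutorService workers = Executors.newFixedThreadPool(4);

    /** Called from the heartbeat thread; returns immediately without disk access. */
    CompletableFuture<Boolean> transferBlock(File blockFile, long expectedLength) {
        return CompletableFuture.supplyAsync(() -> {
            // Disk access happens here, on a worker thread.
            if (!blockFile.exists() || blockFile.length() != expectedLength) {
                return false; // missing or truncated replica: handle the error here
            }
            // ... start the actual transfer to the target DataNode here ...
            return true;
        }, workers);
    }

    @Override
    public void close() {
        workers.shutdown(); // let queued checks finish, accept no new ones
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("blk_", ".data"); // empty file, length 0
        f.deleteOnExit();
        try (AsyncBlockTransfer t = new AsyncBlockTransfer()) {
            CompletableFuture<Boolean> result = t.transferBlock(f, 0L);
            // A real heartbeat thread would move on here instead of joining.
            System.out.println("valid block: " + result.join());
        }
    }
}
```

Because the check is now asynchronous, the heartbeat thread no longer learns synchronously that a block is invalid; any error reporting that the original code performed has to move into the worker task.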


    People

      • Assignee: Hua Liu (hualiu)
      • Reporter: Hua Liu (hualiu)
