Hadoop Common / HADOOP-641

Name-node should demand a block report from resurrected data-nodes.


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.1.0, 0.7.2
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels: None

    Description

      1. This bug contributed to the crash discussed in HADOOP-572.
      The problem is that when the name-node is busy and cannot process all requests from its clients,
      it can consider one of the data-nodes dead and discard its blocks, placing them on the neededReplications list.
      When it finally gets a heartbeat from this data-node it resurrects the node, but not the node's blocks,
      and hence continues to replicate them.
      Of course, eventually the name-node will receive the block report from this data-node, but it could take up
      to 1 hour. During this time it proceeds with unnecessary block replications, which could be avoided if the
      data-node sent its block report right after the resurrection.

      I modified the code so that the name-node requests a block report if it receives a heartbeat from a dead data-node.
      I introduced a new command type in the BlockCommand class.
      I replaced the multiple boolean indicators of the command type with a single enum field.
      I changed the DatanodeProtocol version.
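
      The change described above can be sketched roughly as follows. This is a simplified illustration, not the
      code from ResurrectDN.patch: the names DatanodeAction, ToyNameNode, markDead and heartbeat are hypothetical,
      and the real BlockCommand and DatanodeProtocol classes carry more state than is shown here.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical action codes: one enum field instead of several boolean flags. */
enum DatanodeAction { NONE, TRANSFER, INVALIDATE, BLOCK_REPORT }

/** Toy name-node that tracks only whether each data-node is considered alive. */
class ToyNameNode {
    private final Map<String, Boolean> alive = new HashMap<>();

    /** Called when a node has missed its heartbeat deadline (assumed hook). */
    void markDead(String nodeId) {
        alive.put(nodeId, false);
    }

    /** Handles a heartbeat and returns the action the data-node should take next. */
    DatanodeAction heartbeat(String nodeId) {
        Boolean wasAlive = alive.put(nodeId, true);
        if (wasAlive != null && !wasAlive) {
            // The node was declared dead and its blocks were discarded,
            // so ask for a fresh block report right away.
            return DatanodeAction.BLOCK_REPORT;
        }
        return DatanodeAction.NONE;
    }
}
```

      In this sketch a resurrected node is asked for a block report exactly once, on the first heartbeat after it
      was marked dead. Later heartbeats return NONE.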

      2. This patch also includes a fix for data-node registration. If a data-node times out during registration,
      it silently exits, which is hard to notice with a large number of nodes. This patch places registration in a loop
      so that it can retry.
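
      A minimal sketch of such a retry loop is shown below. The Registrar interface, the registerWithRetry method
      and the retry delay are assumptions made for illustration. The patch itself may structure the loop differently.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

/** Hypothetical stand-in for the registration RPC; not the real DatanodeProtocol. */
interface Registrar {
    void register() throws IOException;
}

class RegistrationLoop {
    /**
     * Retries registration until it succeeds, sleeping between attempts.
     * Returns the number of attempts that were made.
     */
    static int registerWithRetry(Registrar registrar, long retryDelayMs)
            throws IOException, InterruptedException {
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                registrar.register();
                return attempts;
            } catch (SocketTimeoutException e) {
                // Report the timeout instead of exiting silently, then try again.
                System.err.println("Registration attempt " + attempts + " timed out; retrying");
                Thread.sleep(retryDelayMs);
            }
        }
    }
}
```

      Only timeouts are retried in this sketch. Any other IOException propagates to the caller, so a
      misconfigured node still fails visibly.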

      Attachments

        1. ResurrectDN.patch
          12 kB
          Konstantin Shvachko


          People

            Assignee: Konstantin Shvachko (shv)
            Reporter: Konstantin Shvachko (shv)