Hadoop Common / HADOOP-641

Name-node should demand a block report from resurrected data-nodes.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.1.0, 0.7.2
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels: None

      Description

      1. This bug contributed to the crash discussed in HADOOP-572.
      The problem is that when the name-node is busy and cannot process all requests from its clients,
      it can declare a data-node dead and discard its blocks, placing them on the neededReplications list.
      When it finally receives a heartbeat from this data-node, it resurrects the node but not the node's blocks,
      and hence continues to replicate them.
      Eventually the name-node will receive a block report from this data-node, but that can take up
      to 1 hour. In the meantime it proceeds with unnecessary block replications, which could be avoided if the
      data-node sent its block report right after the resurrection.

      I modified the code so that the name-node requests a block report when it receives a heartbeat from a data-node it had marked dead.
      To do this, I introduced a new command type in the BlockCommand class,
      replaced the multiple boolean indicators of the command type with a single enum field,
      and changed the DatanodeProtocol version.
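
The heartbeat handling described above might look roughly like the following sketch. It is not taken from ResurrectDN.patch: the names DatanodeAction, HeartbeatSketch, markDead and handleHeartbeat are hypothetical, and the real name-node bookkeeping is considerably more involved.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the single enum field that replaces the boolean flags. */
enum DatanodeAction { NONE, TRANSFER, INVALIDATE, BLOCK_REPORT }

/** Hypothetical, simplified name-node bookkeeping; not the actual patch code. */
class HeartbeatSketch {
    // Node name -> whether the name-node currently considers the node alive.
    private final Map<String, Boolean> alive = new HashMap<>();

    /** Called when the name-node decides a node has been silent for too long. */
    void markDead(String node) {
        alive.put(node, false);
        // In the real system the node's blocks would be queued for replication here.
    }

    /** Returns the action the data-node should take in response to this heartbeat. */
    DatanodeAction handleHeartbeat(String node) {
        Boolean wasAlive = alive.put(node, true);
        if (wasAlive != null && !wasAlive) {
            // Resurrected node: its blocks were discarded, so ask for a fresh report.
            return DatanodeAction.BLOCK_REPORT;
        }
        return DatanodeAction.NONE;
    }
}
```

With this shape, a node that heartbeats after being marked dead is asked for a block report once, and later heartbeats go back to returning no action.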

      2. This patch also includes a fix for data-node registration. If a data-node times out during registration,
      it silently exits, which is hard to notice in a cluster with a large number of nodes. This patch places registration in a loop
      so that the data-node can retry.
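
A registration retry loop of the kind described could be sketched as follows. This is an illustration only: the Registrar interface, the registerWithRetry method, and the choice to retry only on SocketTimeoutException with a fixed sleep are assumptions, not details confirmed by the patch.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

/** Hypothetical registration call; stands in for the data-node's RPC to the name-node. */
interface Registrar {
    void register() throws IOException;
}

class RegistrationSketch {
    /**
     * Keeps calling register() until it succeeds, sleeping between attempts.
     * Returns the number of attempts made. Only timeouts are retried (an assumption);
     * any other IOException propagates to the caller.
     */
    static int registerWithRetry(Registrar registrar, long sleepMillis)
            throws IOException, InterruptedException {
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                registrar.register();
                return attempts;
            } catch (SocketTimeoutException e) {
                // Report the failure instead of exiting silently, then try again.
                System.err.println("Registration attempt " + attempts + " timed out; retrying");
                Thread.sleep(sleepMillis);
            }
        }
    }
}
```

Logging each failed attempt addresses the visibility problem noted above: a node that cannot register leaves a trace rather than disappearing quietly.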

        Attachments

        1. ResurrectDN.patch
          12 kB
          Konstantin Shvachko

              People

              • Assignee: Konstantin Shvachko (shv)
              • Reporter: Konstantin Shvachko (shv)