Hadoop Common / HADOOP-178

Piggyback block work requests on heartbeats and move the block replication/deletion startup delay from datanodes to the namenode

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.2.0
    • Component/s: None
    • Labels:
      None

      Description

      Currently each datanode sends at least two messages to the namenode within each heartbeat interval: a heartbeat message and a block work request. Piggybacking the block work request on the heartbeat removes one of these messages per interval, which greatly cuts the number of messages between each datanode and the namenode.
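      A minimal sketch of the piggybacking idea, under stated assumptions: the names `NameNodeSketch`, `sendHeartbeat`, `queueWork`, and `BlockCommand` are hypothetical and are not the actual Hadoop 0.2 `DatanodeProtocol` API. Instead of a separate block work RPC, the namenode's reply to each heartbeat carries whatever replication or deletion work is queued for that datanode, so the datanode makes one call per interval.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: block work returned in the heartbeat reply.
class PiggybackDemo {
    public static void main(String[] args) {
        NameNodeSketch nn = new NameNodeSketch();
        nn.queueWork("dn1", new BlockCommand(BlockCommand.Kind.REPLICATE, 42L));
        nn.queueWork("dn1", new BlockCommand(BlockCommand.Kind.DELETE, 7L));

        // One RPC per interval: the heartbeat reply carries the pending work.
        List<BlockCommand> work = nn.sendHeartbeat("dn1", 1_000_000L);
        System.out.println(work.size() + " commands, rpcs=" + nn.rpcCount());
        // The next heartbeat finds the queue already drained.
        System.out.println(nn.sendHeartbeat("dn1", 1_000_000L).size() + " commands, rpcs=" + nn.rpcCount());
    }
}

/** A unit of block work the namenode hands to a datanode (hypothetical type). */
class BlockCommand {
    enum Kind { REPLICATE, DELETE }
    final Kind kind;
    final long blockId;
    BlockCommand(Kind kind, long blockId) { this.kind = kind; this.blockId = blockId; }
}

/** Namenode side: heartbeat and block work share a single call. */
class NameNodeSketch {
    private final Map<String, List<BlockCommand>> pending = new HashMap<>();
    // Stands in for the usual heartbeat payload (capacity, remaining space, ...).
    private final Map<String, Long> lastRemaining = new HashMap<>();
    private int rpcs = 0;

    synchronized void queueWork(String datanode, BlockCommand cmd) {
        pending.computeIfAbsent(datanode, k -> new ArrayList<>()).add(cmd);
    }

    /** Records the heartbeat and, in the same reply, returns and clears queued work. */
    synchronized List<BlockCommand> sendHeartbeat(String datanode, long remainingBytes) {
        rpcs++;
        lastRemaining.put(datanode, remainingBytes);
        List<BlockCommand> work = pending.remove(datanode);
        return work == null ? new ArrayList<>() : work;
    }

    synchronized int rpcCount() { return rpcs; }
}
```

      Running `main` prints `2 commands, rpcs=1` and then `0 commands, rpcs=2`: each interval costs one call whether or not there is work to hand out.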

      Second, each datanode waits for a configurable "StartupPeriod" before it sends a block work request, in order to avoid unnecessary block replication at startup time. But if the namenode starts much later than the datanodes, this scheme does not work, because a datanode's StartupPeriod may already have expired by the time the namenode is up. Furthermore, the namenode has more information with which to decide when to send block work to datanodes; for example, it knows whether all datanodes have sent their block reports. It is more reasonable to move the startup delay from the datanodes to the namenode.
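      One way to picture a namenode-side startup delay is a gate the namenode consults before attaching block work to a heartbeat reply. This is a sketch under assumptions: `StartupGate`, `mayIssueBlockWork`, and the "expected datanode count" policy are hypothetical illustrations of "the namenode has more information", not the committed patch. The delay is measured from the namenode's own start, so a late-starting namenode still waits, and it can open early once every expected datanode has sent a block report. Time is passed in explicitly to keep the example deterministic.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: the namenode, not the datanode, decides when block work may start.
class StartupGateDemo {
    public static void main(String[] args) {
        // Namenode "starts" at t=0 with a 60 s delay and 2 expected datanodes.
        StartupGate gate = new StartupGate(0L, 60_000L, 2);
        System.out.println(gate.mayIssueBlockWork(10_000L)); // too early, no reports yet
        gate.blockReportReceived("dn1");
        gate.blockReportReceived("dn2");
        System.out.println(gate.mayIssueBlockWork(10_000L)); // every expected datanode reported

        StartupGate idle = new StartupGate(0L, 60_000L, 2);
        System.out.println(idle.mayIssueBlockWork(60_000L)); // delay elapsed without reports
    }
}

class StartupGate {
    private final long startTimeMs;       // when the namenode itself started
    private final long startupDelayMs;    // configurable delay (assumed parameter)
    private final int expectedDatanodes;  // assumed known, e.g. from configuration
    private final Set<String> reported = new HashSet<>();

    StartupGate(long startTimeMs, long startupDelayMs, int expectedDatanodes) {
        this.startTimeMs = startTimeMs;
        this.startupDelayMs = startupDelayMs;
        this.expectedDatanodes = expectedDatanodes;
    }

    /** Called when a datanode's block report arrives; duplicates are ignored. */
    synchronized void blockReportReceived(String datanode) {
        reported.add(datanode);
    }

    /** True once the delay has elapsed or all expected datanodes have reported. */
    synchronized boolean mayIssueBlockWork(long nowMs) {
        boolean delayElapsed = nowMs - startTimeMs >= startupDelayMs;
        boolean allReported = reported.size() >= expectedDatanodes;
        return delayElapsed || allReported;
    }
}
```

      Running `main` prints `false`, `true`, `true`. While the gate is closed, the namenode would simply return an empty work list in its heartbeat reply.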

      1. startupDelay.patch
        15 kB
        Hairong Kuang

        Activity

        Doug Cutting added a comment -

        I just committed this. Thanks, Hairong!

        Hairong Kuang added a comment -

        I made the changes described in the issue report. I also made the locking on receivedBlockList finer-grained: the code now synchronizes on receivedBlockList only while reading from or writing to the list. There also appeared to be a bug on line 174 of the patch in the waittime calculation, so I changed "now" to System.currentTimeMillis().
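        The two datanode-side changes can be sketched as follows, under assumptions: `DatanodeSketch`, `drainReceivedBlocks`, `waitTimeMs`, and the injected clock are hypothetical names, not the code in startupDelay.patch. Finer-grained locking means holding the receivedBlockList lock only to append or to copy-and-clear, so a slow report to the namenode never blocks threads that are receiving blocks. The waittime fix rereads the clock instead of reusing a "now" taken earlier in the loop; a stale timestamp ignores the time spent in RPCs and makes the computed wait too long.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical sketch of finer-grained locking plus a fresh-clock wait time.
class ReceivedBlocksDemo {
    public static void main(String[] args) {
        long[] fakeNow = {0L}; // deterministic clock; real code might pass System::currentTimeMillis
        DatanodeSketch dn = new DatanodeSketch(3_000L, () -> fakeNow[0]);
        dn.blockReceived(1L);
        dn.blockReceived(2L);
        List<Long> batch = dn.drainReceivedBlocks(); // lock already released here
        fakeNow[0] += 1_200L; // pretend reporting `batch` to the namenode took 1.2 s
        System.out.println(batch + " -> wait " + dn.waitTimeMs() + " ms");
    }
}

class DatanodeSketch {
    private final List<Long> receivedBlockList = new ArrayList<>();
    private final long heartbeatIntervalMs;
    private final LongSupplier clock;
    private long lastHeartbeatMs; // touched only by the service-loop thread

    DatanodeSketch(long heartbeatIntervalMs, LongSupplier clock) {
        this.heartbeatIntervalMs = heartbeatIntervalMs;
        this.clock = clock;
        this.lastHeartbeatMs = clock.getAsLong();
    }

    /** Called by block-receiving threads; the lock covers only the append. */
    void blockReceived(long blockId) {
        synchronized (receivedBlockList) {
            receivedBlockList.add(blockId);
        }
    }

    /** Copies and clears under the lock; the caller reports the copy without holding it. */
    List<Long> drainReceivedBlocks() {
        synchronized (receivedBlockList) {
            List<Long> copy = new ArrayList<>(receivedBlockList);
            receivedBlockList.clear();
            return copy;
        }
    }

    /** Records that a heartbeat was just sent. */
    void heartbeatSent() {
        lastHeartbeatMs = clock.getAsLong();
    }

    /** Time until the next heartbeat, from a fresh clock reading rather than a stale "now". */
    long waitTimeMs() {
        return Math.max(0L, heartbeatIntervalMs - (clock.getAsLong() - lastHeartbeatMs));
    }
}
```

      Running `main` prints `[1, 2] -> wait 1800 ms`; reusing a timestamp taken before the report would have yielded the full 3000 ms.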


          People

          • Assignee: Hairong Kuang
          • Reporter: Hairong Kuang
          • Votes: 0
          • Watchers: 0
