Hadoop HDFS / HDFS-150

Replication should be decoupled from heartbeat

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Environment: Hadoop 80 node cluster

    Description

      I ran a simple experiment: I shot down one node in the cluster and measured the time taken to re-replicate the under-replicated blocks.

      About 30,000 blocks were under-replicated, or roughly 400 per node. At 2 blocks per heartbeat and a 1-minute heartbeat interval, that should take about 200 minutes to replicate completely.
      My findings: it took around 220 minutes, which is reasonable.
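
The 200-minute estimate can be reproduced with a little arithmetic. The sketch below is illustrative only and is not Hadoop code: the function name and parameters are made up for this example, and it assumes each datanode is handed a fixed number of blocks on every heartbeat.

```python
import math

def estimated_replication_minutes(blocks_per_node, blocks_per_heartbeat, heartbeat_seconds):
    """Minutes for one datanode to work through its share of under-replicated blocks."""
    # Each heartbeat delivers a fixed batch, so count how many heartbeats are needed.
    heartbeats_needed = math.ceil(blocks_per_node / blocks_per_heartbeat)
    return heartbeats_needed * heartbeat_seconds / 60

# ~400 blocks per node, 2 blocks per heartbeat, 60-second heartbeat (figures from this issue).
print(estimated_replication_minutes(400, 2, 60))  # 200.0
```

The observed 220 minutes is within about 10% of this estimate.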

      Bug: Replication is coupled with the heartbeat. The heartbeat interval is based on how much a namenode can handle. Replication should be based on how much a datanode can handle.

      So given the default heartbeat interval of 20 seconds, we computed that a datanode can handle 2 replications in that interval, which is why the namenode gives each datanode 2 blocks to replicate per heartbeat.

      What we propose is to keep the ratio of 2 blocks per 20 seconds constant, so a datanode heartbeating at a 1-minute interval should be given 6 blocks to replicate per heartbeat. In this case, instead of taking 200 minutes, it should take 200/3 minutes, or about 1 hour, to replicate the entire node.
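
One way to read the proposal is as a scaling rule: hold the rate at 2 blocks per 20 seconds and size each heartbeat's batch from the configured interval. The following is a minimal sketch under that reading, not the actual fix. All names are hypothetical, and the floor of one block per heartbeat for very short intervals is an assumption added here, not part of the proposal.

```python
BASE_INTERVAL_SECONDS = 20  # baseline heartbeat interval quoted in the issue
BASE_BLOCKS = 2             # blocks a datanode can replicate in one baseline interval

def blocks_per_heartbeat(heartbeat_seconds):
    """Scale the batch size so the per-second replication rate stays constant."""
    # Assumption: always hand out at least one block, even for very short intervals.
    return max(1, BASE_BLOCKS * heartbeat_seconds // BASE_INTERVAL_SECONDS)

def minutes_to_replicate(blocks_per_node, heartbeat_seconds):
    """Minutes for one datanode to finish its share at the scaled batch size."""
    per_beat = blocks_per_heartbeat(heartbeat_seconds)
    heartbeats = -(-blocks_per_node // per_beat)  # ceiling division
    return heartbeats * heartbeat_seconds / 60

print(blocks_per_heartbeat(60))       # 6
print(minutes_to_replicate(400, 60))  # 67.0
```

With a 60-second heartbeat this gives 6 blocks per heartbeat and about 67 minutes for 400 blocks, matching the "200/3, about 1 hour" figure above.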

            People

              Assignee: Unassigned
              Reporter: Srikanth Kakani (srikantk)