Hadoop HDFS / HDFS-1623 High Availability Framework for HDFS NN / HDFS-2795

HA: Standby NN takes a long time to recover from a dead DN starting up

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: HA branch (HDFS-1623)
    • Fix Version/s: HA branch (HDFS-1623)
    • Component/s: datanode, ha, namenode
    • Labels:
      None

      Description

      To reproduce:

      1. Start an HA cluster with a DN.
      2. Write several blocks to the FS with replication 1.
      3. Shut down the DN.
      4. Wait for the NNs to declare the DN dead. All blocks will be under-replicated.
      5. Restart the DN.

      After the DN restarts, the active NN immediately learns all block locations from the DN's initial block report (BR). The standby NN does not; instead, it slowly adds block locations for only a subset of the previously missing blocks on each DN heartbeat.
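
      To get a feel for why this matters, the symptom can be modeled with a toy calculation. This is only a sketch of the observed behavior, not HDFS code: the class name, method names, and the numbers (1000 missing blocks, 50 locations picked up per heartbeat) are all hypothetical. The 3-second heartbeat interval is the usual default for dfs.heartbeat.interval and is assumed here.

```java
public class StandbyRecoverySketch {

    // Assumed heartbeat interval in seconds (default dfs.heartbeat.interval).
    static final int HEARTBEAT_INTERVAL_SECONDS = 3;

    /**
     * Toy model of the standby NN symptom: each DN heartbeat adds locations
     * for at most blocksPerHeartbeat of the still-missing blocks.
     * Returns how many heartbeats pass before every block has a location.
     */
    static int heartbeatsToRecover(int missingBlocks, int blocksPerHeartbeat) {
        if (missingBlocks < 0 || blocksPerHeartbeat <= 0) {
            throw new IllegalArgumentException("invalid block counts");
        }
        int located = 0;
        int heartbeats = 0;
        while (located < missingBlocks) {
            located += Math.min(blocksPerHeartbeat, missingBlocks - located);
            heartbeats++;
        }
        return heartbeats;
    }

    /** Converts a heartbeat count into elapsed seconds under the assumed interval. */
    static int secondsToRecover(int missingBlocks, int blocksPerHeartbeat) {
        return heartbeatsToRecover(missingBlocks, blocksPerHeartbeat)
                * HEARTBEAT_INTERVAL_SECONDS;
    }

    public static void main(String[] args) {
        int missingBlocks = 1000;     // hypothetical
        int blocksPerHeartbeat = 50;  // hypothetical
        System.out.println("Active NN: all locations known after the initial block report");
        System.out.println("Standby NN (toy model): "
                + heartbeatsToRecover(missingBlocks, blocksPerHeartbeat)
                + " heartbeats, about "
                + secondsToRecover(missingBlocks, blocksPerHeartbeat) + " s");
    }
}
```

      Under these made-up numbers the standby NN would lag the active NN by 20 heartbeats, and the lag grows linearly with the number of blocks on the restarted DN.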

        Attachments

        1. hdfs-2795.txt (10 kB, Todd Lipcon)

            People

            • Assignee: Todd Lipcon (tlipcon)
            • Reporter: Aaron Myers (atm)
