  1. Hadoop HDFS
  2. HDFS-6289

HA failover can fail if there are pending DN messages for DNs which no longer exist

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.5.0
    • Component/s: ha
    • Labels: None
    • Target Version/s:
    • Hadoop Flags: Reviewed

      Description

      In an HA setup, the standby NN may receive messages from DNs for blocks that it is not yet aware of. It queues these messages and replays them when it next reads from the edit log or when it fails over. On a failover, every pending DN message must be processed successfully for the failover to succeed. If a pending DN message refers to a DN storageId that no longer exists (because the DN at that transfer address was reformatted and has re-registered with the same transfer address), then on transition to active the NN cannot process the message and shuts itself down with an error like the following:

      2014-04-25 14:23:17,922 FATAL namenode.NameNode (NameNode.java:doImmediateShutdown(1525)) - Error encountered requiring NN shutdown. Shutting down immediately.
      java.io.IOException: Cannot mark blk_1073741825_900(stored=blk_1073741825_1001) as corrupt because datanode 127.0.0.1:33324 does not exist
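
The failure mode described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `PendingMessageSketch`, `PendingMessage`, and all method names are hypothetical and are not the HDFS classes, and the purge-on-removal step is one plausible way to avoid the crash, not necessarily what the attached patch does.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Hypothetical model of a standby NN's pending DN message queue.
// None of these names come from the HDFS code base.
class PendingMessageSketch {
    /** A queued DN message, tagged with the storage ID it came from. */
    static final class PendingMessage {
        final String storageId;
        final String block;

        PendingMessage(String storageId, String block) {
            this.storageId = storageId;
            this.block = block;
        }
    }

    private final Set<String> knownStorages = new HashSet<>();
    private final Queue<PendingMessage> pending = new ArrayDeque<>();

    void registerStorage(String storageId) {
        knownStorages.add(storageId);
    }

    void enqueue(String storageId, String block) {
        pending.add(new PendingMessage(storageId, block));
    }

    /**
     * Forgets a storage, e.g. after the DN was reformatted.
     * With purge == true, queued messages from that storage are dropped too
     * (assumed mitigation); with purge == false they stay queued.
     */
    void removeStorage(String storageId, boolean purge) {
        knownStorages.remove(storageId);
        if (purge) {
            pending.removeIf(m -> m.storageId.equals(storageId));
        }
    }

    /**
     * Replays every queued message, as on a failover. A message that refers
     * to an unknown storage aborts the replay, mirroring the reported error.
     * Returns the number of messages processed.
     */
    int replayAll() {
        int processed = 0;
        PendingMessage m;
        while ((m = pending.poll()) != null) {
            if (!knownStorages.contains(m.storageId)) {
                throw new IllegalStateException("Cannot process " + m.block
                        + " because storage " + m.storageId + " does not exist");
            }
            processed++;
        }
        return processed;
    }
}
```

In this model, calling `removeStorage(id, false)` and then `replayAll()` reproduces the abort, while `removeStorage(id, true)` lets the replay complete because no stale message remains in the queue.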
      

        Attachments

        1. HDFS-6289.patch
          11 kB
          Aaron T. Myers
        2. HDFS-6289.patch
          12 kB
          Aaron T. Myers

            People

            • Assignee: Aaron T. Myers (atm)
            • Reporter: Aaron T. Myers (atm)
            • Votes: 0
            • Watchers: 9