Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: HA branch (HDFS-1623)
    • Fix Version/s: HA branch (HDFS-1623)
    • Component/s: ha, namenode
    • Labels:
      None

      Description

      Currently, the NN logs its edits to each of its edits directories in sequence. This can produce the following bad sequence:

      • The NN accumulates 100 edits (txids 1-100) in its buffer. It writes and syncs them to a local drive, then crashes before writing them to the shared edits dir.
      • Failover occurs. The SBN takes over at txid=1, since txid 1 was never written to the shared edits dir.
      • The first NN restarts. It reads up to txid 100 from its local directories. It is now "ahead" of the active NN, with inconsistent state.

      The solution is to write to the shared edits dir, and sync that, before writing to any local drives.
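
      The proposed ordering can be sketched roughly as follows. This is only an illustration, not the attached patch: EditsDir and OrderedEditLogger are hypothetical stand-ins rather than real HDFS classes, and the crashAfter parameter exists only to simulate the NN dying partway through a flush.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one edits directory; not the real HDFS journal API. */
class EditsDir {
    final String name;
    final boolean shared;
    private final List<Long> synced = new ArrayList<>();

    EditsDir(String name, boolean shared) {
        this.name = name;
        this.shared = shared;
    }

    /** Pretend to write and sync a batch of transaction IDs to this directory. */
    void writeAndSync(List<Long> txids) {
        synced.addAll(txids);
    }

    /** Highest txid durably stored here, or 0 if nothing has been synced. */
    long lastSyncedTxid() {
        return synced.isEmpty() ? 0L : synced.get(synced.size() - 1);
    }
}

/** Hypothetical logger that always flushes shared directories before local ones. */
class OrderedEditLogger {
    private final List<EditsDir> ordered = new ArrayList<>();

    OrderedEditLogger(List<EditsDir> dirs) {
        // Shared directories go first; relative order is otherwise preserved.
        for (EditsDir d : dirs) {
            if (d.shared) ordered.add(d);
        }
        for (EditsDir d : dirs) {
            if (!d.shared) ordered.add(d);
        }
    }

    List<EditsDir> flushOrder() {
        return ordered;
    }

    /**
     * Flush a batch to every directory in order. crashAfter is the number of
     * directories that get synced before a simulated crash; pass
     * Integer.MAX_VALUE for a normal flush.
     */
    void flush(List<Long> txids, int crashAfter) {
        int done = 0;
        for (EditsDir d : ordered) {
            if (done == crashAfter) {
                return; // simulated crash: remaining directories are never written
            }
            d.writeAndSync(txids);
            done++;
        }
    }
}
```

      With this ordering, a crash after the first sync leaves the shared dir holding txids 1-100 and the local dir empty, so a restarted NN can never read local edits that the SBN did not see.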

        Attachments

        1. hdfs-2874.txt
          9 kB
          Todd Lipcon
        2. hdfs-2874.txt
          24 kB
          Todd Lipcon
        3. hdfs-2874.txt
          24 kB
          Todd Lipcon

              People

              • Assignee:
                tlipcon Todd Lipcon
                Reporter:
                tlipcon Todd Lipcon
              • Votes:
                0
                Watchers:
                7
