Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: HA branch (HDFS-1623)
    • Fix Version/s: HA branch (HDFS-1623)
    • Component/s: ha, namenode
    • Labels:
      None

      Description

      Currently, the NN logs its edits to each of its edits directories in sequence. This can produce the following bad sequence of events:

      • The NN accumulates 100 edits (txids 1-100) in its buffer. It writes and syncs them to a local drive, then crashes before writing them to the shared edits dir.
      • Failover occurs. The SBN takes over at txid=1, since txid 1 never got written to the shared edits dir.
      • The first NN restarts. It reads up to txid 100 from its local directories. It is now "ahead" of the active NN, with inconsistent state.

      The solution is to write to the shared edits dir, and sync it, before writing to any local drives.
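
      The ordering proposed above can be illustrated with a minimal sketch. It is not the code from the attached patch: `Journal`, `log_sync`, `write_and_sync`, and `fail_on_sync` are hypothetical names invented for this illustration, and a raised exception stands in for the NN crashing partway through a flush.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Journal:
    """Hypothetical stand-in for one edits directory (not an HDFS class)."""
    name: str
    shared: bool = False
    fail_on_sync: bool = False  # simulate a crash while flushing this journal
    synced_txid: int = 0        # highest txid durably written here

    def write_and_sync(self, last_txid: int) -> None:
        if self.fail_on_sync:
            raise IOError(f"sync failed on {self.name}")
        self.synced_txid = last_txid


def log_sync(journals: List[Journal], last_txid: int) -> None:
    """Flush buffered edits up to last_txid, shared journals first."""
    # sorted() is stable and False sorts before True, so journals with
    # shared=True come first and keep their relative order.
    ordered = sorted(journals, key=lambda j: not j.shared)
    for journal in ordered:
        # An exception here models the crash: journals later in the
        # order are never written.
        journal.write_and_sync(last_txid)


if __name__ == "__main__":
    local = Journal("local")
    shared = Journal("shared", shared=True, fail_on_sync=True)
    try:
        log_sync([local, shared], 100)
    except IOError:
        pass
    # The local journal was never touched, so it cannot be ahead of the
    # shared one.
    print(local.synced_txid, shared.synced_txid)
```

      Under these assumptions, a crash during the flush can leave the local directories behind the shared edits dir but never ahead of it, which is the property the bad sequence above violates.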
      1. hdfs-2874.txt
        24 kB
        Todd Lipcon
      2. hdfs-2874.txt
        24 kB
        Todd Lipcon
      3. hdfs-2874.txt
        9 kB
        Todd Lipcon

            People

            • Assignee:
              Todd Lipcon
              Reporter:
              Todd Lipcon
            • Votes:
              0
              Watchers:
              7
