Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: QuorumJournalManager (HDFS-3077)
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Currently, the JournalNodes automatically format themselves when a new writer takes over, if they don't have any data for that namespace. However, this has a few problems:
      1) If the administrator accidentally points a new NN at the wrong quorum (e.g., one corresponding to another cluster), the NN will auto-format a directory on those nodes. This doesn't cause any data loss, but it would be better to bail out with an error indicating that the JNs need to be formatted.
      2) If a journal node crashes and needs to be reformatted, it should be able to rejoin the cluster and start storing new segments without requiring a failover to a new NN.
      3) If 2 of 3 JNs are accidentally reformatted (e.g., the mount point gets unmounted) and the user starts the NN, the NN should fail to start, because it may end up missing edits. If the JNs auto-format in this case, the user might see a silent "rollback" of the most recent edits.
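
To make problem 3 concrete, here is a minimal sketch of how a silent rollback could arise. It assumes edits only need to reach a majority of JNs before being acknowledged, so one JN may lag behind. The class and method names are hypothetical and are not part of the HDFS code base; an unformatted or freshly formatted JN is modeled as having last txid 0.

```java
// Hypothetical illustration only; not the actual QuorumJournalManager code.
final class RollbackExample {

    // Highest transaction ID the NN can see across the JNs it contacts.
    // A JN with no data (for example, freshly auto-formatted) is modeled as 0.
    public static long highestVisibleTxId(long[] lastTxIdPerJn) {
        long max = 0L;
        for (long txId : lastTxIdPerJn) {
            max = Math.max(max, txId);
        }
        return max;
    }

    // Number of acknowledged edits that are no longer visible after the
    // JN state changes from 'before' to 'after'.
    public static long lostEdits(long[] before, long[] after) {
        return Math.max(0L, highestVisibleTxId(before) - highestVisibleTxId(after));
    }
}
```

For example, with JN states {100, 100, 90} (the third JN lagging), reformatting the first two leaves {0, 0, 90}. The NN would then see txid 90 as the latest, and `lostEdits` reports 10 edits that were acknowledged but have silently disappeared.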

        Activity

        Todd Lipcon created issue -
        Todd Lipcon added a comment -

        I'll propose the following design:

        • The "hdfs namenode -format" command should ask for confirmation and then format all of the underlying JNs, regardless of whether they currently have data.
        • At startup, and at the beginning of each log segment, if a quorum of JNs are formatted but a minority are not, the NN should auto-format the minority. This allows an admin to replace a dead JN without taking any downtime or running any "format" command. He or she simply restarts the dead node with a fresh disk, or reassigns a VIP/CNAME to some new node.
        • At startup, if the majority of JNs are unformatted, the NN should refuse to start up, because starting may result in missing edits. For now, this would require manual intervention if the admin really wants to start up despite the potential data loss (e.g., rsyncing one JN's directories to one of the fresh nodes). A future enhancement would be to automate this "unsafe startup" process.
        • For the HA use case, the "initializeSharedEdits" function would take care of formatting the JNs.

        The above proposed behavior is based on what we currently do with storage directories: if one is formatted and another is not, we will auto-format the empty one. If none are formatted, we require an explicit format step.
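
The startup rules proposed above can be sketched as a small decision function. This is only an illustration under stated assumptions: the class name, the enum, and the representation of JN state as a list of booleans are all hypothetical, and the real change would live inside the NN and QuorumJournalManager code. The tie case for an even number of JNs is not covered by the proposal; the sketch treats it conservatively as a refusal.

```java
import java.util.List;

// Hypothetical sketch of the proposed policy; not actual HDFS code.
final class JournalFormatPolicy {

    public enum Action { PROCEED, AUTO_FORMAT_MINORITY, REFUSE_TO_START }

    // 'formatted' holds one entry per JN: true if that JN is already formatted.
    public static Action decide(List<Boolean> formatted) {
        if (formatted.isEmpty()) {
            throw new IllegalArgumentException("at least one JN is required");
        }
        int total = formatted.size();
        int numFormatted = 0;
        for (boolean f : formatted) {
            if (f) {
                numFormatted++;
            }
        }
        int quorum = total / 2 + 1; // strict majority

        if (numFormatted == total) {
            return Action.PROCEED; // nothing to format
        }
        if (numFormatted >= quorum) {
            return Action.AUTO_FORMAT_MINORITY; // e.g. a replaced dead JN
        }
        // Majority unformatted (or a tie, which is an assumption of this
        // sketch): edits may be missing, so require manual intervention.
        return Action.REFUSE_TO_START;
    }
}
```

With three JNs, `decide(List.of(true, true, false))` yields `AUTO_FORMAT_MINORITY`, while `decide(List.of(true, false, false))` yields `REFUSE_TO_START`.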

        Aaron T. Myers added a comment -

        +1, this design makes sense to me.

        Andrew Purtell added a comment -

        Not sure about the notion of automating an "unsafe startup" in the case the majority of JNs are unformatted. What if instead, it's possible to start up the NN in recovery mode and have it interactively suggest actions including initializing the unformatted JNs? Could summarize the most recent txn (or a few txns) of the available logs before asking which txid to choose as latest?


          People

          • Assignee:
            Todd Lipcon
            Reporter:
            Todd Lipcon
          • Votes:
            0
            Watchers:
            11

            Dates

            • Created:
              Updated:
