Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: QuorumJournalManager (HDFS-3077)
    • Component/s: ha
    • Labels:
      None

      Description

      One of the cases not yet handled in the QJM branch is the one where either the writer or the journal node crashes after startLogSegment() but before it has written its first transaction to the log. We currently have TODO assertions in the code which fire in these cases.

      This JIRA is to deal with these cases.

      1. hdfs-3799.txt
        22 kB
        Todd Lipcon

        Activity

        Todd Lipcon created issue -
        Todd Lipcon added a comment -

        The solution is as follows:

        • during recovery, when we validate a log, if the log has no transactions, then we remove the file (same as if the log segment was never started)
        • when coordinating recovery, if none of the loggers have any non-empty logs, then we don't have to take any action. We can simply treat the recovery as a no-op.
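
The two rules above can be sketched roughly as follows. This is a minimal illustration, not the code from the attached patch: the class `EmptySegmentRecoverySketch`, the type `SegmentState`, and the methods `validateSegment` and `recoveryNeeded` are hypothetical names invented for this example, and the in-memory `fileExists` flag stands in for deleting the real segment file on a journal node.

```java
import java.util.Arrays;
import java.util.List;

public class EmptySegmentRecoverySketch {

    /** Hypothetical stand-in for one logger's view of an in-progress segment. */
    static final class SegmentState {
        final String loggerId;
        final long numTxns;        // 0 means the segment was started but never written to
        boolean fileExists = true; // models the segment file on disk

        SegmentState(String loggerId, long numTxns) {
            this.loggerId = loggerId;
            this.numTxns = numTxns;
        }
    }

    /**
     * Rule 1: when validating a log during recovery, an empty log is removed,
     * as if the segment had never been started. Returns true if the segment
     * holds at least one transaction.
     */
    static boolean validateSegment(SegmentState s) {
        if (s.numTxns == 0) {
            s.fileExists = false; // assumption: a real logger would delete the file here
            return false;
        }
        return true;
    }

    /**
     * Rule 2: when coordinating recovery, if no logger has a non-empty log,
     * recovery is a no-op. Every logger is validated, so empty files are
     * cleaned up even when some other logger has transactions.
     */
    static boolean recoveryNeeded(List<SegmentState> loggers) {
        boolean anyNonEmpty = false;
        for (SegmentState s : loggers) {
            if (validateSegment(s)) {
                anyNonEmpty = true;
            }
        }
        return anyNonEmpty;
    }

    public static void main(String[] args) {
        List<SegmentState> allEmpty = Arrays.asList(
            new SegmentState("jn1", 0),
            new SegmentState("jn2", 0),
            new SegmentState("jn3", 0));
        System.out.println("recovery needed: " + recoveryNeeded(allEmpty));

        List<SegmentState> mixed = Arrays.asList(
            new SegmentState("jn1", 0),
            new SegmentState("jn2", 5),
            new SegmentState("jn3", 5));
        System.out.println("recovery needed: " + recoveryNeeded(mixed));
    }
}
```

In the all-empty case the coordinator has nothing to synchronize, which is why the recovery can be treated as a no-op; in the mixed case the empty file on the first logger is still removed before the normal recovery path proceeds.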
        Todd Lipcon made changes -
        Field Original Value New Value
        Attachment hdfs-3799.txt [ 12540825 ]
        Aaron T. Myers added a comment -

        Patch looks really good. The tests in particular are very solid. Two nits:

        1. sp: "Synchronziing"
        2. Recommend replacing the three "testOutOfSyncAtBeginningOfSegmentX" methods with a loop from 0-2. Feel free to punt if you think this is clearer.

        +1 once these are addressed.

        Todd Lipcon added a comment -

        Fixed the spelling typo. Going to punt on the other thing - the different loop iterations fail separately enough that it's easier to diagnose them as separate test cases.

        Will commit momentarily with the nit addressed.

        Todd Lipcon made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Hadoop Flags Reviewed [ 10343 ]
        Fix Version/s QuorumJournalManager (HDFS-3077) [ 12322478 ]
        Resolution Fixed [ 1 ]

          People

          • Assignee:
            Todd Lipcon
            Reporter:
            Todd Lipcon
          • Votes:
            0
            Watchers:
            3

            Dates

            • Created:
              Updated:
              Resolved:
