Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: HA branch (HDFS-1623)
    • Fix Version/s: HA branch (HDFS-1623)
    • Component/s: ha, namenode
    • Labels:
      None

      Description

      Currently, when the active NameNode transitions to standby (e.g., for a manual failover), the EditLogTailer starts tailing edits at the wrong transaction ID. As a result, it applies a range of edits a second time, which causes various problems. We need to add a test that exercises this state transition and fix any related bugs.
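
To make the failure mode concrete, here is a toy sketch of the double-apply problem described above. It is not the HDFS code and not the fix in the attached patch: `ToyNamespace`, `tail`, and `resumeTxId` are hypothetical names invented for illustration, and the only assumption is that a node entering standby should resume tailing just past the last transaction it has already applied.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of edit tailing; all names are hypothetical, not HDFS APIs. */
public class EditTailerSketch {

    /** Records every applied transaction so duplicates are easy to spot. */
    static final class ToyNamespace {
        private long lastAppliedTxId = 0;
        private final List<Long> applied = new ArrayList<>();

        void apply(long txId) {
            applied.add(txId);
            lastAppliedTxId = Math.max(lastAppliedTxId, txId);
        }

        long getLastAppliedTxId() {
            return lastAppliedTxId;
        }

        List<Long> getApplied() {
            return applied;
        }
    }

    /** Replays transactions startTxId..endTxId (inclusive) into the namespace. */
    static void tail(ToyNamespace ns, long startTxId, long endTxId) {
        for (long txId = startTxId; txId <= endTxId; txId++) {
            ns.apply(txId);
        }
    }

    /** Assumed rule: resume just past the last transaction already applied. */
    static long resumeTxId(ToyNamespace ns) {
        return ns.getLastAppliedTxId() + 1;
    }

    public static void main(String[] args) {
        ToyNamespace ns = new ToyNamespace();

        // While active, this node applied transactions 1..5 itself.
        tail(ns, 1, 5);

        // After the transition to standby, the other node logs 6..8.
        // Starting from an earlier ID (e.g. 1) would re-apply 1..5;
        // starting from resumeTxId(ns) applies each transaction once.
        tail(ns, resumeTxId(ns), 8);

        System.out.println(ns.getApplied());
    }
}
```

In this sketch, replacing `resumeTxId(ns)` with `1` in the second `tail` call reproduces the symptom: transactions 1 through 5 appear twice in the applied list.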

      1. hdfs-2667.txt
        11 kB
        Todd Lipcon
      2. hdfs-2667.txt
        8 kB
        Todd Lipcon
      3. hdfs-2667.txt
        8 kB
        Todd Lipcon
      4. hdfs-2667.txt
        9 kB
        Todd Lipcon

        Activity

        Todd Lipcon added a comment -

        Attached patch fixes the issues and adds unit tests for repeated manual failover and fail-back between two NNs. Both new test cases failed without the bug fixes.

        This applies on top of HDFS-2602 and HDFS-1972, though I think only HDFS-2602 is necessary.

        Aaron T. Myers added a comment -

        One tiny comment: in testTransitionActiveToStandby, you should use TEST_FILE_PATH instead of new Path(TEST_DIR, "foo").

        +1 once that's fixed.

        Todd Lipcon added a comment -

        Attached patch with atm's suggested test fix. Waiting on HDFS-2602 to commit this, since the test fails without it.

        Todd Lipcon added a comment -

        Oops, I resolved a conflict incorrectly in the previous upload. This one fixes it, but I'm still waiting on HDFS-2602 for the test to pass.

        Todd Lipcon added a comment -

        One more revision here: the previous iteration broke the backup node and TestFileJournalManager.

        Aaron T. Myers added a comment -

        +1, latest patch looks good to me.

        Todd Lipcon added a comment -

        Committed to the branch, thanks Aaron.

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-HAbranch-build #18 (See https://builds.apache.org/job/Hadoop-Hdfs-HAbranch-build/18/)
        HDFS-2667. Fix transition from active to standby. Contributed by Todd Lipcon.

        todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1215037
        Files :

        • /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/CHANGES.HDFS-1623.txt
        • /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
        • /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
        • /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java
        • /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java
        • /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileJournalManager.java
        • /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestHAStateTransitions.java

          People

          • Assignee:
            Todd Lipcon
            Reporter:
            Todd Lipcon
          • Votes:
            0
            Watchers:
            2