Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-1623 High Availability Framework for HDFS NN
  3. HDFS-2863

Failures observed if dfs.edits.dir and shared.edits.dir have same directories.

    Details

    • Type: Sub-task Sub-task
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: HA branch (HDFS-1623)
    • Fix Version/s: HA branch (HDFS-1623)
    • Component/s: ha, namenode
    • Labels:
      None

      Description

      If same edits directory is configured in twice, both are treated independently. Edit log roll is called on the same directory twice causing exceptions.

      1. HDFS-2863.HDFS-1623.patch
        1 kB
        Bikas Saha
      2. HDFS-2863.HDFS-1623.patch
        4 kB
        Bikas Saha
      3. HDFS-2863.HDFS-1623.patch
        4 kB
        Bikas Saha
      4. HDFS-2863.HDFS-1623.patch
        5 kB
        Bikas Saha
      5. HDFS-2863.HDFS-1623.patch
        6 kB
        Bikas Saha

        Activity

        Hide
        Jitendra Nath Pandey added a comment -

        2012-01-31 01:28:26,391 INFO namenode.FileJournalManager (FileJournalManager.java:finalizeLogSegment(94)) - Finalizi
        ng edits file /tmp/namenode/current/edits_inprogress_0000000000000986027 -> /tmp/namenode/current/edits_0000000000000986027-0000000000000988539
        2012-01-31 01:28:26,395 INFO namenode.FileJournalManager (FileJournalManager.java:finalizeLogSegment(94)) - Finalizing edits file /tmp/namenode/current/edits_inprogress_0000000000000986027 -> /tmp/namenode/curre
        nt/edits_0000000000000986027-0000000000000988539
        2012-01-31 01:28:26,395 ERROR namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(317)) - Error: finalize
        log segment 986027, 988539 failed for (journal JournalAndStream(mgr=FileJournalManager(root=/tmp/namenode)
        , stream=null))
        java.lang.IllegalStateException: Can't finalize edits file /tmp/current/edits_inprogress_00000000
        00000986027 since finalized file already exists
        at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
        at org.apache.hadoop.hdfs.server.namenode.FileJournalManager.finalizeLogSegment(FileJournalManager.java:96)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet$2.apply(JournalSet.java:175) at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:315)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet.finalizeLogSegment(JournalSet.java:170) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:901)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:831)

        Show
        Jitendra Nath Pandey added a comment - 2012-01-31 01:28:26,391 INFO namenode.FileJournalManager (FileJournalManager.java:finalizeLogSegment(94)) - Finalizi ng edits file /tmp/namenode/current/edits_inprogress_0000000000000986027 -> /tmp/namenode/current/edits_0000000000000986027-0000000000000988539 2012-01-31 01:28:26,395 INFO namenode.FileJournalManager (FileJournalManager.java:finalizeLogSegment(94)) - Finalizing edits file /tmp/namenode/current/edits_inprogress_0000000000000986027 -> /tmp/namenode/curre nt/edits_0000000000000986027-0000000000000988539 2012-01-31 01:28:26,395 ERROR namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(317)) - Error: finalize log segment 986027, 988539 failed for (journal JournalAndStream(mgr=FileJournalManager(root=/tmp/namenode) , stream=null)) java.lang.IllegalStateException: Can't finalize edits file /tmp/current/edits_inprogress_00000000 00000986027 since finalized file already exists at com.google.common.base.Preconditions.checkState(Preconditions.java:145) at org.apache.hadoop.hdfs.server.namenode.FileJournalManager.finalizeLogSegment(FileJournalManager.java:96) at org.apache.hadoop.hdfs.server.namenode.JournalSet$2.apply(JournalSet.java:175) at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:315) at org.apache.hadoop.hdfs.server.namenode.JournalSet.finalizeLogSegment(JournalSet.java:170) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:901) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:831)
        Hide
        Bikas Saha added a comment -

        Does it make sense to assert that the JournalSet must consists of distinct URI's as a policy decision. Then this check could be added to the JournalSet.

        Show
        Bikas Saha added a comment - Does it make sense to assert that the JournalSet must consists of distinct URI's as a policy decision. Then this check could be added to the JournalSet.
        Hide
        Bikas Saha added a comment -

        Looks like it does not since the BackupJournal does not seem to be linked to a URI.

        Show
        Bikas Saha added a comment - Looks like it does not since the BackupJournal does not seem to be linked to a URI.
        Hide
        Bikas Saha added a comment -

        Simple patch to fix the issue. Any existing tests I could change?

        Show
        Bikas Saha added a comment - Simple patch to fix the issue. Any existing tests I could change?
        Hide
        Aaron T. Myers added a comment -

        The patch largely looks good, Bikas. A few comments:

        1. I think "editsDirs.clear(); editsDirs.addAll(uniqueEditsDirs);" can just be replaced with "editsDirs = uniqueEditsDirs", right?
        2. Instead of saying "duplicate entries in edits/shared.edits dirs" in the error message, how about something more helpful to the user, like '"overlapping entries in " + DFSConfigKeys.DFS_NAMENODE_EDITS_DIR_KEY + " and " + DFSConfigKeys.DFS_NAMENODE_SHARED_EDITS_DIR_KEY' ?
        3. Rather than listing the number of entries that will be used, how about listing the actual entries using Joiner?

        Any existing tests I could change?

        None that I know of, so please add one or two.

        Show
        Aaron T. Myers added a comment - The patch largely looks good, Bikas. A few comments: I think " editsDirs.clear(); editsDirs.addAll(uniqueEditsDirs); " can just be replaced with " editsDirs = uniqueEditsDirs ", right? Instead of saying "duplicate entries in edits/shared.edits dirs" in the error message, how about something more helpful to the user, like '"overlapping entries in " + DFSConfigKeys.DFS_NAMENODE_EDITS_DIR_KEY + " and " + DFSConfigKeys.DFS_NAMENODE_SHARED_EDITS_DIR_KEY' ? Rather than listing the number of entries that will be used, how about listing the actual entries using Joiner? Any existing tests I could change? None that I know of, so please add one or two.
        Hide
        Bikas Saha added a comment -

        I did not assign editDirs = uniqueEditDirs because I did not want the Set semantics to leak into the Collection return type of that method.
        These are small lists anyways and so I wasnt bothered about the perf.

        I will look at the other suggestions starting with finding out what a Joiner is

        Show
        Bikas Saha added a comment - I did not assign editDirs = uniqueEditDirs because I did not want the Set semantics to leak into the Collection return type of that method. These are small lists anyways and so I wasnt bothered about the perf. I will look at the other suggestions starting with finding out what a Joiner is
        Hide
        Aaron T. Myers added a comment -

        I did not assign editDirs = uniqueEditDirs because I did not want the Set semantics to leak into the Collection return type of that method. These are small lists anyways and so I wasnt bothered about the perf.

        Seems perhaps a little overly-cautious, as nobody should be mutating the results of what's returned from these methods, but I agree it can't hurt anything the way you have it.

        I will look at the other suggestions starting with finding out what a Joiner is

        http://google-collections.googlecode.com/svn/trunk/javadoc/com/google/common/base/Joiner.html

        Just something for concatenating collections with a separator, e.g. 'Joiner.on(",").join(<collection>)'

        Show
        Aaron T. Myers added a comment - I did not assign editDirs = uniqueEditDirs because I did not want the Set semantics to leak into the Collection return type of that method. These are small lists anyways and so I wasnt bothered about the perf. Seems perhaps a little overly-cautious, as nobody should be mutating the results of what's returned from these methods, but I agree it can't hurt anything the way you have it. I will look at the other suggestions starting with finding out what a Joiner is http://google-collections.googlecode.com/svn/trunk/javadoc/com/google/common/base/Joiner.html Just something for concatenating collections with a separator, e.g. 'Joiner.on(",").join(<collection>)'
        Hide
        Bikas Saha added a comment -

        Found Joiner on Google. Thanks anyways.

        Attached latest patch with test.

        Show
        Bikas Saha added a comment - Found Joiner on Google. Thanks anyways. Attached latest patch with test.
        Hide
        Aaron T. Myers added a comment -

        Thanks for adding the test. A few small comments:

        1. No need for the "joiner" variable - it's only used once.
        2. I think "TestFSNamesystem" isn't a very good name for this test, since this really only tests a very specific part of HA configuration. Perhaps just add this test case to TestHAConfiguration?
        3. Since we currently only support a single shared edits dir, the test could be made more realistic by not configuring several.
        Show
        Aaron T. Myers added a comment - Thanks for adding the test. A few small comments: No need for the "joiner" variable - it's only used once. I think "TestFSNamesystem" isn't a very good name for this test, since this really only tests a very specific part of HA configuration. Perhaps just add this test case to TestHAConfiguration? Since we currently only support a single shared edits dir, the test could be made more realistic by not configuring several.
        Hide
        Bikas Saha added a comment -

        1.Will do
        2.I added the TestFSNamesystem to be a class for adding future unit tests for the same source class. The test is not really HA specific since the dups can occur in the edits.dirs config itself right?
        3.The unit test tests what the function does. So IMO it makes sense the way its written currently. The fact that we currently support a single shared edits dir is not related to the test and should be ideally covered in a functional sanity test for the HA config feature right?

        Will update the patch shortly.

        Show
        Bikas Saha added a comment - 1.Will do 2.I added the TestFSNamesystem to be a class for adding future unit tests for the same source class. The test is not really HA specific since the dups can occur in the edits.dirs config itself right? 3.The unit test tests what the function does. So IMO it makes sense the way its written currently. The fact that we currently support a single shared edits dir is not related to the test and should be ideally covered in a functional sanity test for the HA config feature right? Will update the patch shortly.
        Hide
        Bikas Saha added a comment -

        new patch.

        Show
        Bikas Saha added a comment - new patch.
        Hide
        Aaron T. Myers added a comment -

        I added the TestFSNamesystem to be a class for adding future unit tests for the same source class. The test is not really HA specific since the dups can occur in the edits.dirs config itself right?

        The test is HA-specific in that it's meant to test configuration of the shared edits dir, which itself is HA-specific. The fact that it also happens to test for overlaps within the edits dir configuration seems incidental to me. In fact, I would prefer if the test were more isolated so that it doesn't test for overlaps within the edits dir configuration. We should add a separate test for that behavior.

        The unit test tests what the function does. So IMO it makes sense the way its written currently. The fact that we currently support a single shared edits dir is not related to the test and should be ideally covered in a functional sanity test for the HA config feature right?

        Sure, but the test is strictly less realistic than it could be because of this. Better to verify the behavior under the conditions we expect users to use, rather than some theoretical case which isn't currently supported.

        Also, if the test is left as-is, whenever HDFS-2752 gets implemented, this test will need to be amended because it will no longer be valid.

        Show
        Aaron T. Myers added a comment - I added the TestFSNamesystem to be a class for adding future unit tests for the same source class. The test is not really HA specific since the dups can occur in the edits.dirs config itself right? The test is HA-specific in that it's meant to test configuration of the shared edits dir, which itself is HA-specific. The fact that it also happens to test for overlaps within the edits dir configuration seems incidental to me. In fact, I would prefer if the test were more isolated so that it doesn't test for overlaps within the edits dir configuration. We should add a separate test for that behavior. The unit test tests what the function does. So IMO it makes sense the way its written currently. The fact that we currently support a single shared edits dir is not related to the test and should be ideally covered in a functional sanity test for the HA config feature right? Sure, but the test is strictly less realistic than it could be because of this. Better to verify the behavior under the conditions we expect users to use, rather than some theoretical case which isn't currently supported. Also, if the test is left as-is, whenever HDFS-2752 gets implemented, this test will need to be amended because it will no longer be valid.
        Hide
        Bikas Saha added a comment -

        I think we are mixing unit tests and functional tests here. If 2752 is properly implemented then I dont see why it should break the behavior of the existing methods since it defines a policy on top of a mechanism and these classes implement the mechanism.

        In any case, there is no harm done in splitting them and its trivial to do so. Will update shortly.

        Show
        Bikas Saha added a comment - I think we are mixing unit tests and functional tests here. If 2752 is properly implemented then I dont see why it should break the behavior of the existing methods since it defines a policy on top of a mechanism and these classes implement the mechanism. In any case, there is no harm done in splitting them and its trivial to do so. Will update shortly.
        Hide
        Bikas Saha added a comment -

        I split the tests and renamed the one with shared edits to contain HA in the name.

        I did not find a TestHAConfiguration in the source code.

        Show
        Bikas Saha added a comment - I split the tests and renamed the one with shared edits to contain HA in the name. I did not find a TestHAConfiguration in the source code.
        Hide
        Aaron T. Myers added a comment -

        I did not find a TestHAConfiguration in the source code.

        It got committed today as part of HDFS-2861, and seems to me like a more appropriate place for testHAUniqueEditDirs.

        Show
        Aaron T. Myers added a comment - I did not find a TestHAConfiguration in the source code. It got committed today as part of HDFS-2861 , and seems to me like a more appropriate place for testHAUniqueEditDirs.
        Hide
        Bikas Saha added a comment -

        Got it after a git pull.
        Attached.

        Show
        Bikas Saha added a comment - Got it after a git pull. Attached.
        Hide
        Aaron T. Myers added a comment -

        +1, the latest patch looks good to me. Thanks a lot for addressing all of my feedback, Bikas.

        I'll commit this later today if there's no further comments from anyone else.

        Show
        Aaron T. Myers added a comment - +1, the latest patch looks good to me. Thanks a lot for addressing all of my feedback, Bikas. I'll commit this later today if there's no further comments from anyone else.
        Hide
        Bikas Saha added a comment -

        Thanks!

        Show
        Bikas Saha added a comment - Thanks!
        Hide
        Aaron T. Myers added a comment -

        I've just committed this to the HA branch. Thanks a lot, Bikas.

        Show
        Aaron T. Myers added a comment - I've just committed this to the HA branch. Thanks a lot, Bikas.
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-HAbranch-build #68 (See https://builds.apache.org/job/Hadoop-Hdfs-HAbranch-build/68/)
        HDFS-2863. Failures observed if dfs.edits.dir and shared.edits.dir have same directories. Contributed by Bikas Saha.

        atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1240267
        Files :

        • /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/CHANGES.HDFS-1623.txt
        • /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
        • /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSNamesystem.java
        • /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestHAConfiguration.java
        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-HAbranch-build #68 (See https://builds.apache.org/job/Hadoop-Hdfs-HAbranch-build/68/ ) HDFS-2863 . Failures observed if dfs.edits.dir and shared.edits.dir have same directories. Contributed by Bikas Saha. atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1240267 Files : /hadoop/common/branches/ HDFS-1623 /hadoop-hdfs-project/hadoop-hdfs/CHANGES. HDFS-1623 .txt /hadoop/common/branches/ HDFS-1623 /hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java /hadoop/common/branches/ HDFS-1623 /hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSNamesystem.java /hadoop/common/branches/ HDFS-1623 /hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestHAConfiguration.java

          People

          • Assignee:
            Bikas Saha
            Reporter:
            Jitendra Nath Pandey
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development