Hadoop HDFS > HDFS-3077: Quorum-based protocol for reading and writing edit logs > HDFS-3862

QJM: don't require a fencer to be configured if shared storage has built-in single-writer semantics

    Details

    • Type: Sub-task
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: QuorumJournalManager (HDFS-3077)
    • Fix Version/s: None
    • Component/s: ha
    • Labels:

      Description

      Currently, NN HA requires that the administrator configure a fencing method to ensure that only a single NameNode may write to the shared storage at a time. Some shared edits storage implementations (like QJM) inherently enforce single-writer semantics at the storage level, and thus the user should not be forced to specify one.

      We should extend the JournalManager interface so that the HA code can operate without a configured fencer if the JM has such built-in fencing.
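
      A minimal sketch of such an extension, for illustration (the method name isNativelySingleWriter() is taken from the discussion below; the declaration and javadoc wording are assumptions, not a committed change):

          // Sketch: proposed addition to the JournalManager interface.
          public interface JournalManager {
            /**
             * @return true if this shared-edits storage (e.g. QJM or BookKeeper)
             *         enforces single-writer semantics itself, so an HA failover
             *         does not require an externally configured fencing method.
             */
            boolean isNativelySingleWriter();

            // ... existing JournalManager methods unchanged ...
          }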

      1. HDFS-3862.001.patch
        29 kB
        Yi Liu
      2. HDFS-3862.002.patch
        30 kB
        Yi Liu
      3. HDFS-3862.003.patch
        30 kB
        Yi Liu

        Issue Links

          Activity

          Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 14m 37s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 4 new or modified test files.
          +1 javac 7m 30s There were no new javac warning messages.
          +1 javadoc 9m 37s There were no new javadoc warning messages.
          +1 release audit 0m 23s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 2m 22s The applied patch generated 1 new checkstyle issues (total was 39, now 40).
          -1 whitespace 0m 1s The patch has 15 line(s) that end in whitespace. Use git apply --whitespace=fix.
          +1 install 1m 36s mvn install still works.
          +1 eclipse:eclipse 0m 32s The patch built with eclipse:eclipse.
          +1 findbugs 5m 22s The patch does not introduce any new Findbugs (version 2.0.3) warnings.
          +1 common tests 22m 46s Tests passed in hadoop-common.
          +1 hdfs tests 166m 38s Tests passed in hadoop-hdfs.
          +1 hdfs tests 3m 56s Tests passed in bkjournal.
              236m 2s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12669903/HDFS-3862.003.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / a319771
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/10763/artifact/patchprocess/diffcheckstylehadoop-common.txt
          whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/10763/artifact/patchprocess/whitespace.txt
          hadoop-common test log https://builds.apache.org/job/PreCommit-HDFS-Build/10763/artifact/patchprocess/testrun_hadoop-common.txt
          hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/10763/artifact/patchprocess/testrun_hadoop-hdfs.txt
          bkjournal test log https://builds.apache.org/job/PreCommit-HDFS-Build/10763/artifact/patchprocess/testrun_bkjournal.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/10763/testReport/
          Java 1.7.0_55
          uname Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/10763/console

          This message was automatically generated.

          Hadoop QA added a comment -

          The patch artifact directory has been removed!
          This is a fatal error for test-patch.sh. Aborting.
          Jenkins (node H4) information at https://builds.apache.org/job/PreCommit-HDFS-Build/10738/ may provide some hints.

          Uma Maheswara Rao G added a comment -

          Thanks a lot Yi for addressing the feedback.
          Vinay and I had a chat about this offline. The conclusion is the same: we want to address the above concern.
          What we thought is that we can control this fencer requirement via a configuration item at the ZKFC instead of making changes in the JournalManager implementations. The underlying concern is the same: the ZKFC should not need to initialize JM implementations. JM implementations may do some initialization work at that time; for example, I remember BKJM creates some znodes. Having those things created from the ZKFC is not quite right.
          So, how about we keep a simple config at the ZKFC itself and avoid implementing the isNativelySingleWriter API?
          All the other conditions can stay the same as you handled them in this patch. With a command-line force-fence option, we try to use the configured fencer and do fencing. In a normal failover, we decide whether to fence based on this new configuration item, even if a fencer is configured.
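
          A minimal sketch of the ZKFC-side switch described above; the property name dfs.ha.fencing.mandatory and the helper class are hypothetical, not taken from any patch on this issue:

            import org.apache.hadoop.conf.Configuration;

            // Hypothetical ZKFC-side policy switch; the config key is illustrative.
            public class FencingPolicySketch {
              // Hypothetical property: whether a fencer must be configured.
              public static final String DFS_HA_FENCING_MANDATORY_KEY =
                  "dfs.ha.fencing.mandatory";
              public static final boolean DFS_HA_FENCING_MANDATORY_DEFAULT = true;

              /** Returns true if the ZKFC must have a fencer configured. */
              public static boolean isFencingMandatory(Configuration conf) {
                return conf.getBoolean(DFS_HA_FENCING_MANDATORY_KEY,
                    DFS_HA_FENCING_MANDATORY_DEFAULT);
              }
            }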

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12669903/HDFS-3862.003.patch
          against trunk revision 6434572.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal:

          org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract
          org.apache.hadoop.hdfs.server.namenode.TestParallelImageWrite
          org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
          org.apache.hadoop.hdfs.server.namenode.ha.TestFailoverAndFencing

          The test build failed in hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8096//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8096//console

          This message is automatically generated.

          Yi Liu added a comment -

          Update patch to address all comments.

          The comment and log message confused me. The comment says we will fence if the user configured a fencer, even though the writer is natively single-writer.
          But the log message says we ignore fencing because it is not mandatory for a native single writer. Am I missing something here?

          Fencing is not mandatory, but we do it because a fencer is configured. If fencing fails, we ignore the failure and log a warning message.
          I have updated the code comment and log message.

          Where are you asserting ZKFC failure here?

          The assertion of ZKFC failure is in ZKFCThread; I have added a code comment there in the new patch.

          Finally, my other concern is that with this change we need to keep the shared storage configs at the ZKFC to find out whether the shared storage is single-writer or not, but the ZKFC is in fact not directly related to the writer, as that is purely the NN storage layer. I can understand that the approach you have chosen here is much simpler than making calls to the NN to learn the writer's fencing requirement.

          Yes, I see your point. From my point of view, there are two reasons:

          • Currently, the fencer configuration is also in hdfs-site.xml, and the ZKFC reads it directly. So for whether fencing is mandatory (JournalManager#isNativelySingleWriter), we use the same handling: the ZKFC reads it from the NN configuration directly. The ZKFC doesn't have a separate configuration file; it runs on the same machine and uses the same configuration file as the NN.
          • If we use an RPC call to the NN to find out whether fencing is mandatory, then for the ZKFC it is only called once. But HAAdmin would also need to call it on every failover, which is not necessary.
          Yi Liu added a comment -

          Thanks Uma Maheswara Rao G for the review. Will respond to you and update the patch later.

          Uma Maheswara Rao G added a comment -

          Thanks a lot Yi for working on this patch.
          Overall, the patch looks good. I have some comments though.

          • testFencorIsNotMandatoryAndIsNotConfigured --> testFencerIsNotMandatoryAndIsNotConfigured
          •
            @Test(timeout=120000)

            Does this need formatting?

          •
            public void stopCluster() throws Exception {
            +    cluster.shutdown();

            In general, you should check that cluster is not null here.

          •
            // start cluster with 2 NameNodes
            -        MiniDFSNNTopology topology = createDefaultTopology(basePort);
            +        MiniDFSNNTopology topology = builder.nnTopology == null ? 
            +            createDefaultTopology(basePort) : builder.nnTopology;

            Update the comment to reflect this change?

          •
            * 3. fencer is configured, even if fencing is not mandatory
            ......
            .....
             } else {
            +          // Fencing is not mandatory, but fencer is configured
            +          LOG.warn(msg + ", and ignore fencing since it's not mandatory.");
            +        }

            The comment and log message confused me. The comment says we will fence if the user configured a fencer, even though the writer is natively single-writer.
            But the log message says we ignore fencing because it is not mandatory for a native single writer. Am I missing something here?

          • Test failover w/ or w/o fencer. ---> Test failover w/ and w/o fencer?

          • zkfc starts failed --> ZKFC failed to start?
          • /** 
            +   * Fencing is mandatory and fencer is not configured
            +   * Result: zkfc starts failed 
            +   */
            +  @Test(timeout=120000)
            +  public void testFencerIsMandatoryAndNotConfigured() 
            +      throws Exception { 
            +    try {
            +      startCluster(true, false, FencerConfig.NO_FENCER, true);
            +    } finally {
            +      stopCluster();
            +    }
            +  }
            

            Where are you asserting ZKFC failure here?

          • Finally, my other concern is that with this change we need to keep the shared storage configs at the ZKFC to find out whether the shared storage is single-writer or not, but the ZKFC is in fact not directly related to the writer, as that is purely the NN storage layer. I can understand that the approach you have chosen here is much simpler than making calls to the NN to learn the writer's fencing requirement.
            What do others feel about this?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12662412/HDFS-3862.002.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal:

          org.apache.hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7677//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7677//console

          This message is automatically generated.

          Yi Liu added a comment -

          Fix test failure.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12662361/HDFS-3862.001.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal:

          org.apache.hadoop.ha.TestZKFailoverController
          org.apache.hadoop.hdfs.server.namenode.ha.TestFailoverAndFencing
          org.apache.hadoop.hdfs.tools.TestDFSHAAdmin
          org.apache.hadoop.hdfs.tools.TestDFSHAAdminMiniCluster
          org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

          The following test timeouts occurred in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal:

          org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints
          org.apache.hadoop.hdfs.server.namenode.ha.TestHAMetrics
          org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions
          org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA
          org.apache.hadoop.hdfs.server.namenode.TestValidateConfigurationSettings
          org.apache.hadoop.contrib.bkjournal.TestBookKeeperAsHASharedDir

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7656//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7656//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12662359/HDFS-3862.001.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal:

          org.apache.hadoop.hdfs.tools.TestDFSHAAdminMiniCluster
          org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
          org.apache.hadoop.hdfs.tools.TestDFSHAAdmin

          The following test timeouts occurred in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal:

          org.apache.hadoop.hdfs.TestHDFSServerPorts
          org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints
          org.apache.hadoop.hdfs.server.namenode.ha.TestHAMetrics
          org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions
          org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA
          org.apache.hadoop.hdfs.server.namenode.TestValidateConfigurationSettings
          org.apache.hadoop.contrib.bkjournal.TestBookKeeperAsHASharedDir

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7655//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7655//console

          This message is automatically generated.

          Yi Liu added a comment -

          The patch adds a new API to JournalManager: boolean isNativelySingleWriter();
          It handles the following cases (sketched below):

          • If the shared storage has built-in single-writer semantics, the user is not forced to specify a fencer.
          • If #1 holds but a fencer is configured, the ZKFC fences as in the original logic; if fencing fails, the failure is ignored with a warning log and the failover continues.
          • If #1 holds but a bad fencer is configured, the fencer is ignored with a warning log and the failover continues.
          • If the "forcefence" option is specified for a failover via DFSHAAdmin, a fencer must be configured even if #1 holds.
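
          A sketch of the failover-time decision these cases describe; only isNativelySingleWriter() comes from the patch, NodeFencer and HAServiceTarget are existing Hadoop HA types, and the helper method and messages are illustrative:

            import java.io.IOException;
            import org.apache.commons.logging.Log;
            import org.apache.commons.logging.LogFactory;
            import org.apache.hadoop.ha.HAServiceTarget;
            import org.apache.hadoop.ha.NodeFencer;

            // Illustrative fencing decision during failover (not the patch itself).
            class FailoverFencingSketch {
              private static final Log LOG =
                  LogFactory.getLog(FailoverFencingSketch.class);

              static void maybeFence(boolean nativelySingleWriter, NodeFencer fencer,
                  boolean forceFence, HAServiceTarget target) throws IOException {
                if (forceFence || !nativelySingleWriter) {
                  // Fencing is mandatory (forcefence was given, or the storage is
                  // not natively single-writer): a fencer must exist and succeed.
                  if (fencer == null) {
                    throw new IOException("Fencing is required but no fencer is configured");
                  }
                  if (!fencer.fence(target)) {
                    throw new IOException("Unable to fence " + target);
                  }
                } else if (fencer != null) {
                  // Cases 2 and 3: fencing is optional; try it, but tolerate failure.
                  try {
                    if (!fencer.fence(target)) {
                      LOG.warn("Fencing " + target + " failed; ignoring since the"
                          + " shared storage enforces single-writer semantics");
                    }
                  } catch (RuntimeException e) {
                    LOG.warn("Ignoring misconfigured fencer: " + e.getMessage());
                  }
                }
                // Case 1: no fencer configured and natively single-writer storage:
                // proceed without fencing.
              }
            }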
          Yi Liu added a comment -

          After talking with Rakesh R, I will work on this issue. Thanks.

          Rakesh R added a comment -

          Hi Todd Lipcon, Uma Maheswara Rao G,

          I also think the configuration approach is simple. I assume this configuration would be set in the ZKFC process, wouldn't it? Kindly let me know your suggestions; I'd like to take this ahead. Thanks!

          Uma Maheswara Rao G added a comment -

          Todd, that seems reasonable to me. I also filed one JIRA to handle this single-writer situation: HDFS-3854.
          But I thought we could simply provide a fence method that fences the writer; that would guarantee that no other NN can access the shared storage before we go for the state change.
          In fact, if we are OK with leaving fencing at the writer level, that is even better.
          Currently we simply have a dummy fence method that returns true, since BK already has fencing.

          With the above suggestion of adding an API to JournalManager, wouldn't the ZKFC need to create the JournalManager just to get this info?
          How about simply adding one config parameter?

          Todd Lipcon added a comment -

          I think this might be the case for BookKeeper as well. Do any of the folks working on BKJM want to take this on? I anticipate we would add a simple API to JournalManager like boolean isNativelySingleWriter() or boolean needsExternalFencing(). Then the failover code could check the shared storage dir to see if this is the case and, if so, not error out if the user doesn't specify a fence method.
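
          A minimal sketch of the failover-code check described above; SingleWriterAware stands in for the extended JournalManager interface, and the method name and message wording are assumptions:

            import org.apache.hadoop.ha.BadFencingConfigurationException;
            import org.apache.hadoop.ha.NodeFencer;

            // Hypothetical configuration-time check in the failover path.
            class FencingPrecheckSketch {
              // Stand-in for a JournalManager that exposes the proposed method.
              interface SingleWriterAware {
                boolean isNativelySingleWriter();
              }

              static void checkFencingConfigured(SingleWriterAware jm,
                  NodeFencer fencer) throws BadFencingConfigurationException {
                // Only error out when no fencer is configured AND the shared
                // storage cannot enforce single-writer semantics by itself.
                if (fencer == null && !jm.isNativelySingleWriter()) {
                  throw new BadFencingConfigurationException(
                      "No fencing method configured and the shared edits storage"
                      + " does not enforce single-writer semantics");
                }
              }
            }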


            People

            • Assignee: Yi Liu
            • Reporter: Todd Lipcon
            • Votes: 0
            • Watchers: 9

              Dates

              • Created:
              • Updated:
