Hadoop HDFS / HDFS-10921

TestDiskspaceQuotaUpdate doesn't wait for NN to get out of safe mode

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.7.4, 3.0.0-alpha2
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      Test fails intermittently because the NN is still in safe mode.

      1. HDFS-10921.001.patch
        0.8 kB
        Eric Badger
      2. HDFS-10921.002.patch
        0.7 kB
        Eric Badger
      3. HDFS-10921.003.patch
        1 kB
        Eric Badger
      4. HDFS-10921.004.patch
        2 kB
        Eric Badger

        Issue Links

          Activity

          Eric Badger added a comment -

          Stack trace of test failure

          Cannot create directory /TestQuotaUpdate/testAppendOverTypeQuota. Name node is in safe mode.
          The reported blocks 1749 needs additional 251 blocks to reach the threshold 0.9990 of total blocks 2003.
          The number of live datanodes 4 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
           at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.newSafemodeException(FSNamesystem.java:1372)
           at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1359)
           at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3004)
           at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1080)
           at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:637)
           at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
           at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
           at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
           at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:823)
           at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:771)
           at java.security.AccessController.doPrivileged(Native Method)
           at javax.security.auth.Subject.doAs(Subject.java:422)
           at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1805)
           at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2425)
          
          Eric Badger added a comment -

          Attaching a patch that makes the cluster wait until the NN is active.

          Eric Badger added a comment -

          Looked deeper into this and saw that cluster = new MiniDFSCluster.Builder(conf).numDataNodes(REPLICATION).build(); does call cluster.waitClusterUp(). The real problem is that some tests call cluster.restartNameNodes(). That method has an option to wait for the namenodes to become active, but it is currently set to false. I'm attaching a patch that sets this value to true so that the tests wait for the NN to come all the way back up before moving on to the next test.
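          The wait-for-active behavior described above reduces to a poll-until-condition loop with a timeout. Below is a minimal, self-contained sketch of that pattern; WaitUtil and its method names are illustrative, not the actual MiniDFSCluster API.

```java
import java.util.function.BooleanSupplier;

// Illustrative polling helper (not Hadoop code): waits until a condition
// holds, e.g. "the NN has left safe mode", or gives up after a timeout.
public class WaitUtil {
    public static boolean waitFor(BooleanSupplier condition, long intervalMs,
            long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;  // e.g. NN is active and out of safe mode
            }
            Thread.sleep(intervalMs);  // back off before polling again
        }
        return condition.getAsBoolean();  // one last check at the deadline
    }
}
```

          A restart helper that takes a waitActive flag would run such a loop only when the flag is true, which is why skipping it lets tests observe the NN while it is still in safe mode.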

          Eric Badger added a comment -

          Rushabh Shah, Daryn Sharp, Kihwal Lee, do you think it's reasonable to change restartNameNodes() to wait for active by default? Lots of tests call this function and it could add non-negligible runtime to the tests. However, the argument could be made that the NN hasn't finished restarting until it is back out of safemode. So I'm wondering if we should keep the patch as is or if we should special-case fix TestDiskspaceQuotaUpdate to call restartNameNode() with waitActive == true.

          Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 9s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 7m 4s trunk passed
          +1 compile 0m 48s trunk passed
          +1 checkstyle 0m 26s trunk passed
          +1 mvnsite 0m 51s trunk passed
          +1 mvneclipse 0m 12s trunk passed
          +1 findbugs 1m 56s trunk passed
          +1 javadoc 1m 9s trunk passed
          +1 mvninstall 1m 3s the patch passed
          +1 compile 0m 52s the patch passed
          +1 javac 0m 52s the patch passed
          +1 checkstyle 0m 24s the patch passed
          +1 mvnsite 0m 50s the patch passed
          +1 mvneclipse 0m 9s the patch passed
          +1 whitespace 0m 1s The patch has no whitespace issues.
          +1 findbugs 1m 47s the patch passed
          +1 javadoc 0m 53s the patch passed
          -1 unit 75m 54s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 18s The patch does not generate ASF License warnings.
          95m 57s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency
            hadoop.hdfs.TestBlockStoragePolicy



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Issue HDFS-10921
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12830709/HDFS-10921.001.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux fd2cca2e2c03 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / e19b37e
          Default Java 1.8.0_101
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/16910/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16910/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16910/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 14s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 7m 41s trunk passed
          +1 compile 0m 52s trunk passed
          +1 checkstyle 0m 28s trunk passed
          +1 mvnsite 0m 54s trunk passed
          +1 mvneclipse 0m 12s trunk passed
          +1 findbugs 1m 48s trunk passed
          +1 javadoc 0m 56s trunk passed
          +1 mvninstall 0m 54s the patch passed
          +1 compile 0m 46s the patch passed
          +1 javac 0m 46s the patch passed
          +1 checkstyle 0m 29s the patch passed
          +1 mvnsite 0m 53s the patch passed
          +1 mvneclipse 0m 10s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 57s the patch passed
          +1 javadoc 0m 55s the patch passed
          -1 unit 63m 1s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 18s The patch does not generate ASF License warnings.
          83m 43s



          Reason Tests
          Failed junit tests hadoop.hdfs.TestDFSInotifyEventInputStream
            hadoop.hdfs.server.datanode.TestDataNodeUUID



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Issue HDFS-10921
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12830719/HDFS-10921.002.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 1127f0d6ea1a 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / e19b37e
          Default Java 1.8.0_101
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/16911/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16911/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16911/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Rushabh S Shah added a comment -

          In HDFS-10843, @Before was replaced by @BeforeClass and @After by @AfterClass.
          This can contaminate the name system if we create the same paths in two test cases and want to test different things in each.
          One way I can think of to fix that is to call FSNamesystem#clear before/after every test case.

          do you think it's reasonable to change restartNameNodes() to wait for active by default?

          Do we really need to restart namenodes?
          If we just call FSNamesystem#clear in a @Before method, then there is no need to restart namenodes.
          There is one catch: this doesn't clear the datanode state.
          AFAICT this test suite doesn't depend on the datanode state, so we should be fine.
          Just pinging Erik Krogen and Konstantin Shvachko in case they have better ideas.

          Erik Krogen added a comment - edited

          Rushabh S Shah, the concern about namespace pollution is valid, but this is why the first action of each test is to create a Path unique to that test case; all subsequent operations occur under that Path. Especially given the error that Eric has posted, I don't think that is the issue.

          The reason that cluster.restartNameNodes() is called, IIUC, is to initiate a check of the edit log to ensure that edits aren't corrupted, as per the comment right above the call - previously the cluster was completely recreated after each test case, so there would be no reason (in terms of cleaning) to restart the namenode at the end of the test. Pinging Jing Zhao to confirm.

          Eric Badger, first off, good catch. Apologies for introducing this issue and thank you for helping to deal with it. Can we just change the calls to cluster.restartNameNodes() to cluster.restartNameNode(true)? There is only one NN in this test, so we can just use the already-available method on MiniDFSCluster and keep the change local to the test itself.

          Eric Badger added a comment -

          Can we just change the calls to cluster.restartNameNodes() to cluster.restartNameNode(true)?

          Erik Krogen, that sounds like a reasonable change. My only reservation is that I've encountered problems when tests use @BeforeClass to start up a cluster and share it across all tests. If one test fails and kills the cluster, all subsequent tests fail because they depended on the cluster being left in a usable state. I've seen 20 tests in a class fail because of a single failing test. Maybe that's an acceptable tradeoff for decreased runtime, but it makes debugging the specific failure harder.

          Erik Krogen added a comment -

          This is certainly true and something I discussed with Konstantin before making the change. We decided to go for decreased runtime. I wonder if a pattern could be developed to help with this - e.g., in a @Before block, check cluster health (isClusterUp()) and, only if that check fails, create a new cluster for subsequent tests?

          Rushabh S Shah added a comment - edited

          the concern about namespace pollution is valid, but this is why the first line of each of the tests is to create a Path that is unique to the given test case and all subsequent operations occur under that Path.

          I agree that the issue mentioned in this jira is not due to namespace pollution.
          But pollution can cause test failures in the future.
          Take, for example, TestDiskspaceQuotaUpdate#testIncreaseReplicationBeforeCommitting and TestDiskspaceQuotaUpdate#testDecreaseReplicationBeforeCommitting.
          Both of these test cases call testQuotaIssuesBeforeCommitting(short initialReplication, short finalReplication) to create the file.
          Here is the audit log for the create call from testIncreaseReplicationBeforeCommitting:

          2016-09-29 11:19:02,069 [IPC Server handler 2 on 58161] INFO  FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7090)) - allowed=true       ugi=rushabhs (auth:SIMPLE)      ip=/127.0.0.1   cmd=create      src=/TestQuotaUpdate/testQuotaIssuesBeforeCommitting/1-4/testfile       dst=null        perm=rushabhs:supergroup:rw-r--r--      proto=rpc
          

          Here is the audit log for the create call from testDecreaseReplicationBeforeCommitting:

          2016-09-29 11:20:20,403 [IPC Server handler 3 on 58161] INFO  FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7090)) - allowed=true       ugi=rushabhs (auth:SIMPLE)      ip=/127.0.0.1   cmd=create      src=/TestQuotaUpdate/testQuotaIssuesBeforeCommitting/4-1/testfile       dst=null        perm=rushabhs:supergroup:rw-r--r--      proto=rpc
          

          The only difference between the two file paths is the initialReplication-finalReplication value.
          And we can't expect every future developer to do the right thing when creating the file.
          I can point out hundreds of file creations in other test suites that don't use a path unique to the test case.
          That's why I think we should clear the namesystem state before running each new test case.
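          The collision risk follows directly from how such a shared helper constructs its paths. A hypothetical sketch (QuotaTestPaths is illustrative, not the actual test code) reconstructs the path scheme visible in the audit logs above, where the replication pair is the only disambiguator:

```java
// Illustrative reconstruction of the path scheme in the audit logs above:
// two tests stay distinct only because their replication pairs differ.
public class QuotaTestPaths {
    public static String pathFor(short initialReplication, short finalReplication) {
        return "/TestQuotaUpdate/testQuotaIssuesBeforeCommitting/"
                + initialReplication + "-" + finalReplication + "/testfile";
    }
}
```

          Any future caller passing the same pair would silently reuse an existing path, which is exactly the pollution risk once the namesystem is no longer recreated between tests.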

          Erik Krogen added a comment -

          True, agreed that it is definitely easy to accidentally miss the requirement to make your test path unique. Clearing the Namesystem between tests certainly can't hurt; I would support adding that.

          What about something like the following to protect against both namespace pollution and the possibility of a failed test affecting subsequent tests:

            @Before
            public void resetCluster() throws Exception {
              if (cluster.isClusterUp()) {
                // Clear the namesystem to prevent pollution issues
                cluster.getNamesystem().clear();
              } else {
                // Previous test seems to have left the cluster in a bad state;
                // recreate the cluster to protect subsequent tests
                cluster.shutdown();
                cluster = new MiniDFSCluster.Builder(conf).numDataNodes(REPLICATION)
                    .build();
                cluster.waitActive();
              }
            }
          
          Eric Badger added a comment -

          Looks good to me. Rushabh S Shah, do you have any concerns with Erik Krogen's approach? If not, I'll put up a patch so that we can get it committed.

          Rushabh S Shah added a comment -

          Rushabh S Shah, do you have any concerns with Erik Krogen's approach?

          +1.

          Eric Badger added a comment -

          Quick update: adding the @Before method as Erik Krogen proposed above does not work. For a reason that isn't immediately obvious to me, cluster.getNamesystem().clear() breaks all of the tests. Even when running only a single test in the test class, clearing the namesystem beforehand completely breaks it. If I clear the namesystem in tandem with cluster.restartNameNodes(), then all of the tests pass, but they take ~82 seconds to run on my machine. Restarting the cluster after every test, on the other hand, takes under 30 seconds total.

          However, by adding the else clause to the @Before method, I was able to remove all of the restartNameNodes() and restartDataNode() calls in the code and the tests still passed. So the only problem that we have is the issue that Rushabh S Shah raised about Namesystem pollution by bad tests. This test suite doesn't have any of those problems, so (unless I can figure out how to effectively clear the namesystem in a small amount of time without corrupting the tests) the question will be whether we want decreased runtime or to be resilient against tests written in the future that use the same file names.

          Rushabh S Shah added a comment -

          So the only problem that we have is the issue that Rushabh S Shah raised about Namesystem pollution by bad tests.

          I looked at the code.

          FSNamesystem.java
            /**
             * Clear all loaded data
             */
            void clear() {
              dir.reset();
              dtSecretManager.reset();
              leaseManager.removeAllLeases();
              snapshotManager.clearSnapshottableDirs();
              cacheManager.clear();
              ecPolicyManager.clear();
              setImageLoaded(false);
              blockManager.clear();
            }
          

          setImageLoaded(false);

          This line from clear() method is the problem.

          FSDirectory.java
           void updateCount(INodesInPath iip, int numOfINodes,
                              QuotaCounts counts, boolean checkQuota)
                              throws QuotaExceededException {
              assert hasWriteLock();
              if (!namesystem.isImageLoaded()) {
                //still initializing. do not check or update quotas.
                return;
              }
              if (numOfINodes > iip.length()) {
                numOfINodes = iip.length();
              }
              if (checkQuota && !skipQuotaCheck) {
                verifyQuota(iip, numOfINodes, counts, null);
              }
              unprotectedUpdateCount(iip, numOfINodes, counts);
            }
          

          if (!namesystem.isImageLoaded()) {
          //still initializing. do not check or update quotas.
          return;
          }

          While creating a directory or file, the NameNode tries to update the usage count.
          But since FSNamesystem#clear() sets the imageLoaded flag to false, FSDirectory#updateCount will return early, FSDirectory#unprotectedUpdateCount won't be called, and the usage count will not be updated.

          One thing we can do is:
          After calling FSNamesystem#clear in the @Before block, we can call cluster.getNamesystem().setImageLoaded(true);
          I just tried running one test case TestDiskspaceQuotaUpdate#testQuotaIssuesWhileCommitting and it passed.
          Hope this helps.
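
          A minimal sketch of the workaround proposed here (assumed, not the committed patch): clear the namesystem in the @Before block and then restore the imageLoaded flag that clear() reset, so quota accounting stays enabled. It assumes the suite keeps its running MiniDFSCluster in a field named cluster, as in the existing test setup.

```java
// Sketch of the proposed workaround; assumes a MiniDFSCluster field
// named "cluster" shared across the test class.
@Before
public void resetNamesystem() throws Exception {
  // Wipe directories, leases, caches, etc. between tests.
  cluster.getNamesystem().clear();
  // clear() calls setImageLoaded(false), which makes
  // FSDirectory#updateCount return early and skip quota updates,
  // so re-enable it explicitly.
  cluster.getNamesystem().setImageLoaded(true);
}
```

          As the follow-up comments show, this was enough for a single test case but not for the whole suite run.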

          xkrogen Erik Krogen added a comment - - edited

          Rushabh S Shah, I tried that previously, but when running the entire test file in one run with that modification, the first two tests would pass and subsequent ones would fail... I was thinking it may have something to do with calling setImageLoaded causing some sort of inconsistent state in the Namesystem? The error I get is:

          java.lang.IllegalStateException: Cannot skip to less than the current value (=1073741826), where newValue=1073741825
          	at org.apache.hadoop.util.SequentialNumber.skipTo(SequentialNumber.java:58)
          	at org.apache.hadoop.hdfs.server.blockmanagement.BlockIdManager.setLastAllocatedContiguousBlockId(BlockIdManager.java:112)
          	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:806)
          	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:240)
          	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:149)
          	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:838)
          	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:695)
          	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:291)
          	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1012)
          	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:666)
          	at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:650)
          	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:712)
          	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:928)
          	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:907)
          	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1624)
          	at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:2073)
          	at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNodes(MiniDFSCluster.java:2028)
          	at org.apache.hadoop.hdfs.server.namenode.TestDiskspaceQuotaUpdate.testTruncateOverQuota(TestDiskspaceQuotaUpdate.java:347)
          	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
          	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          	at java.lang.reflect.Method.invoke(Method.java:498)
          	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
          	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
          	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
          	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
          	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
          

          So it seems to me that it creates an inconsistency between the edit log and the image?

          ebadger Eric Badger added a comment -

          Erik Krogen, Rushabh S Shah, I still haven't figured out how to make it work with clearing the namesystem while also keeping the cluster up and healthy. Should we fix up the test so that it restarts the cluster if it isn't up, but do nothing in the case that it is already up? This will fix the problem that this Jira was initially created for. We can then create another Jira, related to this one, about fixing up the test(s) so that there is no namespace pollution while also maintaining a small runtime. Since we already know how to fix one of the two problems, we might as well make that change while we're working on the solution to the other. Thoughts?
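
          A hedged sketch of what this restart-if-down @Before block might look like (illustrative, not the attached patch itself); the field names cluster and conf and the datanode count are assumptions matching a typical MiniDFSCluster suite setup.

```java
// Sketch: rebuild the shared cluster only when a previous test left it
// down; reuse the healthy cluster otherwise to keep the runtime low.
// "cluster", "conf", and numDataNodes(2) are illustrative assumptions.
@Before
public void resetCluster() throws Exception {
  if (cluster == null || !cluster.isClusterUp()) {
    if (cluster != null) {
      cluster.shutdown();
    }
    cluster = new MiniDFSCluster.Builder(conf)
        .numDataNodes(2)
        .build();
    // Block until the NN and DNs are up before the test body runs.
    cluster.waitActive();
  }
}
```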

          shahrs87 Rushabh S Shah added a comment -

          Should we fix up the test so that it restarts the cluster if it isn't up, but do nothing in the case that it is already up?

          +1. I am fine with this approach.

          xkrogen Erik Krogen added a comment -

          Sounds reasonable to me. Fixing the test failure should be prioritized but I definitely think more investigation is warranted since this could be a useful pattern for other tests as well.

          ebadger Eric Badger added a comment -

          Attaching a patch that simply adds the @Before annotation to start up a new cluster if the cluster has died. No other changes are made. Erik Krogen, Rushabh S Shah please review.

          xkrogen Erik Krogen added a comment - - edited

          Should we still change restartNameNodes to restartNameNode(true)? Correctness shouldn't be impacted either way because of the @Before block, but it may help to reduce the number of times that the @Before block's condition is hit. I'm assuming that doing the extra call to waitClusterUp as a result (inside of restartNameNode) will be faster than starting an entirely new cluster.
          LGTM either way though, just a minor detail.
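
          For context, the one-line difference under discussion (assuming MiniDFSCluster's restartNameNode(boolean waitActive) overload, as described in the comment above):

```java
// Restart the NN and, because of the "true" argument, also wait for
// the cluster to come back up before returning, so the next test is
// less likely to find the NN still in safe mode.
cluster.restartNameNode(true);
```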

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 22s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 6m 47s trunk passed
          +1 compile 0m 44s trunk passed
          +1 checkstyle 0m 25s trunk passed
          +1 mvnsite 0m 50s trunk passed
          +1 mvneclipse 0m 12s trunk passed
          +1 findbugs 1m 40s trunk passed
          +1 javadoc 0m 38s trunk passed
          +1 mvninstall 0m 45s the patch passed
          +1 compile 0m 42s the patch passed
          +1 javac 0m 42s the patch passed
          +1 checkstyle 0m 24s the patch passed
          +1 mvnsite 0m 49s the patch passed
          +1 mvneclipse 0m 9s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 45s the patch passed
          +1 javadoc 0m 37s the patch passed
          -1 unit 70m 18s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 18s The patch does not generate ASF License warnings.
          88m 37s



          Reason Tests
          Failed junit tests hadoop.hdfs.TestSafeMode



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Issue HDFS-10921
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12833159/HDFS-10921.003.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux f384b3ba99ac 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 1291254
          Default Java 1.8.0_101
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/17138/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/17138/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/17138/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          ebadger Eric Badger added a comment -

          Should we still change restartNameNodes to restartNameNode(true)?

          Yes, good call. Attaching new patch with this added in.

          ebadger Eric Badger added a comment -

          Not sure why the precommit didn't run. Cancelling and resubmitting patch.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 20s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 7m 19s trunk passed
          +1 compile 0m 47s trunk passed
          +1 checkstyle 0m 27s trunk passed
          +1 mvnsite 0m 58s trunk passed
          +1 mvneclipse 0m 13s trunk passed
          +1 findbugs 1m 41s trunk passed
          +1 javadoc 0m 44s trunk passed
          +1 mvninstall 0m 49s the patch passed
          +1 compile 0m 44s the patch passed
          +1 javac 0m 44s the patch passed
          +1 checkstyle 0m 24s the patch passed
          +1 mvnsite 0m 51s the patch passed
          +1 mvneclipse 0m 9s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 46s the patch passed
          +1 javadoc 0m 36s the patch passed
          -1 unit 96m 0s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 20s The patch does not generate ASF License warnings.
          115m 25s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.datanode.TestDataNodeUUID
            hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations
            hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
          Timed out junit tests org.apache.hadoop.hdfs.TestLeaseRecovery2



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Issue HDFS-10921
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12833220/HDFS-10921.004.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 06ad0d8c9d7d 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / d26a1bb
          Default Java 1.8.0_101
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/17202/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/17202/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/17202/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          ebadger Eric Badger added a comment -

          None of the precommit test failures fail for me locally, so I believe they are transient. Erik Krogen, do you have any other comments before committing?

          xkrogen Erik Krogen added a comment -

          LGTM, thanks Eric Badger!

          ebadger Eric Badger added a comment -

          Thanks, Erik Krogen and Rushabh S Shah!

          liuml07 Mingliang Liu added a comment -

          +1 Will commit shortly.

          liuml07 Mingliang Liu added a comment -

          Generally I'd prefer a brand-new mini-cluster for each test, as it's simpler and clearer. However, in this case the overhead of creating/destroying a cluster for each test is obvious and a concern. The total runtime on my local test machine for individual vs. shared cluster is 58 seconds vs. 37 seconds.

          However, using a shared cluster we may have to deal with problems like this. I think the solution is pretty good. Thanks for the good discussion guys.

          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10692 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10692/)
          HDFS-10921. TestDiskspaceQuotaUpdate doesn't wait for NN to get out of (liuml07: rev 55e1fb8e3221941321e6f5e04b334246c5f23027)

          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDiskspaceQuotaUpdate.java

            People

            • Assignee:
              ebadger Eric Badger
              Reporter:
              ebadger Eric Badger
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development