Hadoop HDFS / HDFS-4867

metaSave NPEs when there are invalid blocks in repl queue.

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.7, 2.0.4-alpha, 0.23.8
    • Fix Version/s: 2.1.0-beta, 0.23.9
    • Component/s: namenode
    • Labels: None

    Description

      Since metaSave cannot get the inode holding an orphaned/invalid block, it NPEs and stops generating the rest of the report. Normally ReplicationMonitor removes such blocks quickly, but if the queue is huge it can take a very long time. Also, in safe mode they stay.
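
      The failure mode, illustrated with a self-contained sketch (the HashMap below is only a stand-in for the NameNode's blocks map; this is not the actual BlockManager code):

          import java.util.Arrays;
          import java.util.HashMap;
          import java.util.List;
          import java.util.Map;

          public class MetaSaveNpeSketch {
            public static void main(String[] args) {
              // blockId -> owning file; an orphaned/invalid block has no owner, so the lookup yields null
              Map<Long, String> blockToFile = new HashMap<>();
              blockToFile.put(1L, "/user/a/file1");
              blockToFile.put(3L, "/user/a/file3");
              List<Long> replQueue = Arrays.asList(1L, 2L /* orphaned */, 3L);
              for (long blockId : replQueue) {
                String owner = blockToFile.get(blockId);
                // Dereferencing the null owner throws NullPointerException on block 2,
                // so block 3 is never reported: the report stops short, as described above.
                System.out.println("blk_" + blockId + " file=" + owner.substring(owner.lastIndexOf('/') + 1));
              }
            }
          }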

      Attachments

        1. testMetaSave.log
          6 kB
          Konstantin Shvachko
        2. HDFS-4867.trunk.patch
          4 kB
          Plamen Jeliazkov
        3. HDFS-4867.trunk.patch
          5 kB
          Plamen Jeliazkov
        4. HDFS-4867.trunk.patch
          6 kB
          Ravi Prakash
        5. HDFS-4867.trunk.patch
          2 kB
          Plamen Jeliazkov
        6. HDFS-4867.branch-2.patch
          1.0 kB
          Plamen Jeliazkov
        7. HDFS-4867.branch2.patch
          4 kB
          Plamen Jeliazkov
        8. HDFS-4867.branch2.patch
          4 kB
          Plamen Jeliazkov
        9. HDFS-4867.branch2.patch
          6 kB
          Ravi Prakash
        10. HDFS-4867.branch-0.23.patch
          5 kB
          Ravi Prakash
        11. HDFS-4867.branch-0.23.patch
          6 kB
          Ravi Prakash
        12. HDFS-4867.branch-0.23.patch
          2 kB
          Plamen Jeliazkov
        13. HDFS-4867.branch-0.23.patch
          3 kB
          Konstantin Shvachko

        Issue Links

          Activity

            zero45 Plamen Jeliazkov added a comment -

            Kihwal, do you have any log snippets or stack traces by chance?
            kihwal Kihwal Lee added a comment -

            Sorry I forgot to post. This is from branch-0.23. branch-2/trunk uses block collection, but it may end up with the same NPE.

            java.io.IOException: java.lang.NullPointerException
                    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.metaSave(BlockManager.java:352)
                    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.metaSave(FSNamesystem.java:614)
                    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.metaSave(NameNodeRpcServer.java:671)
                    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
                    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                    at java.lang.reflect.Method.invoke(Method.java:601)
                    at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:394)
                    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1571)
                    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1567)
                    at java.security.AccessController.doPrivileged(Native Method)
                    at javax.security.auth.Subject.doAs(Subject.java:415)
                    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1282)
                    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1565)


            zero45 Plamen Jeliazkov added a comment -

            I think I am seeing the case in branch-2; did your error look something like this?

            2013-04-25 04:27:36,826 WARN org.apache.hadoop.ipc.Server: IPC Server handler 9 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.metaSave from 10.4.106.54:46567: error: java.lang.NullPointerException
            java.lang.NullPointerException
                    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.dumpBlockMeta(BlockManager.java:459)
                    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.metaSave(BlockManager.java:419)
                    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.metaSave(FSNamesystem.java:1063)
                    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.metaSave(FSNamesystem.java:1048)
                    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.metaSave(NameNodeRpcServer.java:785)
                    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.metaSave(ClientNamenodeProtocolServerSideTranslatorPB.java:640)
                    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:40788)
                    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
                    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
                    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1735)
                    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1731)
                    at java.security.AccessController.doPrivileged(Native Method)
                    at javax.security.auth.Subject.doAs(Subject.java:396)
                    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1441)
                    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1729)
            

            Also, are you able to successfully reproduce it by chance?

            kihwal Kihwal Lee added a comment -

            I think I am seeing the case in branch-2; did your error look something like this?

            Yes, that is the same bug.

            Also, are you able to successfully reproduce it by chance?

            I first saw it happening in safe mode and then during a massive decommissioning. In the former, ReplicationMonitor is not processing the neededReplications queue, so these blocks are not thrown away. In the latter case, it does run but can't get to those blocks in time, since it limits the number of blocks it processes in one iteration.

            Detecting this condition is simple, but we need to think about what to do with it. Maybe it should throw them away like ReplicationMonitor would do, if running in a non-startup safe mode. Outside safe mode, it could just report them, since ReplicationMonitor will eventually do the job.

            tlipcon Todd Lipcon added a comment -

            It seems to me that metaSave should just be a "read" operation, and not modify the queue even if it detects that a block is invalid. I'd vote for just logging it with an "[orphaned]" marker or something like that.


            zero45 Plamen Jeliazkov added a comment -

            I agree with Todd. metaSave should not modify the queue – abandoned blocks should be taken care of by the BlockManager, specifically the computeDatanodeWork method in the ReplicationMonitor. I can write up a patch to log the block as abandoned and a test for this. Ravi, have you started work already? Otherwise I'd like to take this up.
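
            A minimal sketch of the read-only handling being discussed (illustrative names, reusing the stand-in map idea from the sketch under the description; this is not the actual patch):

                // Report an orphaned block and keep going; the replication queue itself is never
                // modified here -- cleanup stays the ReplicationMonitor's job.
                static void dumpBlock(java.io.PrintWriter out,
                                      java.util.Map<Long, String> blockToFile, long blockId) {
                  String owner = blockToFile.get(blockId);      // null for an orphaned/invalid block
                  if (owner == null) {
                    out.println("blk_" + blockId + " [orphaned]: no containing file found");
                    return;
                  }
                  out.println("blk_" + blockId + " file=" + owner);
                }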

            zero45 Plamen Jeliazkov added a comment -

            Attaching patch with unit test to print orphaned blocks from metaSave. This will fix the immediate issue but I struggle to understand WHY this is happening in the first place...

            I am able to simulate orphaned blocks in the unit test by deleting the created file immediately before metaSave is called.

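            A sketch of that simulation approach (written from the usual MiniDFSCluster test pattern; the names and constants are assumptions, not the attached TestMetaSave code):

                // Create an under-replicated file, delete it before ReplicationMonitor can react,
                // then call metaSave: without the fix this is where the NPE shows up.
                Configuration conf = new HdfsConfiguration();
                MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
                try {
                  cluster.waitActive();
                  DistributedFileSystem fs = cluster.getFileSystem();
                  Path file = new Path("/metasave-orphan.dat");
                  // replication factor 2 on a single-datanode cluster puts the block into neededReplications
                  DFSTestUtil.createFile(fs, file, 1024, (short) 2, 0L);
                  fs.delete(file, true);                            // the block is now orphaned
                  cluster.getNameNodeRpc().metaSave("metasaveOrphan.out.txt");
                } finally {
                  cluster.shutdown();
                }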

            zero45 Plamen Jeliazkov added a comment -

            Ravi, I am going to take this issue up. If you would like to take it back please let me know and I will back off.
            hadoopqa Hadoop QA added a comment -

            +1 overall. Here are the results of testing the latest attachment
            http://issues.apache.org/jira/secure/attachment/12585941/HDFS-4867.trunk.patch
            against trunk revision .

            +1 @author. The patch does not contain any @author tags.

            +1 tests included. The patch appears to include 1 new or modified test files.

            +1 javac. The applied patch does not increase the total number of javac compiler warnings.

            +1 javadoc. The javadoc tool did not generate any warning messages.

            +1 eclipse:eclipse. The patch built with eclipse:eclipse.

            +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

            +1 release audit. The applied patch does not increase the total number of release audit warnings.

            +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

            +1 contrib tests. The patch passed contrib unit tests.

            Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4470//testReport/
            Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4470//console

            This message is automatically generated.

            raviprak Ravi Prakash added a comment -

            Hi Plamen, Please feel free to take this up.


            shv Konstantin Shvachko added a comment -

            metaSave is probably a casualty here. Should we take a look at why orphaned / missing blocks are kept in replication queues in the first place?
            It seems that when we delete a file, its blocks could also be removed from the replication queue, because there is no point in replicating blocks that do not belong to any file.

            It still makes sense to have this case covered in metaSave().
            The patch looks good. Couple of nits:

            1. Could you remove the 3 unused imports in the test.
            2. Also it would be good to close the BufferedReader at the end of both test cases.
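
            On the second nit, a generic close-in-finally form (a sketch, not the patch; the finally form also works on Java 6) keeps the reader from leaking when an assertion fails:

                BufferedReader in = new BufferedReader(new FileReader(file));
                try {
                  String line = in.readLine();
                  // ... assertions on the metasave output lines ...
                } finally {
                  in.close();   // closed even when an assertion above throws
                }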

            zero45 Plamen Jeliazkov added a comment -

            New patch with Konstantin's comments addressed.
            hadoopqa Hadoop QA added a comment -

            +1 overall. Here are the results of testing the latest attachment
            http://issues.apache.org/jira/secure/attachment/12586139/HDFS-4867.trunk.patch
            against trunk revision .

            +1 @author. The patch does not contain any @author tags.

            +1 tests included. The patch appears to include 1 new or modified test files.

            +1 javac. The applied patch does not increase the total number of javac compiler warnings.

            +1 javadoc. The javadoc tool did not generate any warning messages.

            +1 eclipse:eclipse. The patch built with eclipse:eclipse.

            +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

            +1 release audit. The applied patch does not increase the total number of release audit warnings.

            +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

            +1 contrib tests. The patch passed contrib unit tests.

            Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4474//testReport/
            Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4474//console

            This message is automatically generated.

            raviprak Ravi Prakash added a comment -

            Patch looks good to me. Thanks Plamen!

            metaSave is probably a casualty here. Should we take a look at why orphaned / missing blocks are kept in replication queues in the first place?
            It seems that when we delete a file, its blocks could also be removed from the replication queue, because there is no point in replicating blocks that do not belong to any file.

            +1 for Konstantin's suggestion. Plamen, could you please open another JIRA for it?

            raviprak Ravi Prakash added a comment -

            Plamen, could you please open another JIRA for it?

            Seems like Tao did already. Thanks Tao!


            shv Konstantin Shvachko added a comment -

            +1 for the patch.

            zero45 Plamen Jeliazkov added a comment -

            The patch for trunk is applicable to branch-2.
            raviprak Ravi Prakash added a comment -

            Ported Plamen's patch to 0.23

            hadoopqa Hadoop QA added a comment -

            -1 overall. Here are the results of testing the latest attachment
            http://issues.apache.org/jira/secure/attachment/12586364/HDFS-4867.branch-0.23.patch
            against trunk revision .

            -1 patch. The patch command could not apply the patch.

            Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4482//console

            This message is automatically generated.


            zero45 Plamen Jeliazkov added a comment -

            Sorry; turns out there were differences with trunk after all. Attaching patch for branch-2.

            shv Konstantin Shvachko added a comment -

            Cancelling patch to unconfuse Jenkins.
            hadoopqa Hadoop QA added a comment -

            -1 overall. Here are the results of testing the latest attachment
            http://issues.apache.org/jira/secure/attachment/12586365/HDFS-4867.branch2.patch
            against trunk revision .

            +1 @author. The patch does not contain any @author tags.

            +1 tests included. The patch appears to include 1 new or modified test files.

            -1 javac. The patch appears to cause the build to fail.

            Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4483//console

            This message is automatically generated.


            shv Konstantin Shvachko added a comment -

            Guys, patches for branch-2 and branch-0.23 both fail on TestMetaSave.
            Could you please take a look?

            zero45 Plamen Jeliazkov added a comment -

            There are subtle differences in the metaSave logs that I missed between branch-2 and trunk. Fixed up my branch-2 patch.
            raviprak Ravi Prakash added a comment -

            Konstantin! I rechecked my patch and ran it several times. The test passes for me. Could you please clean your working directory and try again?


            shv Konstantin Shvachko added a comment -

            Including the log from a run on branch-0.23, where I just added a message to the failing asserts that prints the line being compared.
            Could it be some timing issue?
            I did clean everything.

            shv Konstantin Shvachko added a comment -

            OK, I think it's the sequence in which the test cases are executed.
            Better to fix it, since the ordering differs between Java 6 and 7.
            raviprak Ravi Prakash added a comment -

            This may be because BeforeClass initializes the cluster only once, in which case the patches for trunk and 2.0 will have to be updated too. Let me check.

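            For context, a sketch of the JUnit 4 shape under discussion (illustrative, not the actual TestMetaSave): with @BeforeClass the cluster is built once, so every test method sees whatever blocks and output files earlier methods left behind, and the expected metasave report becomes order-dependent.

                public class SharedClusterSketch {
                  private static MiniDFSCluster cluster;   // one cluster shared by all test methods

                  @BeforeClass
                  public static void setUpClass() throws Exception {
                    cluster = new MiniDFSCluster.Builder(new HdfsConfiguration()).build();
                    cluster.waitActive();
                  }

                  @AfterClass
                  public static void tearDownClass() {
                    if (cluster != null) {
                      cluster.shutdown();
                    }
                  }

                  // Each @Test runs against the same NameNode state, so results depend on method
                  // order, which is exactly the Java 6 vs Java 7 ordering difference noted above.
                }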
            raviprak Ravi Prakash added a comment -

            The test fails irrespective of the order in which the cases are run when started from a clean hadoop-hdfs-project/hadoop-hdfs. This is true for trunk as well as 0.23.

            raviprak Ravi Prakash added a comment -

            The problem was in orphaned metasave output files. I took the liberty of fixing and refactoring the tests a bit.
            Konstantin, could you please review and commit?

            raviprak Ravi Prakash added a comment -

            And a delicious patch for Hadoop QA to munch

            hadoopqa Hadoop QA added a comment -

            -1 overall. Here are the results of testing the latest attachment
            http://issues.apache.org/jira/secure/attachment/12586461/HDFS-4867.trunk.patch
            against trunk revision .

            +1 @author. The patch does not contain any @author tags.

            +1 tests included. The patch appears to include 1 new or modified test files.

            +1 javac. The applied patch does not increase the total number of javac compiler warnings.

            +1 javadoc. The javadoc tool did not generate any warning messages.

            +1 eclipse:eclipse. The patch built with eclipse:eclipse.

            +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

            +1 release audit. The applied patch does not increase the total number of release audit warnings.

            -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

            org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks

            +1 contrib tests. The patch passed contrib unit tests.

            Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4484//testReport/
            Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4484//console

            This message is automatically generated.


            shv Konstantin Shvachko added a comment -

            Checked the patches. They still fail, sorry. I think the problem is that orphaned blocks are still present and reported in metasave. That is, if you run testMetaSaveWithOrphanedBlocks() first and then testMetaSave(), they fail because the reports differ from what is expected.

            So I propose to remove testMetaSaveWithOrphanedBlocks() from this patch, because HDFS-4878 will break it right away. Why don't we include this test in HDFS-4878 instead? It should be modified for the output there. The order of running test cases after that shouldn't matter, because there will be no orphaned blocks.
            raviprak Ravi Prakash added a comment -

            Ok

            So I propose to remove testMetaSaveWithOrphanedBlocks() for this patch.

            Sure! That's fine by me


            zero45 Plamen Jeliazkov added a comment -

            Attaching patches for all 3 branches that remove testMetaSaveWithOrphanedBlocks and remove unused imports.
            raviprak Ravi Prakash added a comment -

            Thanks Plamen! +1 All patches look good to me.


            shv Konstantin Shvachko added a comment -

            I took the liberty of reordering the imports for 0.23 to bring it in sync with branch-2. That may save us time in the future.
            hudson Hudson added a comment -

            Integrated in Hadoop-trunk-Commit #3875 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3875/)
            HDFS-4867. metaSave NPEs when there are invalid blocks in repl queue. Contributed by Plamen Jeliazkov and Ravi Prakash. (Revision 1490433)

            Result = SUCCESS
            shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1490433
            Files :

            • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
            • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
            • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestMetaSave.java

            shv Konstantin Shvachko added a comment -

            I just committed this. Thank you Plamen and Ravi.
            kihwal Kihwal Lee added a comment -

            shv: It looks like the change was added to the CHANGES.TXT in mapreduce, not hdfs in branch-0.23.

            kihwal Kihwal Lee added a comment -

            It looks like the change was added to the CHANGES.TXT in mapreduce, not hdfs in branch-0.23.

            Fixed it.


            shv Konstantin Shvachko added a comment -

            Oops, sorry.

            People

              Assignee: zero45 Plamen Jeliazkov
              Reporter: kihwal Kihwal Lee
              Votes: 0
              Watchers: 8

              Dates

                Created:
                Updated:
                Resolved: