Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Versions: 0.23.7, 2.0.4-alpha, 0.23.8
- Component/s: None
- Hadoop Flags: Reviewed
Description
Since metaSave cannot get the inode holding an orphaned/invalid block, it throws a NullPointerException and stops generating the rest of the report. Normally the ReplicationMonitor removes such blocks quickly, but if the queue is huge it can take a very long time. In safe mode they stay indefinitely.
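The failure mode can be illustrated with a plain-Java analogue (a hypothetical sketch, not the actual BlockManager code): the report loop looks up the owning file for each queued block, an orphaned block has no owner, so the lookup returns null and the next dereference aborts the whole report.

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MetaSaveNpeSketch {
  public static void main(String[] args) {
    // blockId -> owning file path; an orphaned block has no entry, so the lookup returns null
    Map<Long, String> blockToFile = new HashMap<Long, String>();
    blockToFile.put(1L, "/data/a.txt");
    List<Long> underReplicatedQueue = Arrays.asList(1L, 2L); // block 2 is orphaned

    for (long blockId : underReplicatedQueue) {
      String owner = blockToFile.get(blockId); // null for block 2
      // Dereferencing the missing owner throws an NPE; the rest of the report is never written.
      System.out.println("blk_" + blockId + " belongs to " + owner.substring(owner.lastIndexOf('/') + 1));
    }
  }
}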
Attachments
- testMetaSave.log (6 kB, Konstantin Shvachko)
- HDFS-4867.trunk.patch (4 kB, Plamen Jeliazkov)
- HDFS-4867.trunk.patch (5 kB, Plamen Jeliazkov)
- HDFS-4867.trunk.patch (6 kB, Ravi Prakash)
- HDFS-4867.trunk.patch (2 kB, Plamen Jeliazkov)
- HDFS-4867.branch-2.patch (1.0 kB, Plamen Jeliazkov)
- HDFS-4867.branch2.patch (4 kB, Plamen Jeliazkov)
- HDFS-4867.branch2.patch (4 kB, Plamen Jeliazkov)
- HDFS-4867.branch2.patch (6 kB, Ravi Prakash)
- HDFS-4867.branch-0.23.patch (5 kB, Ravi Prakash)
- HDFS-4867.branch-0.23.patch (6 kB, Ravi Prakash)
- HDFS-4867.branch-0.23.patch (2 kB, Plamen Jeliazkov)
- HDFS-4867.branch-0.23.patch (3 kB, Konstantin Shvachko)
Issue Links
- is duplicated by HDFS-3974: dfsadmin -metasave throws NPE when under-replicated blocks are recently deleted (Open)
Activity
Sorry, I forgot to post. This is from branch-0.23; branch-2/trunk uses BlockCollection, but it may end up with the same NPE.
java.io.IOException: java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.metaSave(BlockManager.java:352)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.metaSave(FSNamesystem.java:614)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.metaSave(NameNodeRpcServer.java:671)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:394)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1571)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1567)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1282)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1565)
I think I am seeing the case in branch-2; did your error look something like this?
2013-04-25 04:27:36,826 WARN org.apache.hadoop.ipc.Server: IPC Server handler 9 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.metaSave from 10.4.106.54:46567: error: java.lang.NullPointerException
java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.dumpBlockMeta(BlockManager.java:459)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.metaSave(BlockManager.java:419)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.metaSave(FSNamesystem.java:1063)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.metaSave(FSNamesystem.java:1048)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.metaSave(NameNodeRpcServer.java:785)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.metaSave(ClientNamenodeProtocolServerSideTranslatorPB.java:640)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:40788)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1735)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1731)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1441)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1729)
Also, are you able to successfully reproduce it by chance?
I think I am seeing the case in branch-2; did your error look something like this?
Yes, that is the same bug.
Also, are you able to successfully reproduce it by chance?
I first saw it happening in safe mode and then during a massive decommissioning. In the former, the ReplicationMonitor is not processing the neededReplications queue, so these blocks are not thrown away. In the latter case it does run, but it couldn't get to those blocks in time, since it limits the number of blocks it processes in one iteration.
Detecting this condition is simple, but we need to think about what to do with it. Maybe it should throw such blocks away, as the ReplicationMonitor would, when running in a non-startup safe mode. Outside safe mode it could just report them, since the ReplicationMonitor will eventually do the job.
It seems to me that metasave should just be a "read" operation and not modify the queue, even if it detects an invalid block. I'd vote for just logging it with an "[orphaned]" marker or something like that.
I agree with Todd. metaSave should not modify the queue; abandoned blocks should be taken care of by the BlockManager, specifically by the computeDatanodeWork method in the ReplicationMonitor. I can write up a patch to log the block as abandoned, plus a test for this. Ravi, have you started work already? Otherwise I'd like to take this up.
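A minimal sketch of the kind of guard being discussed (hypothetical plain-Java code, not the actual BlockManager.dumpBlockMeta): if a block has no owning file, print it as orphaned and keep going, without touching the queue.

import java.io.PrintWriter;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class OrphanAwareDump {
  // Hypothetical stand-in: blockId -> owning file path (absent entry = orphaned block).
  static void dumpBlockMeta(long blockId, Map<Long, String> blockToFile, PrintWriter out) {
    String owner = blockToFile.get(blockId);
    if (owner == null) {
      // Report the block as orphaned and continue; the dump stays a read-only operation.
      out.println("Block blk_" + blockId + " is [orphaned], does not belong to any file");
      return;
    }
    out.println("Block blk_" + blockId + " of file " + owner);
  }

  public static void main(String[] args) {
    Map<Long, String> blockToFile = new HashMap<Long, String>();
    blockToFile.put(1L, "/data/a.txt"); // block 2 is intentionally left orphaned
    PrintWriter out = new PrintWriter(System.out, true);
    for (long id : Arrays.asList(1L, 2L)) {
      dumpBlockMeta(id, blockToFile, out); // no NPE; the report completes
    }
  }
}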
Attaching patch with unit test to print orphaned blocks from metaSave. This will fix the immediate issue but I struggle to understand WHY this is happening in the first place...
I am able to simulate orphaned blocks in the unit test by deleting the created file immediately before metaSave is called.
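A rough sketch of that reproduction, assuming a MiniDFSCluster setup similar to TestMetaSave (the file name, sizes, and output file name here are illustrative, not the ones in the patch):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DFSTestUtil;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class MetaSaveOrphanRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new HdfsConfiguration();
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
    try {
      cluster.waitActive();
      DistributedFileSystem fs = (DistributedFileSystem) cluster.getFileSystem();
      Path file = new Path("/orphan-test");
      // Replication factor 2 on a 1-datanode cluster puts the block on the under-replication queue.
      DFSTestUtil.createFile(fs, file, 1024, (short) 2, 0L);
      // Delete the file right away, before the ReplicationMonitor gets to the block,
      // leaving an owner-less entry in the queue.
      fs.delete(file, true);
      // Without the fix this fails with a wrapped NullPointerException;
      // with it, the block is reported as orphaned in the output file
      // (written on the NameNode side, typically under its log directory).
      fs.metaSave("metasave-orphan.out");
    } finally {
      cluster.shutdown();
    }
  }
}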
Ravi, I am going to take this issue up. If you would like to take it back please let me know and I will back off.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12585941/HDFS-4867.trunk.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4470//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4470//console
This message is automatically generated.
metaSave is probably a casualty here. Should we take a look at why orphaned / missing blocks are kept in replication queues in the first place?
It seems that when we delete a file, its blocks can also be removed from the replication queue; there is no point in replicating blocks that don't belong to any file.
It still makes sense to have this case covered in metaSave().
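A plain-Java sketch of the cleanup being suggested for the follow-up (hypothetical names, not HDFS code): when a file is deleted, drop its blocks from the under-replication queue as well, so neither metaSave nor the ReplicationMonitor ever sees owner-less entries.

import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DeleteAlsoDequeues {
  // Hypothetical stand-ins for the namespace and the under-replication queue.
  static Map<String, List<Long>> fileToBlocks = new HashMap<String, List<Long>>();
  static Set<Long> underReplicatedQueue = new LinkedHashSet<Long>();

  static void deleteFile(String path) {
    List<Long> blocks = fileToBlocks.remove(path);
    if (blocks == null) {
      return; // nothing to do, file did not exist
    }
    // Remove the file's blocks from the replication queue too:
    // there is no point replicating blocks that no longer belong to any file.
    underReplicatedQueue.removeAll(blocks);
  }

  public static void main(String[] args) {
    fileToBlocks.put("/data/a.txt", Arrays.asList(1L, 2L));
    underReplicatedQueue.addAll(Arrays.asList(1L, 2L));
    deleteFile("/data/a.txt");
    System.out.println(underReplicatedQueue); // [] -> no orphaned entries left behind
  }
}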
The patch looks good. A couple of nits:
- Could you remove the 3 unused imports in the test?
- Also, it would be good to close the BufferedReader at the end of both test cases.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12586139/HDFS-4867.trunk.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4474//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4474//console
This message is automatically generated.
Patch looks good to me. Thanks Plamen!
metaSave is probably a casualty here. Should we take a look at why orphaned / missing blocks are kept in replication queues in the first place?
It seems that when we delete a file, its blocks can also be removed from the replication queue; there is no point in replicating blocks that don't belong to any file.
+1 for Konstantin's suggestion. Plamen, could you please open another JIRA for it?
Plamen, could you please open another JIRA for it?
Seems like Tao did already. Thanks Tao!
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12586364/HDFS-4867.branch-0.23.patch
against trunk revision .
-1 patch. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4482//console
This message is automatically generated.
Sorry; turns out there were differences with trunk after all. Attaching patch for branch-2.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12586365/HDFS-4867.branch2.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
-1 javac. The patch appears to cause the build to fail.
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4483//console
This message is automatically generated.
Guys, patches for branch-2 and branch-0.23 both fail on TestMetaSave.
Could you please take a look?
There are subtle differences in the metaSave logs that I missed between branch-2 and trunk. Fixed up my branch-2 patch.
Konstantin! I rechecked my patch and ran it several times. The test passes for me. Could you please clean your working directory and try again?
Including the log from a run on branch-0.23, where I just added a message to the failing asserts that prints the line being compared.
Could be some timing issue?
I did clean everything.
OK, I think it's the sequence in which the test cases are executed.
Better to fix it, since the execution order differs between Java 6 and 7.
This may be because @BeforeClass initializes the cluster only once, in which case the patches for trunk and 2.0 will have to be updated too. Let me check.
The tests fail irrespective of the order in which they are run when starting from a clean hadoop-hdfs-project/hadoop-hdfs. This is true for trunk as well as 0.23.
The problem was leftover metasave output files from previous runs. I took the liberty of fixing and refactoring the tests a bit.
Konstantin, could you please review and commit?
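A minimal sketch of the kind of per-test cleanup this implies (hypothetical code, assuming the metasave output is written under the directory named by hadoop.log.dir): delete any stale output file before each test case so an earlier run cannot affect the asserts.

import java.io.File;
import org.junit.Before;

public class MetaSaveOutputCleanup {
  // Illustrative output file name; the real tests use their own names.
  private static final String OUT_FILE = "metasave-test.out";

  @Before
  public void removeStaleOutput() {
    // Assumes metaSave writes into hadoop.log.dir (falling back to the CWD here).
    File logDir = new File(System.getProperty("hadoop.log.dir", "."));
    File stale = new File(logDir, OUT_FILE);
    if (stale.exists() && !stale.delete()) {
      throw new IllegalStateException("Could not delete stale metasave output: " + stale);
    }
  }
}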
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12586461/HDFS-4867.trunk.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4484//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4484//console
This message is automatically generated.
Checked the patches. They still fail, sorry. I think the problem is that orphaned blocks are still present and reported in metasave. That is, if you run testMetaSaveWithOrphanedBlocks() first and then testMetaSave(), they fail because the reports differ from what is expected.
So I propose to remove testMetaSaveWithOrphanedBlocks() for this patch, because HDFS-4878 will break it right away. Why don't we include this test in HDFS-4878 instead, modified for the output there? After that, the order of running the test cases shouldn't matter, because there will be no orphaned blocks.
Ok
So I propose to remove testMetaSaveWithOrphanedBlocks() for this patch.
Sure! That's fine by me
Attaching patches for all 3 branches that remove testMetaSaveWithOrphanedBlocks and remove unused imports.
I took the liberty of reordering the imports for 0.23 to bring them in sync with branch-2. That may save us time in the future.
Integrated in Hadoop-trunk-Commit #3875 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3875/)
HDFS-4867. metaSave NPEs when there are invalid blocks in repl queue. Contributed by Plamen Jeliazkov and Ravi Prakash. (Revision 1490433)
Result = SUCCESS
shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1490433
Files :
- /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
- /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
- /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestMetaSave.java
shv: It looks like the change was added to the CHANGES.TXT in mapreduce, not hdfs in branch-0.23.
It looks like the change was added to the CHANGES.TXT in mapreduce, not hdfs in branch-0.23.
Fixed it.
Kihwal, do you have any log snippets or stack traces by chance?