Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Duplicate
- Affects Version/s: 2.0.0-alpha
Description
We saw TestDirectoryScanner fail during shutdown:
2012-08-09 12:17:19,844 WARN datanode.DataNode (BPServiceActor.java:run(683)) - Ending block pool service for: Block pool BP-610123021-172.29.121.238-1344539835759 (storage id DS-1581877160-172.29.121.238-43609-1344539837880) service to localhost/127.0.0.1:40012
...
2012-08-09 12:17:19,876 FATAL blockmanagement.BlockManager (BlockManager.java:run(3039)) - ReplicationMonitor thread received Runtime exception.
java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.getBlockCollection(BlocksMap.java:101)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1141)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1116)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3070)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3032)
    at java.lang.Thread.run(Thread.java:662)
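Inspecting the code, it appears that BlockManager#close -> BlocksMap#close can set blocks to null while the ReplicationMonitor thread is still inside computeDatanodeWork, which would explain the NullPointerException in BlocksMap#getBlockCollection.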
The fix seems simple: have BlockManager#close just set an exit flag, and have ReplicationMonitor#run call BlocksMap#close itself once its loop exits.
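A minimal sketch of that idea follows. It is not the actual HDFS code: SketchBlocksMap, SketchBlockManager, the shouldRun flag, and the helper methods are hypothetical stand-ins chosen for illustration. The point is only that the thread that reads the map is also the one that tears it down, so close can never null it out from under a running iteration. The sketch assumes activate() is called before close().

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for BlocksMap: close() drops the backing map. */
class SketchBlocksMap {
    private Map<Long, String> blocks = new HashMap<>();

    void put(long blockId, String owner) { blocks.put(blockId, owner); }

    /** Throws NullPointerException if called after close(). */
    String getBlockCollection(long blockId) { return blocks.get(blockId); }

    void close() { blocks = null; }

    boolean isClosed() { return blocks == null; }
}

/** Hypothetical stand-in for BlockManager and its ReplicationMonitor thread. */
class SketchBlockManager {
    private final SketchBlocksMap blocksMap = new SketchBlocksMap();
    private volatile boolean shouldRun = true;   // the "exit flag"
    private volatile Throwable failure;          // records any exception in the monitor
    private final Thread monitor =
        new Thread(this::runMonitor, "ReplicationMonitor-sketch");

    SketchBlockManager() { blocksMap.put(1L, "file-1"); }

    void activate() { monitor.start(); }

    private void runMonitor() {
        try {
            while (shouldRun) {
                // Stands in for computeDatanodeWork(): reads the map each pass.
                blocksMap.getBlockCollection(1L);
                try {
                    Thread.sleep(1);
                } catch (InterruptedException e) {
                    // Woken by close(); the loop condition re-checks the flag.
                }
            }
        } catch (Throwable t) {
            failure = t;
        } finally {
            // Only the monitor thread closes the map, after its last read.
            blocksMap.close();
        }
    }

    /** Sets the exit flag and waits for the monitor; does not touch the map. */
    void close() {
        shouldRun = false;
        monitor.interrupt();
        try {
            monitor.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    Throwable failure() { return failure; }

    boolean isBlocksMapClosed() { return blocksMap.isClosed(); }
}
```

After close() returns, the join guarantees that the monitor has finished and that its write to the map is visible to the caller, so the caller can rely on the map being closed without any extra locking.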
Attachments
Issue Links
- duplicates HDFS-3048 Small race in BlockManager#close (Closed)