Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Version: 2.0.0-alpha
Environment: jdk1.6/java 1.7, centos6.4/debian6, 2.0.0-cdh4.5.0
Description
BlockPoolSliceScanner#scan contains a "while" loop that continues to verify (i.e. scan) blocks until blockInfoSet is empty or some other condition, such as a timeout, occurs. To do this, it calls BlockPoolSliceScanner#verifyFirstBlock, which is intended to grab the first block in blockInfoSet, verify it, and remove it from that set. (blockInfoSet is sorted by last scan time.) Unfortunately, if we hit a certain bug in updateScanStatus, the block may never be removed from blockInfoSet. When this happens, we keep rescanning the exact same block until the timeout hits.
The bug is triggered when a block winds up in blockInfoSet but not in blockMap. You can see it clearly in this code:
private synchronized void updateScanStatus(Block block,
                                           ScanType type,
                                           boolean scanOk) {
  BlockScanInfo info = blockMap.get(block);
  if ( info != null ) {
    delBlockInfo(info);
  } else {
    // It might already be removed. Thats ok, it will be caught next time.
    info = new BlockScanInfo(block);
  }
If info == null, we never call delBlockInfo, the function which is intended to remove the blockInfoSet entry.
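The failure mode can be reproduced in a much smaller model. The sketch below is not the Hadoop code: the class StuckScanSketch, the Info type, the scansFor helper, and the iteration cap (standing in for the timeout) are all hypothetical names made up for illustration. It only assumes what is described above, namely a sorted set and a map that can get out of sync, and a removal that happens only when the map lookup succeeds.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

public class StuckScanSketch {
    // Hypothetical stand-in for BlockScanInfo: a block id plus its last scan time.
    static final class Info implements Comparable<Info> {
        final long blockId;
        final long lastScanTime;

        Info(long blockId, long lastScanTime) {
            this.blockId = blockId;
            this.lastScanTime = lastScanTime;
        }

        // Order by last scan time, then by id so distinct blocks never compare equal.
        @Override
        public int compareTo(Info other) {
            int c = Long.compare(lastScanTime, other.lastScanTime);
            return c != 0 ? c : Long.compare(blockId, other.blockId);
        }
    }

    final TreeSet<Info> blockInfoSet = new TreeSet<>();
    final Map<Long, Info> blockMap = new HashMap<>();

    // Same shape as the buggy logic: the set entry is removed only if the map lookup succeeds.
    void updateScanStatusBuggy(long blockId) {
        Info info = blockMap.get(blockId);
        if (info != null) {
            blockInfoSet.remove(info);
            blockMap.remove(blockId);
        }
        // else: nothing is removed, so a set entry without a map entry stays first forever.
    }

    // Scans the first block repeatedly; maxScans plays the role of the timeout.
    int scan(int maxScans) {
        int scans = 0;
        while (!blockInfoSet.isEmpty() && scans < maxScans) {
            Info first = blockInfoSet.first();
            scans++; // "verify" the block
            updateScanStatusBuggy(first.blockId);
        }
        return scans;
    }

    // Builds a scanner holding one block and reports how many scans it took.
    static int scansFor(boolean alsoInMap, int maxScans) {
        StuckScanSketch s = new StuckScanSketch();
        Info info = new Info(42L, 0L);
        s.blockInfoSet.add(info);
        if (alsoInMap) {
            s.blockMap.put(info.blockId, info);
        }
        return s.scan(maxScans);
    }

    public static void main(String[] args) {
        System.out.println("consistent state: " + scansFor(true, 1000) + " scan(s)");
        System.out.println("set-only entry:   " + scansFor(false, 1000) + " scan(s)");
    }
}
```

In this model a block present in both structures is scanned once, while a block present only in the set is scanned until the cap is reached.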
Luckily, there is a simple fix here: the variable that updateScanStatus is being passed is actually a BlockScanInfo object, so we can simply call delBlockInfo on it directly, without doing a lookup in blockMap. This is both faster and more robust.
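A minimal sketch of that idea, again in a simplified model rather than the actual patch: FixedScanSketch, Info, and scansFor are hypothetical names, and the iteration cap stands in for the timeout. The only point illustrated is that removing the object already in hand does not depend on the map being in sync with the set.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

public class FixedScanSketch {
    // Hypothetical stand-in for BlockScanInfo: a block id plus its last scan time.
    static final class Info implements Comparable<Info> {
        final long blockId;
        final long lastScanTime;

        Info(long blockId, long lastScanTime) {
            this.blockId = blockId;
            this.lastScanTime = lastScanTime;
        }

        @Override
        public int compareTo(Info other) {
            int c = Long.compare(lastScanTime, other.lastScanTime);
            return c != 0 ? c : Long.compare(blockId, other.blockId);
        }
    }

    final TreeSet<Info> blockInfoSet = new TreeSet<>();
    final Map<Long, Info> blockMap = new HashMap<>();

    // Takes the entry itself, so no lookup is needed and a missing map entry is harmless.
    void updateScanStatusFixed(Info info) {
        blockInfoSet.remove(info);
        blockMap.remove(info.blockId);
    }

    // Scans the first block until the set is empty; maxScans plays the role of the timeout.
    int scan(int maxScans) {
        int scans = 0;
        while (!blockInfoSet.isEmpty() && scans < maxScans) {
            Info first = blockInfoSet.first();
            scans++; // "verify" the block
            updateScanStatusFixed(first);
        }
        return scans;
    }

    // Builds a scanner holding one block and reports how many scans it took.
    static int scansFor(boolean alsoInMap, int maxScans) {
        FixedScanSketch s = new FixedScanSketch();
        Info info = new Info(42L, 0L);
        s.blockInfoSet.add(info);
        if (alsoInMap) {
            s.blockMap.put(info.blockId, info);
        }
        return s.scan(maxScans);
    }

    public static void main(String[] args) {
        System.out.println("consistent state: " + scansFor(true, 1000) + " scan(s)");
        System.out.println("set-only entry:   " + scansFor(false, 1000) + " scan(s)");
    }
}
```

Here the block is scanned once whether or not it has a map entry, because the removal no longer goes through blockMap.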