Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- None
- None
- Reviewed
Description
VolumeScanner may terminate due to an unexpected NullPointerException thrown in DataNode.reportBadBlocks(). This is different from HDFS-8850 and HDFS-9190.
I observed this bug in a production CDH 5.5.1 cluster, and the same bug still persists in upstream trunk.
2016-04-07 20:30:53,830 WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Reporting bad BP-1800173197-10.204.68.5-1444425156296:blk_1170134484_96468685 on /dfs/dn
2016-04-07 20:30:53,831 ERROR org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/dfs/dn, DS-89b72832-2a8c-48f3-8235-48e6c5eb5ab3) exiting because of exception
java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.datanode.DataNode.reportBadBlocks(DataNode.java:1018)
    at org.apache.hadoop.hdfs.server.datanode.VolumeScanner$ScanResultHandler.handle(VolumeScanner.java:287)
    at org.apache.hadoop.hdfs.server.datanode.VolumeScanner.scanBlock(VolumeScanner.java:443)
    at org.apache.hadoop.hdfs.server.datanode.VolumeScanner.runLoop(VolumeScanner.java:547)
    at org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:621)
2016-04-07 20:30:53,832 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/dfs/dn, DS-89b72832-2a8c-48f3-8235-48e6c5eb5ab3) exiting.
I think the NPE comes from the volume variable in the following code snippet. Somehow the volume scanner knows the volume, but the DataNode cannot look up the volume using the block, so getVolume(block) returns null.
public void reportBadBlocks(ExtendedBlock block) throws IOException{
  BPOfferService bpos = getBPOSForBlock(block);
  FsVolumeSpi volume = getFSDataset().getVolume(block);
  bpos.reportBadBlocks(
      block, volume.getStorageID(), volume.getStorageType());
}
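The suspected failure mode can be illustrated with a small self-contained sketch. It does not use the real Hadoop classes: Volume, VOLUMES_BY_BLOCK, and REPORTED are hypothetical stand-ins for FsVolumeSpi, the dataset's block-to-volume lookup, and the report sent through BPOfferService. It shows one possible guard, assuming the lookup can return null, in which the missing volume is logged and the report is skipped so the calling thread keeps running. This is not necessarily how the issue was resolved upstream.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReportBadBlocksSketch {
    /** Hypothetical stand-in for FsVolumeSpi: only the two values the report needs. */
    public static final class Volume {
        public final String storageId;
        public final String storageType;

        public Volume(String storageId, String storageType) {
            this.storageId = storageId;
            this.storageType = storageType;
        }
    }

    /** Hypothetical stand-in for getFSDataset().getVolume(block). */
    public static final Map<String, Volume> VOLUMES_BY_BLOCK = new HashMap<>();

    /** Hypothetical stand-in for what bpos.reportBadBlocks(...) would send. */
    public static final List<String> REPORTED = new ArrayList<>();

    /**
     * Reports a bad block if its volume can be resolved.
     * Returns false instead of throwing when the lookup yields null.
     */
    public static boolean reportBadBlocks(String block) {
        Volume volume = VOLUMES_BY_BLOCK.get(block); // may be null, as in the bug report
        if (volume == null) {
            // Assumed handling: log and skip rather than dereference null.
            System.err.println("Cannot find volume to report bad block: " + block);
            return false;
        }
        REPORTED.add(block + " " + volume.storageId + " " + volume.storageType);
        return true;
    }

    public static void main(String[] args) {
        VOLUMES_BY_BLOCK.put("blk_known", new Volume("DS-example", "DISK"));
        System.out.println(reportBadBlocks("blk_known"));
        System.out.println(reportBadBlocks("blk_unknown"));
    }
}
```

With a guard of this kind, a block whose volume can no longer be resolved produces a warning instead of an exception that propagates up through scanBlock() and terminates the VolumeScanner thread.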
Attachments
Issue Links
- is related to
  - HDFS-11070 NPE in BlockSender due to race condition (In Progress)
  - HDFS-10587 Incorrect offset/length calculation in pipeline recovery causes block corruption (Resolved)
- relates to
  - HDFS-10625 VolumeScanner to report why a block is found bad (Resolved)