Details
-
Sub-task
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
-
None
-
None
-
Reviewed
Description
The directory scanner periodically compiles a report of differences between the datanode's on-disk and in-memory state.
The code to generate the reports is in DirectoryScanner#scan and DirectoryScanner#getDiskReport. It looks like the volume field in ScanInfo is not correctly initialized while compiling the diffs. This was not an issue before but now we depend on the volume information being present. The bug triggers the following NPE during a scan if a block is present in the Datanode's in-memory block map but missing on disk:
java.lang.NullPointerException at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkAndUpdate(FsDatasetImpl.java:1404) at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:416) at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:365) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:695)
Another NPE exposed in BlockManager#reportDiff
java.lang.NullPointerException: null
at java.util.TreeMap.getEntry(TreeMap.java:324)
at java.util.TreeMap.remove(TreeMap.java:580)
at java.util.TreeSet.remove(TreeSet.java:259)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1836)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1709)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1637)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:978)
at org.apache.hadoop.