Description
The DataNode (DN) log shows the following NullPointerException:
2020-12-28 15:49:25,453 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeCommand action: DNA_ERASURE_CODING_RECOVERY
//...
2020-12-28 15:51:25,551 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Connection timed out
2020-12-28 15:51:25,553 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to reconstruct striped block: BP-1922004198-10.83.xx.xx-1515033360950:blk_-9223372036804064064_6311920695
java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedWriter.clearBuffers(StripedWriter.java:299)
    at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.clearBuffers(StripedBlockReconstructor.java:139)
    at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstruct(StripedBlockReconstructor.java:115)
    at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.run(StripedBlockReconstructor.java:60)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2020-12-28 15:51:25,749 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1922004198-10.83.xx.xx-1515033360950:blk_-9223372036799445643_6313197139 src: /10.83.xxx.52:53198 dest: /10.83.xxx.52:50010
The NPE occurs at `writer.getTargetBuffer()` in the following code:
// StripedWriter#clearBuffers
void clearBuffers() {
  for (StripedBlockWriter writer : writers) {
    ByteBuffer targetBuffer = writer.getTargetBuffer();
    if (targetBuffer != null) {
      targetBuffer.clear();
    }
  }
}
So why is the writer null? Let's trace when the writers are initialized and when reconstruct() is called:
// StripedBlockReconstructor#run
public void run() {
  try {
    initDecoderIfNecessary();
    getStripedReader().init();
    stripedWriter.init(); //①
    reconstruct(); //②
    stripedWriter.endTargetBlocks();
  } catch (Throwable e) {
    LOG.warn("Failed to reconstruct striped block: {}", getBlockGroup(), e);
    // ...
The writers are initialized at ① and reconstruct() is called at ②. `stripedWriter.init()` calls `initTargetStreams()`, which looks like this:
// StripedWriter#initTargetStreams
int initTargetStreams() {
  int nSuccess = 0;
  for (short i = 0; i < targets.length; i++) {
    try {
      writers[i] = createWriter(i);
      nSuccess++;
      targetsStatus[i] = true;
    } catch (Throwable e) {
      LOG.warn(e.getMessage());
    }
  }
  return nSuccess;
}
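The NPE occurs when createWriter() throws for some but not all targets, i.e. 0 < nSuccess < targets.length. The exception is only logged, so writers[i] stays null for each failed target, and clearBuffers() later dereferences that null entry when it iterates over writers.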
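
To make the failure mode concrete, here is a minimal, self-contained sketch of the partially initialized writers array and of one possible guard (a null check on each writer). It is an illustration under assumptions, not necessarily the fix adopted in HDFS: `ClearBuffersSketch`, `Writer`, `writersWithOneFailure`, `clearBuffersUnguarded` and `clearBuffersGuarded` are hypothetical names, and `Writer` is a simplified stand-in for StripedBlockWriter that only holds a target buffer.

```java
import java.nio.ByteBuffer;

// Hypothetical, simplified model of the bug; not the real HDFS classes.
final class ClearBuffersSketch {

  /** Stand-in for StripedBlockWriter: it only exposes a target buffer. */
  static final class Writer {
    private final ByteBuffer targetBuffer;

    Writer(ByteBuffer targetBuffer) {
      this.targetBuffer = targetBuffer;
    }

    ByteBuffer getTargetBuffer() {
      return targetBuffer;
    }
  }

  /** Simulates initTargetStreams() where createWriter() failed for one target. */
  static Writer[] writersWithOneFailure(int targets, int failedIndex) {
    Writer[] writers = new Writer[targets];
    for (int i = 0; i < targets; i++) {
      if (i == failedIndex) {
        continue; // failure was only logged, so this slot stays null
      }
      ByteBuffer buffer = ByteBuffer.allocate(8);
      buffer.put((byte) 1); // advance the position so clear() has a visible effect
      writers[i] = new Writer(buffer);
    }
    return writers;
  }

  /** Same loop shape as the reported code: dereferences every slot. */
  static void clearBuffersUnguarded(Writer[] writers) {
    for (Writer writer : writers) {
      ByteBuffer targetBuffer = writer.getTargetBuffer(); // NPE if writer is null
      if (targetBuffer != null) {
        targetBuffer.clear();
      }
    }
  }

  /** One possible guard: skip null writers. Returns the number of buffers cleared. */
  static int clearBuffersGuarded(Writer[] writers) {
    int cleared = 0;
    for (Writer writer : writers) {
      if (writer == null) {
        continue; // target whose writer was never created
      }
      ByteBuffer targetBuffer = writer.getTargetBuffer();
      if (targetBuffer != null) {
        targetBuffer.clear();
        cleared++;
      }
    }
    return cleared;
  }

  public static void main(String[] args) {
    Writer[] writers = writersWithOneFailure(3, 1);
    try {
      clearBuffersUnguarded(writers);
    } catch (NullPointerException e) {
      System.out.println("unguarded loop: NullPointerException");
    }
    System.out.println("guarded loop cleared " + clearBuffersGuarded(writers) + " buffers");
  }
}
```

With three targets and a failure at index 1, the unguarded loop throws on the second slot while the guarded loop clears the two buffers that exist. Whether skipping the null writer is sufficient in the real code depends on how the rest of the reconstruction path treats failed targets, which this sketch does not model.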