  1. Hadoop HDFS
  2. HDFS-10512

VolumeScanner may terminate due to NPE in DataNode.reportBadBlocks

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 2.7.4, 3.0.0-alpha1
    • Component/s: datanode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      VolumeScanner may terminate due to an unexpected NullPointerException thrown in DataNode.reportBadBlocks(). This is different from HDFS-8850/HDFS-9190.

      I observed this bug in a production CDH 5.5.1 cluster, and the same bug still persists in upstream trunk.

      2016-04-07 20:30:53,830 WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Reporting bad BP-1800173197-10.204.68.5-1444425156296:blk_1170134484_96468685 on /dfs/dn
      2016-04-07 20:30:53,831 ERROR org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/dfs/dn, DS-89b72832-2a8c-48f3-8235-48e6c5eb5ab3) exiting because of exception
      java.lang.NullPointerException
              at org.apache.hadoop.hdfs.server.datanode.DataNode.reportBadBlocks(DataNode.java:1018)
              at org.apache.hadoop.hdfs.server.datanode.VolumeScanner$ScanResultHandler.handle(VolumeScanner.java:287)
              at org.apache.hadoop.hdfs.server.datanode.VolumeScanner.scanBlock(VolumeScanner.java:443)
              at org.apache.hadoop.hdfs.server.datanode.VolumeScanner.runLoop(VolumeScanner.java:547)
              at org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:621)
      2016-04-07 20:30:53,832 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/dfs/dn, DS-89b72832-2a8c-48f3-8235-48e6c5eb5ab3) exiting.
      

      I think the NPE comes from the volume variable in the following code snippet. Somehow the volume scanner knows the volume, but the DataNode cannot look up the volume using the block, so getVolume(block) presumably returns null and the call to volume.getStorageID() throws.

      public void reportBadBlocks(ExtendedBlock block) throws IOException {
        BPOfferService bpos = getBPOSForBlock(block);
        FsVolumeSpi volume = getFSDataset().getVolume(block);
        bpos.reportBadBlocks(
            block, volume.getStorageID(), volume.getStorageType());
      }
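
One way to keep the scanner thread alive would be to check the lookup result before dereferencing it. The following is a minimal, self-contained sketch of that guard, not the committed patch: `Volume`, `Dataset`, and `reportBadBlock` are hypothetical stand-ins for `FsVolumeSpi`, the dataset returned by `getFSDataset()`, and `DataNode.reportBadBlocks()`, and the choice to log and skip the report is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are not the Hadoop classes.
public class NullVolumeGuardSketch {

    /** Stand-in for the storage details that a volume lookup returns. */
    static final class Volume {
        final String storageId;
        final String storageType;

        Volume(String storageId, String storageType) {
            this.storageId = storageId;
            this.storageType = storageType;
        }
    }

    /** Stand-in for the dataset: returns null when the block has no known volume. */
    static final class Dataset {
        private final Map<String, Volume> volumeByBlock = new HashMap<>();

        void add(String blockId, Volume volume) {
            volumeByBlock.put(blockId, volume);
        }

        Volume getVolume(String blockId) {
            return volumeByBlock.get(blockId);
        }
    }

    /**
     * Reports a bad block only if its volume can be resolved.
     * Returns false (after logging) instead of throwing when the lookup yields null.
     */
    static boolean reportBadBlock(Dataset dataset, String blockId, List<String> reports) {
        Volume volume = dataset.getVolume(blockId);
        if (volume == null) {
            // Assumed policy: warn and skip, so the caller's loop can continue.
            System.err.println("Cannot find volume to report bad block: " + blockId);
            return false;
        }
        reports.add(blockId + " on " + volume.storageId + " (" + volume.storageType + ")");
        return true;
    }

    public static void main(String[] args) {
        Dataset dataset = new Dataset();
        dataset.add("blk_1", new Volume("DS-1", "DISK"));
        List<String> reports = new ArrayList<>();

        // Known block: the report is recorded.
        System.out.println(reportBadBlock(dataset, "blk_1", reports));
        // Unknown block: the lookup returns null, so the report is skipped.
        System.out.println(reportBadBlock(dataset, "blk_2", reports));
        System.out.println(reports);
    }
}
```

With a guard like this, a block whose volume cannot be resolved produces a warning rather than an uncaught exception in the calling thread. Whether skipping the report is the right policy for the real DataNode is a separate question that this sketch does not answer.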
      

        Attachments

        1. HDFS-10512.001.patch
          0.9 kB
          Yiqun Lin
        2. HDFS-10512.002.patch
          2 kB
          Yiqun Lin
        3. HDFS-10512.004.patch
          4 kB
          Wei-Chiu Chuang
        4. HDFS-10512.005.patch
          7 kB
          Yiqun Lin
        5. HDFS-10512.006.patch
          7 kB
          Yiqun Lin

              People

              • Assignee:
                linyiqun Yiqun Lin
                Reporter:
                jojochuang Wei-Chiu Chuang
              • Votes:
                0
                Watchers:
                11
