  1. Hadoop HDFS
  2. HDFS-9958

BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.2
    • Fix Version/s: 2.8.0, 2.7.3, 3.0.0-alpha1
    • Component/s: None
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      When a corrupt replica resides on a failed storage and has not yet been removed from blocksMap, there is a race in which a LocatedBlock is created from a machines array that contains an unpopulated element.

      The root cause is as follows:

      final int numCorruptNodes = countNodes(blk).corruptReplicas();
      

      countNodes only looks at nodes whose storage state is NORMAL, so when the corrupt replica is on a failed storage, numCorruptNodes comes out as zero.

      final int numNodes = blocksMap.numNodes(blk);
      

      However, numNodes counts all nodes/storages irrespective of storage state, so numMachines includes such (failed) nodes. The assert fails only if assertions are enabled; otherwise the code goes ahead and tries to create a LocatedBlock object from a machines array in which one element was never populated.
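
The mismatch described above can be reproduced in a small standalone model. This is only a sketch under stated assumptions: `Replica`, `State`, `countCorruptOnNormalStorages` and `buildMachines` are hypothetical names invented for illustration, the array-sizing rule is a simplification assumed from the description, and none of this is the actual BlockManager code or the committed fix.

```java
import java.util.Arrays;
import java.util.List;

public class CorruptOnFailedStorageSketch {
  enum State { NORMAL, FAILED }

  /** Hypothetical stand-in for one replica location tracked for a block. */
  static final class Replica {
    final String name;
    final State state;
    final boolean corrupt;

    Replica(String name, State state, boolean corrupt) {
      this.name = name;
      this.state = state;
      this.corrupt = corrupt;
    }
  }

  /** Models the corrupt count from countNodes: only NORMAL storages are examined. */
  static int countCorruptOnNormalStorages(List<Replica> replicas) {
    int n = 0;
    for (Replica r : replicas) {
      if (r.state == State.NORMAL && r.corrupt) {
        n++;
      }
    }
    return n;
  }

  /** Sizes the array from the two counts, then fills it while skipping corrupt replicas. */
  static String[] buildMachines(List<Replica> replicas) {
    int numCorruptNodes = countCorruptOnNormalStorages(replicas);
    int numNodes = replicas.size(); // counts every storage, whatever its state
    boolean isCorrupt = numCorruptNodes == numNodes; // assumed sizing rule
    int numMachines = isCorrupt ? numNodes : numNodes - numCorruptNodes;

    String[] machines = new String[numMachines];
    int j = 0;
    for (Replica r : replicas) {
      if (isCorrupt || !r.corrupt) {
        machines[j++] = r.name;
      }
    }
    // Without -ea this check is skipped and the short-filled array is returned.
    assert j == machines.length : "filled " + j + " of " + machines.length;
    return machines;
  }

  public static void main(String[] args) {
    List<Replica> replicas = Arrays.asList(
        new Replica("dn1", State.NORMAL, false),
        new Replica("dn2", State.NORMAL, false),
        new Replica("dn3", State.FAILED, true)); // corrupt replica on a failed storage
    System.out.println(Arrays.toString(buildMachines(replicas))); // prints [dn1, dn2, null]
  }
}
```

Run with the default JVM settings, the array keeps a trailing null, and any later code that dereferences each element will throw a NullPointerException. Run with `java -ea`, the same input trips the assertion instead, which matches the note that the assert only fires when assertions are enabled.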

      Here is the stack trace:

      java.lang.NullPointerException
      	at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:45)
      	at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:40)
      	at org.apache.hadoop.hdfs.protocol.LocatedBlock.<init>(LocatedBlock.java:84)
      	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:878)
      	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:826)
      	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlockList(BlockManager.java:799)
      	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:899)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1849)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
      	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588)
      	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
      	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
      	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:415)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
      	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
      

        Attachments

        1. HDFS-9958-Test-v1.txt
          4 kB
          Kuhu Shukla
        2. HDFS-9958.001.patch
          6 kB
          Kuhu Shukla
        3. HDFS-9958.002.patch
          7 kB
          Kuhu Shukla
        4. HDFS-9958.003.patch
          8 kB
          Kuhu Shukla
        5. HDFS-9958.004.patch
          10 kB
          Kuhu Shukla
        6. HDFS-9958.005.patch
          7 kB
          Kuhu Shukla
        7. HDFS-9958-branch-2.001.patch
          6 kB
          Kuhu Shukla
        8. HDFS-9958-branch-2.7.001.patch
          7 kB
          Kuhu Shukla

              People

              • Assignee:
                kshukla Kuhu Shukla
                Reporter:
                kshukla Kuhu Shukla
              • Votes:
                0
                Watchers:
                19
