Hadoop HDFS / HDFS-14503

ThrottledAsyncChecker throws NPE during block pool initialization


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      ThrottledAsyncChecker throws an NPE during block pool initialization. The error causes block pool registration to fail.

      The exception

      2019-05-20 01:02:36,003 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Unexpected exception in block pool Block pool <registering> (Datanode Uuid xxxxx) service to xx.xx.xx.xx/xx.xx.xx.xx
      java.lang.NullPointerException
              at org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker$LastCheckResult.access$000(ThrottledAsyncChecker.java:211)
              at org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker.schedule(ThrottledAsyncChecker.java:129)
              at org.apache.hadoop.hdfs.server.datanode.checker.DatasetVolumeChecker.checkAllVolumes(DatasetVolumeChecker.java:209)
              at org.apache.hadoop.hdfs.server.datanode.DataNode.checkDiskError(DataNode.java:3387)
              at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1508)
              at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:319)
              at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:272)
              at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:768)
              at java.lang.Thread.run(Thread.java:745)
      

      This error appears to occur because completedChecks, a WeakHashMap, can remove the target entry while we are still reading it. Although we check for the entry before getting it, the get can still return null.

      We hit a corner case for this: in federation mode, with two block pools on the DN, ThrottledAsyncChecker schedules two identical health checks for the same volume.

      2019-05-20 01:02:36,000 INFO org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker: Scheduling a check for /hadoop/2/hdfs/data/current
      2019-05-20 01:02:36,000 INFO org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker: Scheduling a check for /hadoop/2/hdfs/data/current
      

      completedChecks cleans up the entry after one check has successfully called completedChecks#get. The other check then gets null for the same entry.
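
      The check-then-get hazard described above can be sketched as follows. This is a minimal illustration, not the actual ThrottledAsyncChecker code: the class CheckThenGetSketch, the LastResult record, the VanishingMap helper, and both method names are hypothetical. Because the timing of weak-entry cleanup cannot be forced, VanishingMap simulates it by keeping the key visible to containsKey() while get() returns null.

```java
import java.util.HashMap;
import java.util.Map;

public class CheckThenGetSketch {

    /** Hypothetical stand-in for the checker's per-volume result record. */
    static final class LastResult {
        final long completedAt;
        LastResult(long completedAt) { this.completedAt = completedAt; }
    }

    /**
     * Simulates a map whose entry disappears between containsKey() and get():
     * containsKey() still reports the key, but get() always returns null.
     */
    static final class VanishingMap<K, V> extends HashMap<K, V> {
        private static final long serialVersionUID = 1L;
        @Override
        public V get(Object key) { return null; }
    }

    /** Check-then-get: dereferences the looked-up value without a null check. */
    static boolean checkedRecentlyRacy(Map<String, LastResult> completed,
                                       String target, long now, long minGapMs) {
        if (completed.containsKey(target)) {
            LastResult result = completed.get(target); // may already be gone
            return now - result.completedAt < minGapMs; // NPE when result is null
        }
        return false;
    }

    /** Single lookup: a missing entry is treated as "not checked recently". */
    static boolean checkedRecentlySafe(Map<String, LastResult> completed,
                                       String target, long now, long minGapMs) {
        LastResult result = completed.get(target);
        return result != null && now - result.completedAt < minGapMs;
    }

    public static void main(String[] args) {
        String volume = "/hadoop/2/hdfs/data/current";
        Map<String, LastResult> completed = new VanishingMap<>();
        completed.put(volume, new LastResult(1_000L));

        System.out.println("safe lookup: "
            + checkedRecentlySafe(completed, volume, 1_500L, 1_000L));
        try {
            checkedRecentlyRacy(completed, volume, 1_500L, 1_000L);
        } catch (NullPointerException e) {
            System.out.println("racy lookup threw NullPointerException");
        }
    }
}
```

      In this simulation the single-lookup variant treats the vanished entry as "no recent check" and simply lets the check be scheduled again, while the check-then-get variant fails with the same kind of NPE as in the stack trace above. Whether the real fix should take this shape is an assumption here.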

              People

              • Assignee:
                Unassigned
                Reporter:
                linyiqun Yiqun Lin
              • Votes:
                0
                Watchers:
                7