  1. Hadoop HDFS
  2. HDFS-14503

ThrottledAsyncChecker throws NPE during block pool initialization

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

    Description

      ThrottledAsyncChecker throws an NPE during block pool initialization. The error causes block pool registration to fail.

      The exception

      2019-05-20 01:02:36,003 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Unexpected exception in block pool Block pool <registering> (Datanode Uuid xxxxx) service to xx.xx.xx.xx/xx.xx.xx.xx
      java.lang.NullPointerException
              at org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker$LastCheckResult.access$000(ThrottledAsyncChecker.java:211)
              at org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker.schedule(ThrottledAsyncChecker.java:129)
              at org.apache.hadoop.hdfs.server.datanode.checker.DatasetVolumeChecker.checkAllVolumes(DatasetVolumeChecker.java:209)
              at org.apache.hadoop.hdfs.server.datanode.DataNode.checkDiskError(DataNode.java:3387)
              at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1508)
              at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:319)
              at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:272)
              at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:768)
              at java.lang.Thread.run(Thread.java:745)
      

      This error appears to occur because completedChecks, a WeakHashMap, can remove the target entry before we retrieve it. Although we check for the entry before getting it, the get can still return null.
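
One way to close the window between the check and the get is to do a single lookup and treat a null result as "no previous check". The following is a minimal sketch of that idea, not the actual Hadoop code or patch: the class NullSafeLookup, its methods, and the simplified LastCheckResult are hypothetical names introduced only for illustration.

```java
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical stand-in for the checker's bookkeeping; not the real Hadoop class. */
final class NullSafeLookup {

  /** Simplified, hypothetical record of when a check last completed. */
  static final class LastCheckResult {
    final long completedAt;

    LastCheckResult(long completedAt) {
      this.completedAt = completedAt;
    }
  }

  // Weakly keyed map: an entry may disappear once its key is no longer
  // strongly reachable, so a presence check followed by get() is not atomic.
  private final Map<Object, LastCheckResult> completedChecks = new WeakHashMap<>();

  /** Records that a check for the target completed at nowMs. */
  synchronized void recordCompletion(Object target, long nowMs) {
    completedChecks.put(target, new LastCheckResult(nowMs));
  }

  /**
   * Returns true if a new check should be scheduled for the target.
   * Assumption: a missing entry means "schedule", the same as never checked.
   */
  synchronized boolean shouldSchedule(Object target, long nowMs, long minGapMs) {
    // Single lookup: there is no gap between "is it there?" and "read it".
    LastCheckResult result = completedChecks.get(target);
    if (result == null) {
      return true; // never checked, or the entry has already been removed
    }
    return nowMs - result.completedAt >= minGapMs;
  }
}
```

With this shape, a second scheduling request for the same volume that finds no entry simply schedules a check instead of dereferencing null.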

      We hit a corner case for this: in federation mode, with two block pools on the DN, ThrottledAsyncChecker schedules two identical health checks for the same volume.

      2019-05-20 01:02:36,000 INFO org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker: Scheduling a check for /hadoop/2/hdfs/data/current
      2019-05-20 01:02:36,000 INFO org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker: Scheduling a check for /hadoop/2/hdfs/data/current
      

      completedChecks cleans up the entry for one successful check after completedChecks#get has been called. After that, the other check gets null from completedChecks#get.

            People

              Assignee: Unassigned
              Reporter: Yiqun Lin (linyiqun)
              Votes: 0
              Watchers: 7