Hadoop HDFS / HDFS-13465

Overlapping lease recoveries cause NPE in NN

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.8.0
    • Fix Version/s: None
    • Component/s: namenode
    • Labels: None

    Description

      Overlapping lease recoveries for the same file cause an NPE in the DatanodeManager while it creates LeaseRecoveryCommands, possibly losing other queued recovery commands.

      • client1 calls recoverLease, file is added to DN1's recovery queue
      • client2 calls recoverLease, file is added to DN2's recovery queue
      • one DN heartbeats, gets the block recovery command, and completes the synchronization before the other DN heartbeats; i.e., the file is closed.
      • the other DN heartbeats, takes the block from its recovery queue, assumes it is still under construction, and gets an NPE calling getExpectedStorageLocations

      Failing code in DatanodeManager:

      //check lease recovery
      BlockInfo[] blocks = nodeinfo.getLeaseRecoveryCommand(Integer.MAX_VALUE);
      if (blocks != null) {
        BlockRecoveryCommand brCommand = new BlockRecoveryCommand(
            blocks.length);
        for (BlockInfo b : blocks) {
          BlockUnderConstructionFeature uc = b.getUnderConstructionFeature();
          // uc is null if the file was already closed by the other recovery
          assert uc != null;
          // with assertions disabled (the JVM default), the NPE is thrown here
          final DatanodeStorageInfo[] storages = uc.getExpectedStorageLocations();

      The NN state is "ok" if only one block was queued. If multiple blocks were queued, all of their recoveries are lost, since the blocks have already been taken off the DN's recovery queue when the NPE occurs. Recovery will not occur until a client explicitly retries or the lease monitor recovers the lease.
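
      One possible mitigation, sketched below, is to skip a queued block that is no longer under construction rather than dereferencing it, so that the remaining blocks in the same batch still get their recovery commands. This is only an illustration of the idea, not the project's fix (the issue is unresolved). The class LeaseRecoverySketch, its Block type, buildRecoveryCommand, and the block and DN names are all hypothetical stand-ins and not the real HDFS classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical, simplified model of the race; none of these types are the real HDFS classes. */
class LeaseRecoverySketch {

    /** Stand-in for BlockInfo: the expected locations become null once the file is closed. */
    static final class Block {
        final String id;
        private String[] expectedLocations;

        Block(String id, String... expectedLocations) {
            this.id = id;
            this.expectedLocations = expectedLocations;
        }

        /** Models the file being closed after another DN finished the recovery. */
        void close() {
            expectedLocations = null;
        }

        String[] getExpectedLocations() {
            return expectedLocations;
        }
    }

    /**
     * Drains the whole recovery queue (as a MAX_VALUE-sized poll would) and
     * builds one command entry per block that still needs recovery.
     */
    static List<String> buildRecoveryCommand(Queue<Block> recoveryQueue) {
        List<String> command = new ArrayList<>();
        Block b;
        while ((b = recoveryQueue.poll()) != null) {
            String[] locations = b.getExpectedLocations();
            if (locations == null) {
                // Assumed guard: the block was already recovered through
                // another DN, so skip it instead of throwing an NPE.
                continue;
            }
            command.add(b.id + " -> " + String.join(",", locations));
        }
        return command;
    }

    public static void main(String[] args) {
        // DN2's queue holds two blocks; the first one is closed by a
        // recovery that completed via DN1 before DN2 heartbeats.
        Queue<Block> dn2Queue = new ArrayDeque<>();
        Block alreadyRecovered = new Block("blk_1", "DN1", "DN2");
        Block stillOpen = new Block("blk_2", "DN2", "DN3");
        dn2Queue.add(alreadyRecovered);
        dn2Queue.add(stillOpen);

        alreadyRecovered.close();

        // prints [blk_2 -> DN2,DN3]
        System.out.println(buildRecoveryCommand(dn2Queue));
    }
}
```

      Without the null check, the first block would abort the loop after the queue had already been drained, which matches the loss of the other queued recoveries described above.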

          People

            Assignee: Unassigned
            Reporter: Daryn Sharp (daryn)