  1. Hadoop HDFS
  2. HDFS-13115

In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes have been deleted

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1.0, 2.10.0, 3.0.3
    • Component/s: None
    • Labels: None
    • Target Version/s:
    • Hadoop Flags: Reviewed

      Description

      In LeaseManager,

       private synchronized INode[] getINodesWithLease() {
          List<INode> inodes = new ArrayList<>(leasesById.size());
          INode currentINode;
          for (long inodeId : leasesById.keySet()) {
            currentINode = fsnamesystem.getFSDirectory().getInode(inodeId);
            // A file with an active lease could get deleted, or its
            // parent directories could get recursively deleted.
            if (currentINode != null &&
                currentINode.isFile() &&
                !fsnamesystem.isFileDeleted(currentINode.asFile())) {
              inodes.add(currentINode);
            }
          }
          return inodes.toArray(new INode[0]);
        }
      

      We can see that, given an inodeId, fsnamesystem.getFSDirectory().getInode(inodeId) can return null. The reason is explained in the code comment.

      HDFS-12985 identified the root cause of one such case and fixed it. That fix resolves some occurrences, but we are still seeing a NullPointerException from FSNamesystem:

        public long getCompleteBlocksTotal() {
          // Calculate number of blocks under construction
          long numUCBlocks = 0;
          readLock();
          try {
            numUCBlocks = leaseManager.getNumUnderConstructionBlocks(); // <=== here
            return getBlocksTotal() - numUCBlocks;
          } finally {
            readUnlock();
          }
        }
      

      The exception happens when the inode for the given inode id has been removed; see the LeaseManager code below:

        synchronized long getNumUnderConstructionBlocks() {
          assert this.fsnamesystem.hasReadLock() : "The FSNamesystem read lock wasn't"
            + "acquired before counting under construction blocks";
          long numUCBlocks = 0;
          for (Long id : getINodeIdWithLeases()) {
            final INodeFile cons = fsnamesystem.getFSDirectory().getInode(id).asFile(); // <=== here
            Preconditions.checkState(cons.isUnderConstruction());
            BlockInfo[] blocks = cons.getBlocks();
            if(blocks == null)
              continue;
            for(BlockInfo b : blocks) {
              if(!b.isComplete())
                numUCBlocks++;
            }
          }
          LOG.info("Number of blocks under construction: " + numUCBlocks);
          return numUCBlocks;
        }
      

      This JIRA adds a check for whether the inode has been removed, as a safeguard against the NullPointerException.

      It looks like the inode was deleted from the FSDirectory map after its id was returned by getINodeIdWithLeases().
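      The failure mode can be illustrated with a minimal sketch. It uses a toy in-memory model, not the Hadoop classes: StaleLeaseDemo, ToyFile, countUnguarded and reproduces are hypothetical names, a plain Map stands in for the FSDirectory inode map, and a Set stands in for the lease ids. It only shows that a lease id that outlives its inode makes an unguarded lookup throw.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Toy model only; none of these types are Hadoop classes. */
public class StaleLeaseDemo {

    /** Hypothetical stand-in for an INodeFile with some incomplete blocks. */
    static final class ToyFile {
        final int incompleteBlocks;

        ToyFile(int incompleteBlocks) {
            this.incompleteBlocks = incompleteBlocks;
        }
    }

    /** Mirrors the unguarded loop: dereferences the lookup result directly. */
    static long countUnguarded(Map<Long, ToyFile> inodeMap, Set<Long> leasedIds) {
        long total = 0;
        for (Long id : leasedIds) {
            ToyFile file = inodeMap.get(id);   // null once the inode is gone
            total += file.incompleteBlocks;    // throws NullPointerException on null
        }
        return total;
    }

    /** Returns true if a lease id that outlives its inode triggers the NPE. */
    static boolean reproduces() {
        Map<Long, ToyFile> inodeMap = new HashMap<>();
        Set<Long> leasedIds = new LinkedHashSet<>();
        inodeMap.put(1L, new ToyFile(2));
        inodeMap.put(2L, new ToyFile(1));
        leasedIds.add(1L);
        leasedIds.add(2L);

        inodeMap.remove(2L); // inode deleted, lease id 2 left behind

        try {
            countUnguarded(inodeMap, leasedIds);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("NPE reproduced: " + reproduces());
    }
}
```

      In this toy model the stale id is created deliberately; in the NameNode, which code path leaves the id behind is exactly the open question.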

      Ideally we should find out who deleted it, like in HDFS-12985.

      But it seems reasonable to me to have a safeguard here, as other code in the code base does when it calls fsnamesystem.getFSDirectory().getInode(id).
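      One possible shape for such a safeguard is sketched below. This is not the attached patch: it is a toy model in which GuardedUcBlockCount, ToyFile, countUnderConstruction and demo are hypothetical names, a Map stands in for the FSDirectory inode map, and a boolean array stands in for the per-block completion state. The only point it makes is that ids whose inode lookup returns null are skipped instead of dereferenced.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only; none of these types are Hadoop classes. */
public class GuardedUcBlockCount {

    /** Hypothetical stand-in for an INodeFile; blockComplete may be null. */
    static final class ToyFile {
        final boolean[] blockComplete;

        ToyFile(boolean[] blockComplete) {
            this.blockComplete = blockComplete;
        }
    }

    /** Counts incomplete blocks, ignoring lease ids whose inode no longer exists. */
    static long countUnderConstruction(Map<Long, ToyFile> inodeMap, List<Long> leasedIds) {
        long numUCBlocks = 0;
        for (Long id : leasedIds) {
            ToyFile file = inodeMap.get(id);
            if (file == null) {
                // Safeguard: the inode was deleted after its id was collected.
                continue;
            }
            if (file.blockComplete == null) {
                continue; // file has no blocks yet
            }
            for (boolean complete : file.blockComplete) {
                if (!complete) {
                    numUCBlocks++;
                }
            }
        }
        return numUCBlocks;
    }

    /** Builds a small scenario: id 2 has a lease but no inode. */
    static long demo() {
        Map<Long, ToyFile> inodeMap = new HashMap<>();
        inodeMap.put(1L, new ToyFile(new boolean[] {true, false, false}));
        inodeMap.put(3L, new ToyFile(null));
        List<Long> leasedIds = Arrays.asList(1L, 2L, 3L);
        return countUnderConstruction(inodeMap, leasedIds);
    }

    public static void main(String[] args) {
        System.out.println("Blocks under construction: " + demo());
    }
}
```

      Skipping the id silently keeps the count usable, but it also hides the stale lease; logging the skipped id would be one way to keep evidence for the root-cause investigation.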

        Attachments

        1. HDFS-13115.001.patch
          2 kB
          Yongjun Zhang
        2. HDFS-13115.002.patch
          2 kB
          Yongjun Zhang


            People

            • Assignee: Yongjun Zhang (yzhangal)
            • Reporter: Yongjun Zhang (yzhangal)