Hadoop HDFS / HDFS-10830

FsDatasetImpl#removeVolumes crashes with IllegalMonitorStateException when vol being removed is in use


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha1
    • Fix Version/s: 2.8.0, 3.0.0-alpha2
    • Component/s: hdfs
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      FsDatasetImpl#removeVolumes() operation crashes abruptly with IllegalMonitorStateException whenever the volume being removed is in use concurrently.

      It looks like removeVolumes() waits on the monitor object "this" (that is, the FsDatasetImpl instance) without ever having locked it, which leads to IllegalMonitorStateException. The monitor wait happens only when the volume being removed is in use (reference count > 0). The thread performing the remove-volume operation therefore crashes abruptly, and block invalidations for the removed volumes are skipped entirely.
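
The failure mode described above can be reproduced outside Hadoop. The following is a minimal sketch, not code from FsDatasetImpl: the class `MonitorWaitDemo` and its method names are hypothetical, and a plain `ReentrantLock` stands in for `datasetLock`. It shows that holding an explicit lock does not make the thread the owner of an object's monitor, so `Object.wait()` still throws.

```java
import java.util.concurrent.locks.ReentrantLock;

public class MonitorWaitDemo {

  /** Returns true if wait() throws because the caller does not own the monitor. */
  public static boolean waitFailsWithoutMonitor(Object monitor) {
    // Hypothetical stand-in for datasetLock; it is unrelated to monitor's intrinsic lock.
    ReentrantLock explicitLock = new ReentrantLock();
    explicitLock.lock();
    try {
      // The thread holds explicitLock but not monitor's intrinsic lock,
      // so this call throws IllegalMonitorStateException immediately.
      monitor.wait(10);
      return false;
    } catch (IllegalMonitorStateException e) {
      return true;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      explicitLock.unlock();
    }
  }

  /** Returns true if wait() completes normally when the monitor is owned. */
  public static boolean waitSucceedsWithMonitor(Object monitor) {
    synchronized (monitor) {
      try {
        // Legal: the synchronized block makes this thread the monitor owner.
        monitor.wait(10);
        return true;
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }

  public static void main(String[] args) {
    Object monitor = new Object();
    System.out.println("throws without monitor: " + waitFailsWithoutMonitor(monitor));
    System.out.println("succeeds with monitor:  " + waitSucceedsWithMonitor(monitor));
  }
}
```

In the excerpt below, the call to waitVolumeRemoved(5000, this) appears to be in the first situation: datasetLock is held, but the monitor of "this" is not.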

      FsDatasetImpl.java
      @Override
      public void removeVolumes(Set<File> volumesToRemove, boolean clearFailure) {
      ..
      ..
      try (AutoCloseableLock lock = datasetLock.acquire()) {   <== LOCK acquire datasetLock
      for (int idx = 0; idx < dataStorage.getNumStorageDirs(); idx++) {
        .. .. ..
        asyncDiskService.removeVolume(sd.getCurrentDir());     <== volume SD1 remove
        volumes.removeVolume(absRoot, clearFailure);
        volumes.waitVolumeRemoved(5000, this);                 <== WAIT on "this" ?? But, we haven't locked it yet.
                                                                   This will cause IllegalMonitorStateException
                                                                   and crash getBlockReports()/FBR thread!
      
        for (String bpid : volumeMap.getBlockPoolList()) {
          List<ReplicaInfo> blocks = new ArrayList<>();
          for (Iterator<ReplicaInfo> it = volumeMap.replicas(bpid).iterator();
               it.hasNext(); ) {
              .. .. .. 
              it.remove();                                     <== volumeMap removal
            }
          blkToInvalidate.put(bpid, blocks);
        }
       .. ..
      }                                                        <== LOCK release datasetLock   
      
      // Call this outside the lock.
      for (Map.Entry<String, List<ReplicaInfo>> entry :
          blkToInvalidate.entrySet()) {
       ..
       for (ReplicaInfo block : blocks) {
        invalidate(bpid, block);                               <== Notify NN of Block removal
       }
      }
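
One general way to avoid this class of bug is to wait on a Condition created from the same lock that the caller already holds, instead of on an unrelated object monitor. The sketch below illustrates that pattern only; it is not taken from the attached patches, and the class `VolumeRefTracker`, its fields, and its methods are hypothetical names built on plain `java.util.concurrent` types.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class VolumeRefTracker {
  // Hypothetical stand-ins: one lock, plus a condition bound to that same lock.
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition referenceReleased = lock.newCondition();
  private int referenceCount;

  public VolumeRefTracker(int initialReferences) {
    this.referenceCount = initialReferences;
  }

  /** Drops one reference and wakes any thread waiting for the count to reach zero. */
  public void release() {
    lock.lock();
    try {
      if (referenceCount > 0) {
        referenceCount--;
      }
      referenceReleased.signalAll();
    } finally {
      lock.unlock();
    }
  }

  /**
   * Waits until no references remain or the timeout expires.
   * Returns true if the count reached zero, false on timeout or interruption.
   */
  public boolean awaitUnused(long timeoutMillis) {
    long remainingNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    lock.lock();
    try {
      // Loop to guard against spurious wakeups; awaitNanos releases the lock while waiting.
      while (referenceCount > 0) {
        if (remainingNanos <= 0) {
          return false;
        }
        remainingNanos = referenceReleased.awaitNanos(remainingNanos);
      }
      return true;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      lock.unlock();
    }
  }

  public static void main(String[] args) {
    VolumeRefTracker tracker = new VolumeRefTracker(1);
    new Thread(tracker::release).start();
    System.out.println("volume unused: " + tracker.awaitUnused(5000));
  }
}
```

Because the waiting thread owns the lock that the condition belongs to, the wait is legal, and the lock is released for the duration of the wait so that threads holding references can make progress.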
      

      Attachments

        1. HDFS-10830.01.patch
          6 kB
          Arpit Agarwal
        2. HDFS-10830.02.patch
          6 kB
          Arpit Agarwal
        3. HDFS-10830.05.patch
          8 kB
          Arpit Agarwal
        4. HDFS-10830.06.patch
          8 kB
          Arpit Agarwal

            People

              Assignee: Arpit Agarwal
              Reporter: Manoj Govindassamy
              Votes: 0
              Watchers: 9
