Hadoop HDFS / HDFS-16855

Remove the redundant write lock in addBlockPool


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: datanode

    Description

      When patching the datanode's fine-grained locking, we found that the datanode could not start: it appears to deadlock in addBlockPool when fs.getspaceused.classname is set to ReplicaCachingGetSpaceUsed. The write lock taken in addBlockPool seems redundant, so we can remove it.

      <!-- getspaceused classname -->
      <property>
        <name>fs.getspaceused.classname</name>
        <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed</value>
      </property>
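
      For context (our reading of the wiring; the BlockPoolSlice step is paraphrased): with this setting, each volume scan builds a ReplicaCachingGetSpaceUsed, which computes usage from the replica map instead of running du, so it has to take the block pool read lock:

      // Paraphrased call chain when fs.getspaceused.classname is set as above:
      //   FsDatasetImpl#addBlockPool          (holds the BLOCK_POOl write lock, below)
      //     -> FsVolumeList#addBlockPool      (one scan thread per volume, below)
      //       -> FsVolumeImpl#addBlockPool    (initializes the BlockPoolSlice)
      //         -> ReplicaCachingGetSpaceUsed (the class configured above)
      //           -> FsDatasetImpl#deepCopyReplica -> replicas() (needs the read lock)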
      // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl#addBlockPool
      // takes the BLOCK_POOl write lock for the whole block pool initialization
      @Override
      public void addBlockPool(String bpid, Configuration conf)
          throws IOException {
        LOG.info("Adding block pool " + bpid);
        AddBlockPoolException volumeExceptions = new AddBlockPoolException();
        try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, bpid)) {
          try {
            // scans every volume on its own thread and joins them (see FsVolumeList
            // below), all while this thread still holds the write lock
            volumes.addBlockPool(bpid, conf);
          } catch (AddBlockPoolException e) {
            volumeExceptions.mergeException(e);
          }
          volumeMap.initBlockPool(bpid);
          Set<String> vols = storageMap.keySet();
          for (String v : vols) {
            lockManager.addLock(LockLevel.VOLUME, bpid, v);
          }
        }
      }
      // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaMap#replicas,
      // called from FsDatasetImpl#deepCopyReplica
      // needs the BLOCK_POOl read lock
      void replicas(String bpid, Consumer<Iterator<ReplicaInfo>> consumer) {
        LightWeightResizableGSet<Block, ReplicaInfo> m = null;
        try (AutoCloseDataSetLock l = lockManager.readLock(LockLevel.BLOCK_POOl, bpid)) {
          m = map.get(bpid);
          if (m != null) {
            m.getIterator(consumer);
          }
        }
      }

       

      Because the scan runs on other threads, the write lock taken in addBlockPool cannot be downgraded to a read lock: the per-volume threads block on readLock(LockLevel.BLOCK_POOl, bpid) while the thread holding the write lock blocks in join(), so the datanode hangs at startup.
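
      The pattern can be reproduced outside HDFS. Below is a minimal sketch using a plain java.util.concurrent.locks.ReentrantReadWriteLock in place of the datanode's lock manager (class and variable names are ours, purely illustrative):

      import java.util.concurrent.locks.ReentrantReadWriteLock;

      public class CrossThreadLockDemo {
        public static void main(String[] args) throws InterruptedException {
          final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
          lock.writeLock().lock(); // like addBlockPool taking the write lock
          try {
            Thread scanner = new Thread(() -> {
              // like the volume scan thread reaching replicas(): the read lock
              // cannot be granted while another thread holds the write lock
              lock.readLock().lock();
              try {
                System.out.println("scanned");
              } finally {
                lock.readLock().unlock();
              }
            });
            scanner.start();
            // deadlock: join() waits for the scanner, and the scanner waits
            // for the read lock this thread holds
            scanner.join();
          } finally {
            lock.writeLock().unlock(); // never reached
          }
        }
      }

      Downgrading (taking the read lock before releasing the write lock) only works within a single thread, which is exactly what the per-volume scan threads break.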

      
      // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList#addBlockPool
      // called from FsDatasetImpl#addBlockPool above, with the write lock still held
      void addBlockPool(final String bpid, final Configuration conf) throws IOException {
        long totalStartTime = Time.monotonicNow();
        final Map<FsVolumeSpi, IOException> unhealthyDataDirs =
            new ConcurrentHashMap<FsVolumeSpi, IOException>();
        List<Thread> blockPoolAddingThreads = new ArrayList<Thread>();
        for (final FsVolumeImpl v : volumes) {
          // one scan thread per volume; with ReplicaCachingGetSpaceUsed configured,
          // each scan ends up needing the BLOCK_POOl read lock
          Thread t = new Thread() {
            public void run() {
              try (FsVolumeReference ref = v.obtainReference()) {
                FsDatasetImpl.LOG.info("Scanning block pool " + bpid +
                    " on volume " + v + "...");
                long startTime = Time.monotonicNow();
                v.addBlockPool(bpid, conf);
                long timeTaken = Time.monotonicNow() - startTime;
                FsDatasetImpl.LOG.info("Time taken to scan block pool " + bpid +
                    " on " + v + ": " + timeTaken + "ms");
              } catch (IOException ioe) {
                FsDatasetImpl.LOG.info("Caught exception while scanning " + v +
                    ". Will throw later.", ioe);
                unhealthyDataDirs.put(v, ioe);
              }
            }
          };
          blockPoolAddingThreads.add(t);
          t.start();
        }
        for (Thread t : blockPoolAddingThreads) {
          try {
            // joins the scan threads while the caller still holds the write lock,
            // which is where startup hangs
            t.join();
          } catch (InterruptedException ie) {
            throw new IOException(ie);
          }
        }
      }
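
      A sketch of what the proposed change could look like, simply dropping the BLOCK_POOl write lock from FsDatasetImpl#addBlockPool above (whether volumeMap.initBlockPool and the VOLUME lock registration are safe without it is for review):

      // hypothetical FsDatasetImpl#addBlockPool with the write lock removed
      @Override
      public void addBlockPool(String bpid, Configuration conf)
          throws IOException {
        LOG.info("Adding block pool " + bpid);
        AddBlockPoolException volumeExceptions = new AddBlockPoolException();
        try {
          // the volume scan threads can now acquire the BLOCK_POOl read lock
          // in deepCopyReplica()/replicas() without deadlocking against us
          volumes.addBlockPool(bpid, conf);
        } catch (AddBlockPoolException e) {
          volumeExceptions.mergeException(e);
        }
        volumeMap.initBlockPool(bpid);
        for (String v : storageMap.keySet()) {
          lockManager.addLock(LockLevel.VOLUME, bpid, v);
        }
      }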

       

            People

              Assignee: Unassigned
              Reporter: dingshun
