Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
When applying the datanode fine-grained lock patch, we found that the datanode could not start: addBlockPool appears to deadlock, so we can remove the write lock there. The deadlock shows up when fs.getspaceused.classname is set to ReplicaCachingGetSpaceUsed, whose refresh copies the replica map via deepCopyReplica and therefore needs the block-pool read lock:
<!-- fs.getspaceused.classname -->
<property>
  <name>fs.getspaceused.classname</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed</value>
</property>
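For context, this key is resolved by reflection when the datanode builds the per-block-pool space-usage tracker. A minimal sketch of the resolution pattern (illustrative only; the real lookup lives in GetSpaceUsed.Builder, and the default class here is just to show Configuration#getClass):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.DU;
import org.apache.hadoop.fs.GetSpaceUsed;

public class ResolveSpaceUsedClass {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("fs.getspaceused.classname",
        "org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed");
    // Illustrative: resolve the configured implementation class, falling
    // back to DU. The datanode performs an equivalent lookup when it
    // creates the space-usage tracker for each block pool.
    Class<? extends GetSpaceUsed> clazz = conf.getClass(
        "fs.getspaceused.classname", DU.class, GetSpaceUsed.class);
    System.out.println("space-used implementation: " + clazz.getName());
  }
}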
// org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl#addBlockPool
// takes the block-pool write lock
@Override
public void addBlockPool(String bpid, Configuration conf)
    throws IOException {
  LOG.info("Adding block pool " + bpid);
  AddBlockPoolException volumeExceptions = new AddBlockPoolException();
  try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, bpid)) {
    try {
      volumes.addBlockPool(bpid, conf);
    } catch (AddBlockPoolException e) {
      volumeExceptions.mergeException(e);
    }
    volumeMap.initBlockPool(bpid);
    Set<String> vols = storageMap.keySet();
    for (String v : vols) {
      lockManager.addLock(LockLevel.VOLUME, bpid, v);
    }
  }
}
// org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl#deepCopyReplica
// ends up here (ReplicaMap#replicas) and needs the block-pool read lock
void replicas(String bpid, Consumer<Iterator<ReplicaInfo>> consumer) {
  LightWeightResizableGSet<Block, ReplicaInfo> m = null;
  try (AutoCloseDataSetLock l = lockManager.readLock(LockLevel.BLOCK_POOl, bpid)) {
    m = map.get(bpid);
    if (m != null) {
      m.getIterator(consumer);
    }
  }
}
Because the scan does not run on the thread that holds the write lock, the write lock cannot be downgraded to a read lock: addBlockPool takes the write lock on the parent thread, the per-volume scanning threads block waiting for the read lock in deepCopyReplica, and the parent blocks in join() waiting for them, so the datanode hangs at startup. The standalone sketch below reproduces the pattern, and the FsVolumeList code after it shows where the threads are spawned.
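A minimal reproduction with a plain ReentrantReadWriteLock (a standalone sketch, not HDFS code): the parent thread holds the write lock and joins a child that wants the read lock, so neither can ever proceed.

import java.util.concurrent.locks.ReentrantReadWriteLock;

public class AddBlockPoolDeadlockDemo {
  public static void main(String[] args) throws InterruptedException {
    final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();

    rwLock.writeLock().lock();       // parent: like addBlockPool taking the write lock
    try {
      Thread scanner = new Thread(() -> {
        rwLock.readLock().lock();    // child: like deepCopyReplica wanting the read
        try {                        // lock, blocked by the parent's write lock
          System.out.println("scanning block pool");
        } finally {
          rwLock.readLock().unlock();
        }
      });
      scanner.start();
      scanner.join();                // parent waits for the child forever: deadlock
    } finally {
      rwLock.writeLock().unlock();   // never reached
    }
  }
}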
// org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList#addBlockPool
// called under the write lock above; scans each volume on its own thread
void addBlockPool(final String bpid, final Configuration conf) throws IOException {
  long totalStartTime = Time.monotonicNow();
  final Map<FsVolumeSpi, IOException> unhealthyDataDirs =
      new ConcurrentHashMap<FsVolumeSpi, IOException>();
  List<Thread> blockPoolAddingThreads = new ArrayList<Thread>();
  for (final FsVolumeImpl v : volumes) {
    Thread t = new Thread() {
      public void run() {
        try (FsVolumeReference ref = v.obtainReference()) {
          FsDatasetImpl.LOG.info("Scanning block pool " + bpid +
              " on volume " + v + "...");
          long startTime = Time.monotonicNow();
          v.addBlockPool(bpid, conf);
          long timeTaken = Time.monotonicNow() - startTime;
          FsDatasetImpl.LOG.info("Time taken to scan block pool " + bpid +
              " on " + v + ": " + timeTaken + "ms");
        } catch (IOException ioe) {
          FsDatasetImpl.LOG.info("Caught exception while scanning " + v +
              ". Will throw later.", ioe);
          unhealthyDataDirs.put(v, ioe);
        }
      }
    };
    blockPoolAddingThreads.add(t);
    t.start();
  }
  // The parent joins the scanners while still holding the write lock taken
  // in FsDatasetImpl#addBlockPool.
  for (Thread t : blockPoolAddingThreads) {
    try {
      t.join();
    } catch (InterruptedException ie) {
      throw new IOException(ie);
    }
  }
}
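Following the description's suggestion, a minimal sketch of the proposed direction (not a committed patch) is to stop taking the BLOCK_POOl write lock around the volume scan, so the scanning threads' read-lock requests can be granted; whether the remaining initialization still needs protection is the open question:

// Hypothetical sketch of FsDatasetImpl#addBlockPool with the block-pool
// write lock removed, as the description proposes. Not the committed fix;
// assumes nothing else mutates volumeMap/storageMap during startup.
@Override
public void addBlockPool(String bpid, Configuration conf) throws IOException {
  LOG.info("Adding block pool " + bpid);
  AddBlockPoolException volumeExceptions = new AddBlockPoolException();
  try {
    // Per-volume scan threads are joined inside this call; their
    // deepCopyReplica read locks are no longer blocked by this thread.
    volumes.addBlockPool(bpid, conf);
  } catch (AddBlockPoolException e) {
    volumeExceptions.mergeException(e);
  }
  volumeMap.initBlockPool(bpid);
  for (String v : storageMap.keySet()) {
    lockManager.addLock(LockLevel.VOLUME, bpid, v);
  }
}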
Attachments
Issue Links
- is related to
  - HDFS-16534 Split datanode block pool locks to volume grain. (Resolved)
  - HDFS-15382 Split one FsDatasetImpl lock to volume grain locks. (Resolved)