Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
3.0.0-alpha1
None
Hadoop Flags: Reviewed
Description
FsDatasetImpl#removeVolumes() crashes abruptly with an IllegalMonitorStateException whenever the volume being removed is concurrently in use.
It looks like removeVolumes() waits on the monitor of "this" (the FsDatasetImpl instance) without ever having locked it. Object#wait() may only be called by a thread that owns that object's monitor, and holding datasetLock does not give the thread ownership of the FsDatasetImpl monitor, so the call throws IllegalMonitorStateException. The wait happens only when the volume being removed is in use (reference count > 0). The thread performing the removal therefore crashes abruptly, and block invalidations for the removed volumes are skipped entirely.
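To illustrate the failure mode in isolation, here is a minimal, self-contained Java sketch. It is not FsDatasetImpl code: the class MonitorMismatchDemo, its referenceCount field and its removeVolume() method are hypothetical, and a plain java.util.concurrent.locks.ReentrantLock stands in for datasetLock. Holding the ReentrantLock does not make the thread the owner of the object's intrinsic monitor, so wait() on "this" throws as soon as the in-use branch is taken.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for FsDatasetImpl: a separate lock object guards the
// dataset, but the wait below still targets the intrinsic monitor of "this".
class MonitorMismatchDemo {
    private final ReentrantLock datasetLock = new ReentrantLock();
    private final int referenceCount; // hypothetical: > 0 means the volume is in use

    MonitorMismatchDemo(int referenceCount) {
        this.referenceCount = referenceCount;
    }

    // Mirrors the buggy pattern: holds only datasetLock, then waits on "this".
    String removeVolume() {
        datasetLock.lock();
        try {
            if (referenceCount > 0) {
                // Object#wait requires owning this object's monitor; holding
                // datasetLock does not count, so this throws immediately.
                this.wait(5000);
            }
            return "removed";
        } catch (IllegalMonitorStateException e) {
            return "IllegalMonitorStateException";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            datasetLock.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println(new MonitorMismatchDemo(0).removeVolume()); // idle volume
        System.out.println(new MonitorMismatchDemo(1).removeVolume()); // volume in use
    }
}
```

Running main prints "removed" for the idle volume and "IllegalMonitorStateException" for the in-use one, which matches the report that the crash only occurs when the reference count is above zero.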
@Override
public void removeVolumes(Set<File> volumesToRemove, boolean clearFailure) {
  ..
  ..
  try (AutoCloseableLock lock = datasetLock.acquire()) { // <== LOCK acquire datasetLock
    for (int idx = 0; idx < dataStorage.getNumStorageDirs(); idx++) {
      ..
      ..
      ..
      asyncDiskService.removeVolume(sd.getCurrentDir()); // <== volume SD1 remove
      volumes.removeVolume(absRoot, clearFailure);
      volumes.waitVolumeRemoved(5000, this); // <== WAIT on "this"?? But we haven't locked it yet.
      // This will cause IllegalMonitorStateException and crash the getBlockReports()/FBR thread!
      for (String bpid : volumeMap.getBlockPoolList()) {
        List<ReplicaInfo> blocks = new ArrayList<>();
        for (Iterator<ReplicaInfo> it = volumeMap.replicas(bpid).iterator(); it.hasNext(); ) {
          ..
          ..
          ..
          it.remove(); // <== volumeMap removal
        }
        blkToInvalidate.put(bpid, blocks);
      }
      ..
      ..
  } // <== LOCK release datasetLock

  // Call this outside the lock.
  for (Map.Entry<String, List<ReplicaInfo>> entry : blkToInvalidate.entrySet()) {
    ..
    for (ReplicaInfo block : blocks) {
      invalidate(bpid, block); // <== Notify NN of Block removal
    }
  }
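One way to make such a wait legal is to wait on a Condition obtained from the lock that is actually held, because Condition#await() atomically releases that lock while waiting and reacquires it before returning. The sketch below models this under stated assumptions: VolumeRemovalSketch, acquireReference(), releaseReference() and removeVolume() are hypothetical names, a ReentrantLock again stands in for datasetLock, and this is not necessarily how the committed HDFS fix is written.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the removal path: the remover awaits a Condition that
// belongs to the lock it already holds, so no IllegalMonitorStateException.
class VolumeRemovalSketch {
    private final ReentrantLock datasetLock = new ReentrantLock();
    private final Condition volumeReleased = datasetLock.newCondition();
    private final AtomicInteger referenceCount = new AtomicInteger();

    // Called by a reader before it starts using the volume.
    void acquireReference() {
        referenceCount.incrementAndGet();
    }

    // Called by a reader when it stops using the volume.
    void releaseReference() {
        referenceCount.decrementAndGet();
        datasetLock.lock();
        try {
            volumeReleased.signalAll(); // wake a remover blocked in awaitNanos()
        } finally {
            datasetLock.unlock();
        }
    }

    // Waits under datasetLock until no reader holds a reference.
    // Returns false if the volume is still in use after the timeout.
    boolean removeVolume(long timeoutMillis) {
        datasetLock.lock();
        try {
            long remainingNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            while (referenceCount.get() > 0) {
                if (remainingNanos <= 0) {
                    return false;
                }
                // Releases datasetLock while waiting and reacquires it afterwards.
                remainingNanos = volumeReleased.awaitNanos(remainingNanos);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            datasetLock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        VolumeRemovalSketch volume = new VolumeRemovalSketch();
        volume.acquireReference();
        Thread reader = new Thread(() -> {
            try {
                Thread.sleep(100); // simulate in-flight I/O on the volume
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            volume.releaseReference();
        });
        reader.start();
        System.out.println("removed: " + volume.removeVolume(5000));
        reader.join();
    }
}
```

With a reader holding a reference for about 100 ms, main prints "removed: true" once the reader lets go. The while loop rechecks the reference count after every wakeup, which covers spurious wakeups and the timeout, and releaseReference() signals while holding datasetLock, so the signal cannot slip in between the remover's check and its await.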
Attachments
Issue Links
is broken by
- HDFS-10682 Replace FsDatasetImpl object lock with a separate lock object (Resolved)
is depended upon by
- HDFS-9781 FsDatasetImpl#getBlockReports can occasionally throw NullPointerException (Resolved)