Currently, a LeaseRenewer (and its daemon thread) that has no clients should be terminated after a grace period, which defaults to 60 seconds. A race condition can occur when a new request arrives just after the LeaseRenewer has expired.
To reproduce this race condition:
- Client#1 creates File#1, which creates LeaseRenewer#1 and starts the Daemon#1 thread. A few seconds later File#1 is closed, so LeaseRenewer#1 now has no clients.
- 60 seconds (the grace period) later, LeaseRenewer#1 has just expired but the Daemon#1 thread is still sleeping. Client#1 creates File#2, which leads to the creation of Daemon#2.
- Daemon#1 wakes up and exits; after that, LeaseRenewer#1 is removed from the factory.
- File#2 is closed a few seconds later. LeaseRenewer#2 is created because the existing renewer can no longer be obtained from the factory.
The Daemon#2 thread leaks from this point on: the Client#1 it holds can never be removed, so it never gets a chance to stop.
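The steps above can be replayed deterministically in a small single-threaded model. This is only a sketch to make the ordering concrete, not the real Hadoop code: `Renewer`, `Factory`, `replay` and the key `"user@namenode"` are all hypothetical names, and the daemon threads are reduced to a counter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, single-threaded model of the race; NOT the real Hadoop classes.
// Every name in this file is hypothetical and exists only for illustration.
class LeaseRenewerRaceSketch {

    static class Renewer {
        final List<String> clients = new ArrayList<>();
        int currentDaemonId = 0;   // id of the most recently started daemon
    }

    static class Factory {
        private final Map<String, Renewer> renewers = new HashMap<>();

        // Returns the renewer for the key, creating one if none is registered.
        Renewer get(String key) {
            return renewers.computeIfAbsent(key, k -> new Renewer());
        }

        // Removes the entry only if r is still the renewer mapped to the key.
        void remove(String key, Renewer r) {
            renewers.remove(key, r);
        }

        boolean contains(Renewer r) {
            return renewers.containsValue(r);
        }
    }

    /** Replays the four steps in order and returns the orphaned renewer. */
    static Renewer replay(Factory factory) {
        String key = "user@namenode";

        // Step 1: File#1 is created (Daemon#1 starts), then closed.
        Renewer renewer1 = factory.get(key);
        renewer1.currentDaemonId = 1;
        renewer1.clients.add("Client#1");
        renewer1.clients.remove("Client#1");

        // Step 2: the grace period is over but Daemon#1 is still asleep.
        // File#2 is created; the renewer looks expired, so Daemon#2 starts.
        factory.get(key).clients.add("Client#1");
        renewer1.currentDaemonId = 2;

        // Step 3: Daemon#1 wakes up, exits, and removes renewer1 from the factory.
        factory.remove(key, renewer1);

        // Step 4: File#2 is closed. The lookup creates a fresh renewer, so the
        // client removal never reaches renewer1.
        Renewer renewer2 = factory.get(key);
        renewer2.clients.remove("Client#1");

        return renewer1;
    }

    public static void main(String[] args) {
        Factory factory = new Factory();
        Renewer orphan = replay(factory);
        System.out.println("orphan in factory: " + factory.contains(orphan));
        System.out.println("orphan clients: " + orphan.clients);
    }
}
```

In this model the returned renewer is no longer reachable through the factory but still lists Client#1, which is the state in which Daemon#2 keeps running.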
To solve this problem, IIUC, a simple approach is to make sure that all clients are cleared when a LeaseRenewer is removed from the factory. Please feel free to give your suggestions. Thanks!
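A minimal sketch of that idea, again using hypothetical stand-ins (`Renewer`, `Factory`, `clearClients`) rather than the real LeaseRenewer API: when the factory drops a renewer, it also empties that renewer's client set, so a daemon still attached to it would see no clients.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the suggested fix; not the actual Hadoop code.
class ClearOnRemoveSketch {

    static class Renewer {
        private final Set<String> clients = new HashSet<>();

        synchronized void put(String client) { clients.add(client); }
        synchronized void close(String client) { clients.remove(client); }
        synchronized boolean isEmpty() { return clients.isEmpty(); }
        synchronized void clearClients() { clients.clear(); }
    }

    static class Factory {
        private final Map<String, Renewer> renewers = new HashMap<>();

        synchronized Renewer get(String key) {
            return renewers.computeIfAbsent(key, k -> new Renewer());
        }

        // Suggested change: drop every client together with the factory entry.
        // Clients are cleared only if r was actually the registered renewer.
        synchronized void remove(String key, Renewer r) {
            if (renewers.remove(key, r)) {
                r.clearClients();
            }
        }
    }

    public static void main(String[] args) {
        Factory factory = new Factory();
        Renewer stale = factory.get("user@namenode");
        stale.put("Client#1");                   // File#2 opened on the expired renewer
        factory.remove("user@namenode", stale);  // Daemon#1 exits and removes it
        System.out.println("stale renewer empty: " + stale.isEmpty());
    }
}
```

The sketch always takes the factory lock before the renewer lock and never the reverse. Whether the still-open File#2 would then need to be registered with a fresh renewer is left open here.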
Related to: HDFS-16235 Deadlock in LeaseRenewer for static remove method