Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 2.8.0, 2.7.3, 2.6.5, 3.0.0-alpha1
- None
- None
Description
On the client side, a deadlock may occur if the DelegationTokenRenewer attempts to renew a WebHdfs delegation token while the client is shutting down (i.e. while FileSystem.Cache.ClientFinalizer is running). ClientFinalizer calls FileSystem.Cache.closeAll(), which first takes a lock on the FileSystem.Cache object and then locks each file system in the cache as it iterates over them. DelegationTokenRenewer acquires the same locks in the opposite order: it takes a lock on a file system object while renewing that file system's token, and TokenAspect.TokenManager.renew() (used to renew WebHdfs tokens) then calls FileSystem.get, which in turn takes a lock on the FileSystem.Cache object. If ClientFinalizer is running at that point, each thread can end up holding the lock the other is waiting for.
See below for example deadlock output:
Found one Java-level deadlock:
=============================
"Thread-8572":
  waiting to lock monitor 0x00007eff401f9878 (object 0x000000051ec3f930, a dali.hdfs.web.WebHdfsFileSystem),
  which is held by "FileSystem-DelegationTokenRenewer"
"FileSystem-DelegationTokenRenewer":
  waiting to lock monitor 0x00007f005c08f5c8 (object 0x000000050389c8b8, a dali.fs.FileSystem$Cache),
  which is held by "Thread-8572"

Java stack information for the threads listed above:
===================================================
"Thread-8572":
        at dali.hdfs.web.WebHdfsFileSystem.close(WebHdfsFileSystem.java:864)
        - waiting to lock <0x000000051ec3f930> (a dali.hdfs.web.WebHdfsFileSystem)
        at dali.fs.FilterFileSystem.close(FilterFileSystem.java:449)
        at dali.fs.FileSystem$Cache.closeAll(FileSystem.java:2407)
        - locked <0x000000050389c8b8> (a dali.fs.FileSystem$Cache)
        at dali.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2424)
        - locked <0x000000050389c8d0> (a dali.fs.FileSystem$Cache$ClientFinalizer)
        at dali.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
"FileSystem-DelegationTokenRenewer":
        at dali.fs.FileSystem$Cache.getInternal(FileSystem.java:2343)
        - waiting to lock <0x000000050389c8b8> (a dali.fs.FileSystem$Cache)
        at dali.fs.FileSystem$Cache.get(FileSystem.java:2332)
        at dali.fs.FileSystem.get(FileSystem.java:369)
        at dali.hdfs.web.TokenAspect$TokenManager.getInstance(TokenAspect.java:92)
        at dali.hdfs.web.TokenAspect$TokenManager.renew(TokenAspect.java:72)
        at dali.security.token.Token.renew(Token.java:373)
        at dali.fs.DelegationTokenRenewer$RenewAction.renew(DelegationTokenRenewer.java:127)
        - locked <0x000000051ec3f930> (a dali.hdfs.web.WebHdfsFileSystem)
        at dali.fs.DelegationTokenRenewer$RenewAction.access$300(DelegationTokenRenewer.java:57)
        at dali.fs.DelegationTokenRenewer.run(DelegationTokenRenewer.java:258)

Found 1 deadlock.
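
The lock-ordering inversion described above can be reproduced in isolation. The following is a minimal sketch, not the actual FileSystem code: the class name LockOrderDemo, the two plain Object monitors standing in for the FileSystem.Cache object and a cached WebHdfsFileSystem, and the thread names are all hypothetical. A latch forces both threads to hold their first lock before either requests its second, which makes the otherwise timing-dependent deadlock deterministic. The threads are daemon threads so the JVM can still exit.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockOrderDemo {

    /** Returns true if the two demo threads are reported as monitor-deadlocked. */
    static boolean deadlocks() throws InterruptedException {
        final Object cache = new Object(); // stand-in for the FileSystem.Cache monitor
        final Object fs = new Object();    // stand-in for one cached file system's monitor
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Order used by closeAll(): cache first, then the file system.
        Thread finalizer = new Thread(() -> {
            synchronized (cache) {
                awaitBoth(bothHoldFirstLock);
                synchronized (fs) { /* would close the file system here */ }
            }
        }, "demo-ClientFinalizer");

        // Order used during renewal: file system first, then the cache.
        Thread renewer = new Thread(() -> {
            synchronized (fs) {
                awaitBoth(bothHoldFirstLock);
                synchronized (cache) { /* would look up the file system here */ }
            }
        }, "demo-DelegationTokenRenewer");

        finalizer.setDaemon(true);
        renewer.setDaemon(true);
        finalizer.start();
        renewer.start();

        // Poll the JVM's own monitor-deadlock detector for up to about 5 seconds.
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) {
            long[] ids = mx.findMonitorDeadlockedThreads();
            if (ids != null && contains(ids, finalizer.getId()) && contains(ids, renewer.getId())) {
                return true;
            }
            Thread.sleep(50);
        }
        return false;
    }

    // Signals that the caller holds its first lock, then waits for the other thread to do the same.
    private static void awaitBoth(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static boolean contains(long[] ids, long id) {
        for (long candidate : ids) {
            if (candidate == id) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlock detected: " + deadlocks());
    }
}
```

In this sketch, making both threads acquire the two monitors in the same order removes the cycle. Whether that is practical for ClientFinalizer and DelegationTokenRenewer is not something this report establishes.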