Hadoop HDFS / HDFS-11208

Deadlock in WebHDFS on shutdown

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.8.0, 2.7.3, 2.6.5, 3.0.0-alpha1
    • Fix Version/s: None
    • Component/s: webhdfs
    • Labels: None

    Description

      On the client side, a deadlock can occur if the DelegationTokenRenewer attempts to renew a WebHdfs delegation token while the client is shutting down (i.e. while FileSystem.Cache.ClientFinalizer is running). The two threads take the same pair of locks in opposite orders. ClientFinalizer calls FileSystem.Cache.closeAll(), which first locks the FileSystem.Cache object and then locks each file system in the cache as it iterates over them. DelegationTokenRenewer locks a file system object while it renews that file system's token. For WebHdfs tokens, renewal goes through TokenAspect.TokenManager.renew(), which calls FileSystem.get, which in turn locks the FileSystem.Cache object. If the renewer already holds the file system lock while ClientFinalizer holds the cache lock, each thread waits for the lock the other holds and neither can proceed.
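The opposite lock orders described above can be modelled in a few lines. This is only a sketch under stated assumptions: the class `LockOrderInversionSketch` and its two locks are hypothetical stand-ins, not Hadoop code, and `ReentrantLock.tryLock` with a timeout replaces the `synchronized` monitors of the real classes so that the demonstration terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderInversionSketch {
    // Hypothetical stand-ins for the two monitors named in the report.
    static final ReentrantLock CACHE = new ReentrantLock();       // models FileSystem.Cache
    static final ReentrantLock FILE_SYSTEM = new ReentrantLock(); // models one cached WebHdfsFileSystem

    // Holds `first`, then tries `second` only once the peer thread holds its own first lock.
    static boolean attempt(ReentrantLock first, ReentrantLock second,
                           CountDownLatch bothHoldFirst, CountDownLatch bothTried) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            // A synchronized block would wait here forever; the timeout lets the sketch end.
            boolean acquired = second.tryLock(100, TimeUnit.MILLISECONDS);
            if (acquired) {
                second.unlock();
            }
            bothTried.countDown();
            bothTried.await(); // keep `first` held until the peer has also given up
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            first.unlock();
        }
    }

    // Returns {finalizerGotFileSystemLock, renewerGotCacheLock}.
    static boolean[] run() {
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        boolean[] result = new boolean[2];
        // ClientFinalizer order: cache first, then the file system.
        Thread finalizer = new Thread(
            () -> result[0] = attempt(CACHE, FILE_SYSTEM, bothHoldFirst, bothTried));
        // Renewer order: file system first, then the cache (via FileSystem.get).
        Thread renewer = new Thread(
            () -> result[1] = attempt(FILE_SYSTEM, CACHE, bothHoldFirst, bothTried));
        finalizer.start();
        renewer.start();
        try {
            finalizer.join();
            renewer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result;
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println("finalizer acquired file system lock: " + result[0]);
        System.out.println("renewer acquired cache lock: " + result[1]);
    }
}
```

In this model neither thread can obtain its second lock while the other still holds it, so both attempts time out. With plain monitors, as in the reported code path, there is no timeout and both threads would block indefinitely.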

      See below for example deadlock output:

      Found one Java-level deadlock:
      =============================
      "Thread-8572":
        waiting to lock monitor 0x00007eff401f9878 (object 0x000000051ec3f930, a dali.hdfs.web.WebHdfsFileSystem),
        which is held by "FileSystem-DelegationTokenRenewer"
      "FileSystem-DelegationTokenRenewer":
        waiting to lock monitor 0x00007f005c08f5c8 (object 0x000000050389c8b8, a dali.fs.FileSystem$Cache),
        which is held by "Thread-8572"

      Java stack information for the threads listed above:
      ===================================================
      "Thread-8572":
         at dali.hdfs.web.WebHdfsFileSystem.close(WebHdfsFileSystem.java:864)
         - waiting to lock <0x000000051ec3f930> (a dali.hdfs.web.WebHdfsFileSystem)
         at dali.fs.FilterFileSystem.close(FilterFileSystem.java:449)
         at dali.fs.FileSystem$Cache.closeAll(FileSystem.java:2407)
         - locked <0x000000050389c8b8> (a dali.fs.FileSystem$Cache)
         at dali.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2424)
         - locked <0x000000050389c8d0> (a dali.fs.FileSystem$Cache$ClientFinalizer)
         at dali.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
      "FileSystem-DelegationTokenRenewer":
         at dali.fs.FileSystem$Cache.getInternal(FileSystem.java:2343)
         - waiting to lock <0x000000050389c8b8> (a dali.fs.FileSystem$Cache)
         at dali.fs.FileSystem$Cache.get(FileSystem.java:2332)
         at dali.fs.FileSystem.get(FileSystem.java:369)
         at dali.hdfs.web.TokenAspect$TokenManager.getInstance(TokenAspect.java:92)
         at dali.hdfs.web.TokenAspect$TokenManager.renew(TokenAspect.java:72)
         at dali.security.token.Token.renew(Token.java:373)
         at dali.fs.DelegationTokenRenewer$RenewAction.renew(DelegationTokenRenewer.java:127)
         - locked <0x000000051ec3f930> (a dali.hdfs.web.WebHdfsFileSystem)
         at dali.fs.DelegationTokenRenewer$RenewAction.access$300(DelegationTokenRenewer.java:57)
         at dali.fs.DelegationTokenRenewer.run(DelegationTokenRenewer.java:258)

      Found 1 deadlock.
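A monitor cycle like the one in the dump above can also be detected from inside a JVM with the standard `java.lang.management.ThreadMXBean` API. The following is a minimal sketch, not part of the report: the class `DeadlockReportSketch`, the thread names, and the two lock objects are hypothetical, and the two daemon threads are deadlocked on purpose so that there is something to detect.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class DeadlockReportSketch {
    // Starts a daemon thread that locks `first`, waits for its peer, then blocks on `second`.
    static void lockInOrder(String name, Object first, Object second,
                            CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try {
                    bothHoldFirst.await();
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // Never reached: the peer thread holds `second` and waits for `first`.
                }
            }
        }, name);
        t.setDaemon(true); // daemon threads do not keep the JVM alive after main returns
        t.start();
    }

    // Creates a two-thread monitor deadlock and returns the sorted names the JVM reports.
    static List<String> deadlockedThreadNames() {
        Object cache = new Object();      // models FileSystem.Cache
        Object fileSystem = new Object(); // models one cached WebHdfsFileSystem
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        lockInOrder("sketch-finalizer", cache, fileSystem, bothHoldFirst);
        lockInOrder("sketch-renewer", fileSystem, cache, bothHoldFirst);

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long deadline = System.nanoTime() + 5_000_000_000L; // poll for at most five seconds
        long[] ids = null;
        while (ids == null && System.nanoTime() < deadline) {
            // Reports only cycles on object monitors, which is what this sketch creates.
            ids = threads.findMonitorDeadlockedThreads();
            if (ids == null) {
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }

        List<String> names = new ArrayList<>();
        if (ids != null) {
            for (ThreadInfo info : threads.getThreadInfo(ids)) {
                if (info != null) {
                    names.add(info.getThreadName());
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Deadlocked threads: " + deadlockedThreadNames());
    }
}
```

Polling is used because the cycle only exists once both threads have reached their inner `synchronized` block. A check like this only observes a deadlock after the fact; it does not change the lock ordering that causes it.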
      

      People

        Assignee: Erik Krogen (xkrogen)
        Reporter: Erik Krogen (xkrogen)