Hadoop HDFS / HDFS-3545

DFSClient leak due to malfunctioning of FileSystem Cache

    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 2.0.0-alpha, 3.0.0
    • Fix Version/s: None
    • Component/s: hdfs-client
    • Labels:
      None

      Description

      Every call to FileSystem.get creates a new FileSystem object, even when the UGI object passed in has the same user name as an earlier one. As a result, a large number of FileSystem objects are created and stored in the FileSystem cache instead of the same cached object being reused.

      This causes the cache to grow until the client hits an OutOfMemoryError (OOME).

      The same behaviour can be seen in the MapReduce and Hive components, since they call FileSystem.get in the same way.

          Activity

          amith added a comment -

          Reference in MR: https://issues.apache.org/jira/browse/MAPREDUCE-4340
          Reference in Hive: https://issues.apache.org/jira/browse/hive-3155

          amith added a comment -

          To fix this defect, we would need to return the FileSystem object from the cache whenever the UGI credentials match. However, reusing such FileSystem objects can lead to the wrong token being used when setting up connections, which is a security issue. See HADOOP-6564.

          Daryn Sharp added a comment -

          I've looked into "fixing" this problem too, but I don't think it's solvable. A UGI/Subject is mutable, so even though two different UGI instances may appear identical at one point in time, one UGI's Subject may later hold altered tokens that should not be visible to other UGI instances.
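
The mutability concern can be illustrated with the JDK's javax.security.auth.Subject alone, without any Hadoop classes. The following is a minimal sketch, not Hadoop code: NamedPrincipal and the string "tokens" are hypothetical stand-ins (real Hadoop tokens are not plain strings), and it assumes no security manager is installed.

```java
import java.security.Principal;
import javax.security.auth.Subject;

public class SubjectMutability {
    // Hypothetical principal that compares by name only.
    static final class NamedPrincipal implements Principal {
        private final String name;
        NamedPrincipal(String name) { this.name = name; }
        @Override public String getName() { return name; }
        @Override public boolean equals(Object o) {
            return o instanceof NamedPrincipal && ((NamedPrincipal) o).name.equals(name);
        }
        @Override public int hashCode() { return name.hashCode(); }
        @Override public String toString() { return name; }
    }

    // Builds a Subject for a user holding one private credential (a stand-in token).
    static Subject subjectFor(String user, String token) {
        Subject s = new Subject();
        s.getPrincipals().add(new NamedPrincipal(user));
        s.getPrivateCredentials().add(token);
        return s;
    }

    public static void main(String[] args) {
        Subject a = subjectFor("alice", "token-1");
        Subject b = subjectFor("alice", "token-1");
        // Same principal and same credentials at this point in time.
        System.out.println("equal before: " + a.equals(b));
        // One Subject acquires another credential; the other must not see it.
        b.getPrivateCredentials().add("token-2");
        System.out.println("equal after:  " + a.equals(b));
    }
}
```

Two Subjects that compare as equal when a cache entry is created can diverge afterwards, so a cache keyed on that point-in-time equality could hand one caller a FileSystem bound to another caller's credentials.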

          Daryn Sharp added a comment -

          I've looked into the issue quite a bit further. It's not a cache issue, but client code is misusing the filesystem.

          Systems encountering OOM for the FileSystem cache are not properly calling FileSystem.closeAllForUGI(ugi). If UserGroupInformation.createRemoteUser(user) is called and the resulting ugi is used to obtain filesystems, then FileSystem.closeAllForUGI must be invoked when the created ugi is no longer needed. I'm preparing to do this in the NM.

          Caching the UGI is fraught with peril. Tokens will be unexpectedly shared. Reusing a token in a cached ugi risks using an old, expired token. To compound the issue, tokens can be added to a ugi but not removed from it. Thus, even if a new token is obtained and added to the ugi, it is non-deterministic whether the new token or the old expired token will be used for a connection.

          There are also possible security issues in allowing multiple jobs for a client to use a union of all the tokens acquired in a cached ugi.
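
The leak and the cleanup described in this comment can be sketched with a small self-contained model. This is not Hadoop code: UserHandle, FakeFs, and FsCacheModel are hypothetical stand-ins for the UGI, the FileSystem, and the FileSystem cache, and the model assumes the cache distinguishes user handles by object identity and holds one entry per handle. In real code the corresponding calls named in the thread are UserGroupInformation.createRemoteUser(user), FileSystem.get, and FileSystem.closeAllForUGI(ugi).

```java
import java.util.IdentityHashMap;
import java.util.Map;

public class FsCacheModel {
    // Hypothetical stand-in for a UGI: handles with the same name are still distinct objects.
    static final class UserHandle {
        final String name;
        UserHandle(String name) { this.name = name; }
    }

    // Hypothetical stand-in for a cached FileSystem and its client resources.
    static final class FakeFs {
        boolean closed = false;
        void close() { closed = true; }
    }

    // Keyed by handle identity (an assumption of this model), so equal names never share an entry.
    private final Map<UserHandle, FakeFs> cache = new IdentityHashMap<>();

    FakeFs get(UserHandle user) {
        return cache.computeIfAbsent(user, u -> new FakeFs());
    }

    // Models the cleanup step: close and evict whatever is cached for this handle.
    void closeAllFor(UserHandle user) {
        FakeFs fs = cache.remove(user);
        if (fs != null) {
            fs.close();
        }
    }

    int size() { return cache.size(); }

    // Simulates n requests that each create a fresh handle for the same user name.
    static int sizeAfterRequests(int n, boolean cleanup) {
        FsCacheModel model = new FsCacheModel();
        for (int i = 0; i < n; i++) {
            UserHandle user = new UserHandle("alice");
            try {
                model.get(user);
            } finally {
                if (cleanup) {
                    model.closeAllFor(user);
                }
            }
        }
        return model.size();
    }

    public static void main(String[] args) {
        System.out.println("entries without cleanup: " + sizeAfterRequests(1000, false));
        System.out.println("entries with cleanup:    " + sizeAfterRequests(1000, true));
    }
}
```

In this model the cache grows by one entry per request unless the handle's entries are released once the handle is no longer needed. Placing the cleanup in a finally block mirrors the advice above that the close call must happen whenever a created ugi goes out of use.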

          Mithun Radhakrishnan added a comment -

          I'm curious to know whether calling FileSystem.closeAllForUGI() fixes this problem. I can confirm that it did fix HIVE-3098.

          A confirmation here would be a double-check for the HIVE-3098 fix.


            People

            • Assignee: Unassigned
            • Reporter: amith
            • Votes: 0
            • Watchers: 22
