Hadoop Common / HADOOP-6356

Add a Cache for AbstractFileSystem in the new FileContext/AbstractFileSystem framework.

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.22.0
    • Fix Version/s: None
    • Component/s: fs
    • Labels: None

      Description

      The new filesystem framework, FileContext and AbstractFileSystem, does not implement a cache for AbstractFileSystem.
      This Jira proposes to add a cache to the new framework, just like the one in the old FileSystem.


          Activity

          Colin Patrick McCabe added a comment -

          Yeah, it would be nice if FileContext had a close method.

          Chris Nauroth added a comment -

          If not a cache, then I do think FileContext would benefit from having a close method. Right now, FileContext doesn't provide any kind of reliable shutdown hook where a file system implementor can clean up scarce resources (e.g. background threads allocated during usage).

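          Since FileContext currently has no close method, one workaround is for the application to own the scarce resources itself and release them from its own Closeable wrapper. The sketch below is illustrative only; the class name and the thread pool standing in for "scarce resources" are not part of Hadoop.

          import java.io.Closeable;
          import java.net.URI;
          import java.util.concurrent.ExecutorService;
          import java.util.concurrent.Executors;

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FileContext;
          import org.apache.hadoop.fs.UnsupportedFileSystemException;

          // Illustrative workaround (not Hadoop code): pair a FileContext with the
          // resources the application allocates around it, and close those resources
          // explicitly, since FileContext itself offers no close() hook.
          public class CloseableFileContextHolder implements Closeable {
            private final FileContext fc;
            // Stand-in for whatever an implementor would want released on close().
            private final ExecutorService workers = Executors.newFixedThreadPool(4);

            public CloseableFileContextHolder(URI uri, Configuration conf)
                throws UnsupportedFileSystemException {
              this.fc = FileContext.getFileContext(uri, conf);
            }

            public FileContext getFileContext() {
              return fc;
            }

            @Override
            public void close() {
              workers.shutdownNow(); // release what we own; fc has no close()
            }
          }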
          Colin Patrick McCabe added a comment -

          I think the performance situation may not be as bad as you think. HDFS caches things like sockets and short-circuit file descriptors in a separate, global cache that is not per-FileContext. They are cached in org.apache.hadoop.hdfs.ClientContext and will be shared between multiple different FileContext objects.

          The big problems with caching FileContext objects are:

          • they're mutable, so you don't know when another thread will call FileContext#setWorkingDirectory and screw up how your thread is resolving relative paths
          • UGI has to be a part of the "lookup key" for any cache, and we have refused to compare UGIs except by object equality in the past. This can lead to (for some) counter-intuitive cache behavior.

          Given these limitations, I think it's better to have the applications do their own caching. Maybe we could provide a utility class that would help with this, but we should stay away from adding more global variables.

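          A minimal sketch of the application-side caching suggested above (the class and method names are made up for illustration): the application picks the lookup key, here the current UGI compared by identity plus the filesystem URI, and decides the lifetime. Because FileContext is mutable (e.g. setWorkingDirectory), a cached instance should only be shared by threads that agree not to mutate it.

          import java.io.IOException;
          import java.net.URI;
          import java.util.Map;
          import java.util.concurrent.ConcurrentHashMap;

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FileContext;
          import org.apache.hadoop.security.UserGroupInformation;

          // Illustrative application-owned cache; not part of Hadoop.
          public class AppFileContextCache {
            private static final class Key {
              final UserGroupInformation ugi;
              final URI uri;
              Key(UserGroupInformation ugi, URI uri) { this.ugi = ugi; this.uri = uri; }
              @Override public boolean equals(Object o) {
                if (!(o instanceof Key)) return false;
                Key k = (Key) o;
                return k.ugi == ugi && k.uri.equals(uri); // UGI compared by identity
              }
              @Override public int hashCode() {
                return 31 * System.identityHashCode(ugi) + uri.hashCode();
              }
            }

            private final Map<Key, FileContext> cache = new ConcurrentHashMap<>();
            private final Configuration conf;

            public AppFileContextCache(Configuration conf) { this.conf = conf; }

            // Returns a cached FileContext for the given URI and the current user.
            public FileContext get(URI uri) throws IOException {
              Key key = new Key(UserGroupInformation.getCurrentUser(), uri);
              FileContext fc = cache.get(key);
              if (fc == null) {
                fc = FileContext.getFileContext(uri, conf);
                FileContext prev = cache.putIfAbsent(key, fc);
                if (prev != null) {
                  fc = prev; // another thread won the race; use its instance
                }
              }
              return fc;
            }
          }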
          Sumit Kumar added a comment -

          @all - trying to bring your attention to this JIRA. I see that parts of the Hive/Hadoop code have already started consuming these APIs, but looking at this JIRA, there hasn't been much interest in the last 2 years.

          John George added a comment -

          @Nicholas - Caching the states per AFS might be better - that way, like you suggested, we do not have to cache the AFS object itself.
          @Daryn - Would it solve the 'implicit acquisition of dt' use case you brought up?

          Tsz Wo Nicholas Sze added a comment -

          How about caching the states, which are lightweight, but not the FileContext/AbstractFileSystem objects?

          Daryn Sharp added a comment -

          There are a number of cases where filesystems maintain state, such as the verify and write checksum flags, but also the implicit acquisition of delegation tokens by hftp and webhdfs. I completely agree that a global afs cache would be bad (there are many more issues with it than discussed here), but I think a per-fc cache of the accessed afs instances would prevent the loss of state.

          John George added a comment -

          An issue that was raised as part of HADOOP-8319 is that in cases where there are symlinks that point to a different FS than the defaultFS, we will lose any state that is maintained by the client. So, for example, the setWriteChecksum or setVerifyChecksum flags, which are maintained by the client, will be lost since we create a new object for each call.

          The reason I raise that here is because it seemed very similar to what is being discussed here. My proposal is that we keep a table of each instance of the filesystem that is not the defaultFS so that we can keep going back to the same instance of the FS within a filecontext. This would also help in a case if we were to ever decide to add anything that requires state to be maintained on the client side.

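          A rough sketch of the shape such a per-FileContext table could take (this is not existing FileContext code; the class name and key choice are illustrative): the context hands back the same AbstractFileSystem for a given scheme and authority, so client-side flags like verify/write checksum survive calls that cross into a non-default filesystem.

          import java.net.URI;
          import java.util.HashMap;
          import java.util.Map;

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.AbstractFileSystem;
          import org.apache.hadoop.fs.UnsupportedFileSystemException;

          // Illustrative only: a per-context table of non-default file systems.
          public class PerContextAfsTable {
            private final Configuration conf;
            // Keyed by "scheme://authority"; the default FS is already held by
            // FileContext, so only other file systems would land here.
            private final Map<String, AbstractFileSystem> table = new HashMap<>();

            public PerContextAfsTable(Configuration conf) { this.conf = conf; }

            public synchronized AbstractFileSystem lookup(URI uri)
                throws UnsupportedFileSystemException {
              String authority = uri.getAuthority() == null ? "" : uri.getAuthority();
              String key = uri.getScheme() + "://" + authority;
              AbstractFileSystem fs = table.get(key);
              if (fs == null) {
                fs = AbstractFileSystem.get(uri, conf);
                table.put(key, fs); // reuse this instance, keeping its state
              }
              return fs;
            }
          }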
          Sanjay Radia added a comment -

          We could add a cache at the rpcProxy layer.
          This will avoid the extra round trip for FileContext operations to a file system other than the default filesystem. However, the current bug where the NN reboots with a different version will remain.

          Sanjay Radia added a comment -

          Dhruba says > I think it is appropriate to do the compatible-version-check for each RPC instead of doing it per creation-of-FileSystem object.

          Having a FileSystem cache => RPCProxy is also cached => version check is done once per file system object. Dhruba is right in that this is incorrect since the NN may reboot with a different version.

          However, we will see a more serious performance problem with FileContext under certain usage patterns. FileContext keeps a pointer to the default file system; hence all file operations to the default filesystem will not require a version check (but it is susceptible to the NN-reboots-with-a-different-version problem). However, operations to other file systems, such as fc.open("hdfs://nn/foo/bar"), will create a new AbstractFileSystem object, which will then create a new dfsClient, which will create a new rpcProxy, which will do a getVersion() - this is an extra round trip. The good news is that the connection at the lowest layer is cached.

          We can fix this by having each RPC call also piggyback the version number so that the version check is not needed when a proxy is created. This, however, will break wire compatibility.

          Tsz Wo Nicholas Sze added a comment -

          Since FileSystem objects themselves are pretty lightweight (not counting RPC connections, since there is an RPC cache), I suggest that we only add a table, instead of a cache, to FileContext. The table will be used for FileSystem cleanup during shutdown and processing delete-on-exit files. The table cannot be implemented in AbstractFileSystem since there may be symbolic links when processing delete-on-exit files.

          Tsz Wo Nicholas Sze added a comment -

          > libhdfs uses FileSystem() to open/create/delete random files in HDFS.

          It should not be a problem since the unused objects will be garbage collected.

          dhruba borthakur added a comment -

          libhdfs uses FileSystem() to open/create/delete random files in HDFS.

          Tsz Wo Nicholas Sze added a comment -

          > 3. Improve performance as mentioned by Dhruba. I suspect the improvement was negligible.

          In the current implementation, the performance improvement is actually negligible since (a) we do not expect a large number of FileSystem objects and (b) RPC also has a connection cache, i.e. the connections will be re-used even if there is no FileSystem cache.

          Tsz Wo Nicholas Sze added a comment -

          The existing FileSystem.CACHE serves the following purposes:

          1. Cleanup FileSystem objects during shutdown. This includes processing deleteOnExit and closing RPC connections in hdfs.
          2. Help closing files.
          3. Improve performance as mentioned by Dhruba. I suspect the improvement was negligible.

          If we choose not to add cache to the new framework, we have to support (1) somehow.

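          To make purpose (1) concrete, here is a minimal sketch of shutdown cleanup done without a cache, assuming the application tracks the paths it wants removed and registers a plain JVM shutdown hook. The class below is illustrative, not Hadoop code.

          import java.util.Set;
          import java.util.concurrent.CopyOnWriteArraySet;

          import org.apache.hadoop.fs.FileContext;
          import org.apache.hadoop.fs.Path;

          // Illustrative only: application-level delete-on-exit without a global cache.
          public class ExitCleanupRegistry {
            private final FileContext fc;
            private final Set<Path> toDelete = new CopyOnWriteArraySet<>();

            public ExitCleanupRegistry(FileContext fc) {
              this.fc = fc;
              Runtime.getRuntime().addShutdownHook(new Thread(() -> {
                for (Path p : toDelete) {
                  try {
                    fc.delete(p, true); // recursive, best-effort delete on JVM exit
                  } catch (Exception e) {
                    // ignore: nothing sensible to do during shutdown
                  }
                }
              }));
            }

            public void deleteOnExit(Path p) {
              toDelete.add(p);
            }
          }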
          dhruba borthakur added a comment -

          My opinion is to not add a cache. The current cache implementation helps performance because the compatible-version-check is done only once per filesystem object. However, this approach is not very correct because the server could have restarted with a different version of the software. I think it is appropriate to do the compatible-version-check for each RPC instead of doing it per creation-of-FileSystem object.

          Sanjay Radia added a comment -

          FileContext keeps a pointer to the default filesystem (i.e. the root or the slash-filesystem).
          Any methods that pass a URI to a different file system will result in a new instance of the file system (AbstractFileSystem) being created.

          So the question is should we add a cache? I filed this Jira to explore this question.

          Option 1: Do not add a cache, but do keep a pointer to the default filesystem (i.e. the slash).
          It is okay to create a new Java object for each URI file system being accessed. The RPC layer reuses connections to the same HDFS, so caching filesystems is not necessary to reuse a connection. But we need to add an exit hook to close the open leases on JVM exit (note the old FileSystem has an exit hook on the cache which indirectly flushes the open leases on exit or on close).

          Option 2: Add an AbstractFileSystem cache. This raises the following issue. Recently HADOOP-4655 added FileSystem#newInstance() so that Facebook's Scribe subsystem could bypass the cache. Doing this is a little ugly in general because the notion of the cache is leaking through the interface; further, this is hard to do with FileContext/AbstractFileSystem because applications do not create instances of AbstractFileSystem directly (FileContext does it automatically as needed).

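          For reference, the old-API cache bypass mentioned in Option 2 looks like this from the caller's side; the URI below is a placeholder.

          import java.io.IOException;
          import java.net.URI;

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;

          public class CacheBypassExample {
            public static void main(String[] args) throws IOException {
              Configuration conf = new Configuration();
              URI uri = URI.create("hdfs://namenode:8020/"); // placeholder address

              FileSystem shared = FileSystem.get(uri, conf);            // cached instance
              FileSystem privateFs = FileSystem.newInstance(uri, conf); // bypasses the cache

              shared.exists(new Path("/"));
              privateFs.exists(new Path("/"));

              // Closing the private instance does not affect the shared, cached one.
              privateFs.close();
            }
          }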

            People

            • Assignee: Sanjay Radia
            • Reporter: Sanjay Radia
            • Votes: 0
            • Watchers: 15
