Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0-alpha
    • Fix Version/s: 2.0.2-alpha
    • Component/s: None
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      HttpFS opens and closes a FileSystem instance against the backend filesystem (typically HDFS) on every request. The FileSystem caching is not used as it does not have expiration/timeout and filesystem instances in there live forever, for long running services like HttpFS this is not a good thing as it would keep connections open to the NN.

      1. HDFS-3513.patch
        24 kB
        Alejandro Abdelnur
      2. HDFS-3513.patch
        22 kB
        Alejandro Abdelnur
      3. HDFS-3513.patch
        22 kB
        Alejandro Abdelnur
      4. HDFS-3513.patch
        22 kB
        Alejandro Abdelnur

        Issue Links

          Activity

          Hide
          Alejandro Abdelnur added a comment -

          Time based cache that purges cached filesystem after an idle timeout.

          Show
          Alejandro Abdelnur added a comment - Time based cache that purges cached filesystem after an idle timeout.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12531134/HDFS-3513.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs-httpfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2605//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/2605//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs-httpfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2605//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12531134/HDFS-3513.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs-httpfs. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2605//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/2605//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs-httpfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2605//console This message is automatically generated.
          Hide
          Alejandro Abdelnur added a comment -

          cache should keep track of lease reference count

          Show
          Alejandro Abdelnur added a comment - cache should keep track of lease reference count
          Hide
          Alejandro Abdelnur added a comment -

          adding lease count to the cache and taking care of the findbugs warning.

          Show
          Alejandro Abdelnur added a comment - adding lease count to the cache and taking care of the findbugs warning.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12531139/HDFS-3513.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs-httpfs:

          org.apache.hadoop.test.TestHTestCase

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2608//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/2608//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs-httpfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2608//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12531139/HDFS-3513.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs-httpfs: org.apache.hadoop.test.TestHTestCase +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2608//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/2608//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs-httpfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2608//console This message is automatically generated.
          Hide
          Daryn Sharp added a comment -

          Would a more general, and possibly simpler solution, be to not disable the fs cache but instead track the last-use per user? After the expiration period, the user's cached fs instances may be closed. Would this address the comment about worst case the server will have a cached fs per user? That scenario will lead to a lot of "leaked" fds because DFSClient will have sockets in the TIME_WAIT state (see MAPREDUCE-4323).

          Show
          Daryn Sharp added a comment - Would a more general, and possibly simpler solution, be to not disable the fs cache but instead track the last-use per user? After the expiration period, the user's cached fs instances may be closed. Would this address the comment about worst case the server will have a cached fs per user? That scenario will lead to a lot of "leaked" fds because DFSClient will have sockets in the TIME_WAIT state (see MAPREDUCE-4323 ).
          Hide
          Alejandro Abdelnur added a comment -

          It is a bit more complicated than that, for some reason (if I recall correctly this was for some JT requirements) the FileSystem cache is per UGI but UGI uses object equality instead or username equality. so it you have 2 UGIs for the same user you end up with 2 FileSystem instances the FileSystem cache. (run into this problem with Oozie when hadoop security came to be and I was told that is the way it has to be). But going back to this patch, you won't have leaks, the entries in the fsCache in HttpFS remain in the cache by their FileSystem instances are closed after the time out. the worse case scenario I'm referring to, it is about the number of CachedFileSystem entires in the fsCache in HttpFS, this map that serves as cached is not being purged of entries, but the FileSystem instances in the entries are certainly closed after time out, thus no sockets TIME_WAIT (specially because with this cache HttpFS is quite aggressive on closing the FileSystem instances).

          Hope this clarifies.

          Show
          Alejandro Abdelnur added a comment - It is a bit more complicated than that, for some reason (if I recall correctly this was for some JT requirements) the FileSystem cache is per UGI but UGI uses object equality instead or username equality. so it you have 2 UGIs for the same user you end up with 2 FileSystem instances the FileSystem cache. (run into this problem with Oozie when hadoop security came to be and I was told that is the way it has to be). But going back to this patch, you won't have leaks, the entries in the fsCache in HttpFS remain in the cache by their FileSystem instances are closed after the time out. the worse case scenario I'm referring to, it is about the number of CachedFileSystem entires in the fsCache in HttpFS, this map that serves as cached is not being purged of entries, but the FileSystem instances in the entries are certainly closed after time out, thus no sockets TIME_WAIT (specially because with this cache HttpFS is quite aggressive on closing the FileSystem instances). Hope this clarifies.
          Hide
          Daryn Sharp added a comment -

          I should have thought of the UGI issue since it's biting us in other situations. On that tangent, do you recall why it's supposed to be like that, or is there a jira with a discussion?

          Show
          Daryn Sharp added a comment - I should have thought of the UGI issue since it's biting us in other situations. On that tangent, do you recall why it's supposed to be like that, or is there a jira with a discussion?
          Hide
          Devaraj Das added a comment -

          the FileSystem cache is per UGI but UGI uses object equality instead or username equality

          https://issues.apache.org/jira/browse/HADOOP-6670 has the discussion on the equals implementation change.

          Show
          Devaraj Das added a comment - the FileSystem cache is per UGI but UGI uses object equality instead or username equality https://issues.apache.org/jira/browse/HADOOP-6670 has the discussion on the equals implementation change.
          Hide
          Alejandro Abdelnur added a comment -

          So it seem that for now we are stuck with doing FS caching. Given that, are we good with the latest patch?

          Show
          Alejandro Abdelnur added a comment - So it seem that for now we are stuck with doing FS caching. Given that, are we good with the latest patch?
          Hide
          Alejandro Abdelnur added a comment -

          reuploading same patch to get a jenkins rerun, the original build is gone, cannot check the testcase failure

          Show
          Alejandro Abdelnur added a comment - reuploading same patch to get a jenkins rerun, the original build is gone, cannot check the testcase failure
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12533919/HDFS-3513.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs-httpfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2720//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/2720//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs-httpfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2720//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12533919/HDFS-3513.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs-httpfs. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2720//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/2720//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs-httpfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2720//console This message is automatically generated.
          Hide
          Alejandro Abdelnur added a comment -

          findbugs is wrong, the patch is checking for existence of the entry in te cache before attempting a release. I'll add this exclusion to findbugs as part of the commit.

          Show
          Alejandro Abdelnur added a comment - findbugs is wrong, the patch is checking for existence of the entry in te cache before attempting a release. I'll add this exclusion to findbugs as part of the commit.
          Hide
          Daryn Sharp added a comment -

          We should probably wait for the similar discussion in HIVE-3098 to be resolved?

          Show
          Daryn Sharp added a comment - We should probably wait for the similar discussion in HIVE-3098 to be resolved?
          Hide
          Alejandro Abdelnur added a comment -

          Given the discussion on HIVE-3098, can we take this JIRA out of the parking lot?

          Show
          Alejandro Abdelnur added a comment - Given the discussion on HIVE-3098 , can we take this JIRA out of the parking lot?
          Hide
          Daryn Sharp added a comment -

          The hive jira has hilighted complexities and possible issues with trying to cache ugis/filesystems. Out of curiosity, have you benchmarked whether the ugi cache provides a significant benefit?

          Show
          Daryn Sharp added a comment - The hive jira has hilighted complexities and possible issues with trying to cache ugis/filesystems. Out of curiosity, have you benchmarked whether the ugi cache provides a significant benefit?
          Hide
          Eli Collins added a comment -

          Tucu, given the proxy use case, how important is caching fs handles in HttpFS? Eg what's the expected speedup on a real-world distcp?

          Show
          Eli Collins added a comment - Tucu, given the proxy use case, how important is caching fs handles in HttpFS? Eg what's the expected speedup on a real-world distcp?
          Hide
          Alejandro Abdelnur added a comment -

          I've done some testing, accesing HttpFS via 'hadoop fs' and the difference with caching is significant when doing multiple short operations. Below an example doing a recursive LS (I've run the command multiple times in each configuration before taking a sample).

          For distcp usecases I don't think will make much difference once data is being streamed (which should account for most of the distcp time). But for integration from other systems that may do regular lookups I think there is merit in having FS caching in HttpFS.

          Also, the numbers below are without security enabled. I would presume that with security ON the lag will be greater.

          Without caching disabled:

          $ time bin/hadoop fs -fs webhdfs://localhost:14000 -lsr /
          lsr: DEPRECATED: Please use 'ls -R' instead.
          drwxr-x---   - tucu supergroup          0 2012-07-26 07:47 /tmp
          drwxr-x---   - tucu supergroup          0 2012-07-26 07:47 /tmp/hadoop-yarn
          drwxr-x---   - tucu supergroup          0 2012-07-26 07:47 /tmp/hadoop-yarn/staging
          drwxr-x---   - tucu supergroup          0 2012-07-26 07:47 /tmp/hadoop-yarn/staging/history
          drwxrwx---   - tucu supergroup          0 2012-07-26 07:47 /tmp/hadoop-yarn/staging/history/done
          drwxrwxrwt   - tucu supergroup          0 2012-07-26 07:47 /tmp/hadoop-yarn/staging/history/done_intermediate
          
          real	0m1.717s
          user	0m1.599s
          sys	0m0.105s
          

          With caching enabled:

          $ time bin/hadoop fs -fs webhdfs://localhost:14000 -lsr /
          lsr: DEPRECATED: Please use 'ls -R' instead.
          drwxr-x---   - tucu supergroup          0 2012-07-26 07:47 /tmp
          drwxr-x---   - tucu supergroup          0 2012-07-26 07:47 /tmp/hadoop-yarn
          drwxr-x---   - tucu supergroup          0 2012-07-26 07:47 /tmp/hadoop-yarn/staging
          drwxr-x---   - tucu supergroup          0 2012-07-26 07:47 /tmp/hadoop-yarn/staging/history
          drwxrwx---   - tucu supergroup          0 2012-07-26 07:47 /tmp/hadoop-yarn/staging/history/done
          drwxrwxrwt   - tucu supergroup          0 2012-07-26 07:47 /tmp/hadoop-yarn/staging/history/done_intermediate
          
          real	0m0.879s
          user	0m1.471s
          sys	0m0.101s
          
          Show
          Alejandro Abdelnur added a comment - I've done some testing, accesing HttpFS via 'hadoop fs' and the difference with caching is significant when doing multiple short operations. Below an example doing a recursive LS (I've run the command multiple times in each configuration before taking a sample). For distcp usecases I don't think will make much difference once data is being streamed (which should account for most of the distcp time). But for integration from other systems that may do regular lookups I think there is merit in having FS caching in HttpFS. Also, the numbers below are without security enabled. I would presume that with security ON the lag will be greater. Without caching disabled: $ time bin/hadoop fs -fs webhdfs: //localhost:14000 -lsr / lsr: DEPRECATED: Please use 'ls -R' instead. drwxr-x--- - tucu supergroup 0 2012-07-26 07:47 /tmp drwxr-x--- - tucu supergroup 0 2012-07-26 07:47 /tmp/hadoop-yarn drwxr-x--- - tucu supergroup 0 2012-07-26 07:47 /tmp/hadoop-yarn/staging drwxr-x--- - tucu supergroup 0 2012-07-26 07:47 /tmp/hadoop-yarn/staging/history drwxrwx--- - tucu supergroup 0 2012-07-26 07:47 /tmp/hadoop-yarn/staging/history/done drwxrwxrwt - tucu supergroup 0 2012-07-26 07:47 /tmp/hadoop-yarn/staging/history/done_intermediate real 0m1.717s user 0m1.599s sys 0m0.105s With caching enabled: $ time bin/hadoop fs -fs webhdfs: //localhost:14000 -lsr / lsr: DEPRECATED: Please use 'ls -R' instead. drwxr-x--- - tucu supergroup 0 2012-07-26 07:47 /tmp drwxr-x--- - tucu supergroup 0 2012-07-26 07:47 /tmp/hadoop-yarn drwxr-x--- - tucu supergroup 0 2012-07-26 07:47 /tmp/hadoop-yarn/staging drwxr-x--- - tucu supergroup 0 2012-07-26 07:47 /tmp/hadoop-yarn/staging/history drwxrwx--- - tucu supergroup 0 2012-07-26 07:47 /tmp/hadoop-yarn/staging/history/done drwxrwxrwt - tucu supergroup 0 2012-07-26 07:47 /tmp/hadoop-yarn/staging/history/done_intermediate real 0m0.879s user 0m1.471s sys 0m0.101s
          Hide
          Daryn Sharp added a comment -

          Well, that is unexpectedly significant for just 4 listStatus operations... For the sake of discussion: Are perhaps http keep-alive connections not being used? A keep-alive connection should effectively achieve ugi caching too. To prevent starvation, you might have to do something like limit the number of requests/connection, with a low idle timeout for subsequent requests.

          Show
          Daryn Sharp added a comment - Well, that is unexpectedly significant for just 4 listStatus operations... For the sake of discussion: Are perhaps http keep-alive connections not being used? A keep-alive connection should effectively achieve ugi caching too. To prevent starvation, you might have to do something like limit the number of requests/connection, with a low idle timeout for subsequent requests.
          Hide
          Alejandro Abdelnur added a comment -

          JDK HttpURLConnection does keep-alive by default. Also note that even if keep-alive would be OFF it would be off in both tests, the only difference here is the FileSystem caching. Not sure I get the point you are trying to make after your question. Would you please clarify? Thanks.

          Show
          Alejandro Abdelnur added a comment - JDK HttpURLConnection does keep-alive by default. Also note that even if keep-alive would be OFF it would be off in both tests, the only difference here is the FileSystem caching. Not sure I get the point you are trying to make after your question. Would you please clarify? Thanks.
          Hide
          Alejandro Abdelnur added a comment -

          rebasing patch to current trunk and excluding false findbugs warning.

          Show
          Alejandro Abdelnur added a comment - rebasing patch to current trunk and excluding false findbugs warning.
          Hide
          Daryn Sharp added a comment -

          Yes, HttpURLConnection on the client side will assume keep-alive (since that's default behavior for http/1.1) but I've seen namenode servlets always sending "Connection: close" in the response.

          If the server is disabling keep-alive then every request creates a new socket and a new ugi and a new filesystem, hence why the ugi caching would show such an improvement. If however a keep-alive connection is used, I would expect the same ugi to be used for all the pipelined requests. However, maybe each request on a keep-alive gets a new ugi anyway...

          Show
          Daryn Sharp added a comment - Yes, HttpURLConnection on the client side will assume keep-alive (since that's default behavior for http/1.1) but I've seen namenode servlets always sending "Connection: close" in the response. If the server is disabling keep-alive then every request creates a new socket and a new ugi and a new filesystem, hence why the ugi caching would show such an improvement. If however a keep-alive connection is used, I would expect the same ugi to be used for all the pipelined requests. However, maybe each request on a keep-alive gets a new ugi anyway...
          Hide
          Alejandro Abdelnur added a comment -

          HttpFS uses tomcat which by default has keep-alive ON. but still, the only diff in my testing is enabling/disabling the FS caching. Regarding your suggestion of using the same UGi for pipelined request, this is not possible as in servlets you don't know you are being in a pipelined request and it would mean we would have to keep UGIs sitting around in HttpFS in a cache. I don't see the value added of that, plus if a the HttpFS client is doing different doAs, the same UGI won't work. Because of that, I think the current approach is more suited and simpler.

          Show
          Alejandro Abdelnur added a comment - HttpFS uses tomcat which by default has keep-alive ON. but still, the only diff in my testing is enabling/disabling the FS caching. Regarding your suggestion of using the same UGi for pipelined request, this is not possible as in servlets you don't know you are being in a pipelined request and it would mean we would have to keep UGIs sitting around in HttpFS in a cache. I don't see the value added of that, plus if a the HttpFS client is doing different doAs, the same UGI won't work. Because of that, I think the current approach is more suited and simpler.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12538017/HDFS-3513.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs-httpfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2913//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2913//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12538017/HDFS-3513.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs-httpfs. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2913//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2913//console This message is automatically generated.
          Hide
          Daryn Sharp added a comment -

          Fair enough,

          1. It looks like there's a synchronization issue in closeFileSystem that can cause an NPE if the fs is purged by another thread between the contains and get?
          2. The cache handling in createFileSystem looks error prone. It's taking a conf, but ignoring when looking up a cache entry, then duping it before getting the fs which again is going to ignore the conf if fs is already set?
          3. Not a big deal, but I'd suggest commonizing release and purgeIfIdle for the former to call the latter to avoid copy-n-paste code. Just a suggestion.
          Show
          Daryn Sharp added a comment - Fair enough, It looks like there's a synchronization issue in closeFileSystem that can cause an NPE if the fs is purged by another thread between the contains and get? The cache handling in createFileSystem looks error prone. It's taking a conf, but ignoring when looking up a cache entry, then duping it before getting the fs which again is going to ignore the conf if fs is already set? Not a big deal, but I'd suggest commonizing release and purgeIfIdle for the former to call the latter to avoid copy-n-paste code. Just a suggestion.
          Hide
          Alejandro Abdelnur added a comment -

          Daryn, thxs for reviewing the patch.

          #1 as explained in the comment in the purgeIfIdle() method, entries from the cache are never removed, only the FileSystem instance is closed.

          #2 the cache is exclusively on per user, not on configuration. As all configurations are created in the same way, for a given user they are identical. The duping is only done because we are injecting a property and the passed configuration is not owned by the createFileSystem method.

          #3 I've given it a try by it complicates the purgeIfIdle logic, I'd prefer to leave it like it is.

          Show
          Alejandro Abdelnur added a comment - Daryn, thxs for reviewing the patch. #1 as explained in the comment in the purgeIfIdle() method, entries from the cache are never removed, only the FileSystem instance is closed. #2 the cache is exclusively on per user, not on configuration. As all configurations are created in the same way, for a given user they are identical. The duping is only done because we are injecting a property and the passed configuration is not owned by the createFileSystem method. #3 I've given it a try by it complicates the purgeIfIdle logic, I'd prefer to leave it like it is.
          Hide
          Daryn Sharp added a comment -

          +1 I'm fine with it as-is. Thanks for the clarifications!

          Show
          Daryn Sharp added a comment - +1 I'm fine with it as-is. Thanks for the clarifications!
          Hide
          Alejandro Abdelnur added a comment -

          committed to trunk and branch-2.

          Show
          Alejandro Abdelnur added a comment - committed to trunk and branch-2.
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #2615 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2615/)
          HDFS-3513. HttpFS should cache filesystems. (tucu) (Revision 1368304)

          Result = SUCCESS
          tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1368304
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/dev-support/findbugsExcludeFile.xml
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/lib/service/hadoop/FileSystemAccessService.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/resources/httpfs-default.xml
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/lib/service/hadoop/TestFileSystemAccessService.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk-Commit #2615 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2615/ ) HDFS-3513 . HttpFS should cache filesystems. (tucu) (Revision 1368304) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1368304 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/dev-support/findbugsExcludeFile.xml /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/lib/service/hadoop/FileSystemAccessService.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/resources/httpfs-default.xml /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/lib/service/hadoop/TestFileSystemAccessService.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #2550 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2550/)
          HDFS-3513. HttpFS should cache filesystems. (tucu) (Revision 1368304)

          Result = SUCCESS
          tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1368304
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/dev-support/findbugsExcludeFile.xml
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/lib/service/hadoop/FileSystemAccessService.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/resources/httpfs-default.xml
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/lib/service/hadoop/TestFileSystemAccessService.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Show
          Hudson added a comment - Integrated in Hadoop-Common-trunk-Commit #2550 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2550/ ) HDFS-3513 . HttpFS should cache filesystems. (tucu) (Revision 1368304) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1368304 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/dev-support/findbugsExcludeFile.xml /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/lib/service/hadoop/FileSystemAccessService.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/resources/httpfs-default.xml /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/lib/service/hadoop/TestFileSystemAccessService.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #2568 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2568/)
          HDFS-3513. HttpFS should cache filesystems. (tucu) (Revision 1368304)

          Result = FAILURE
          tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1368304
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/dev-support/findbugsExcludeFile.xml
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/lib/service/hadoop/FileSystemAccessService.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/resources/httpfs-default.xml
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/lib/service/hadoop/TestFileSystemAccessService.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #2568 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2568/ ) HDFS-3513 . HttpFS should cache filesystems. (tucu) (Revision 1368304) Result = FAILURE tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1368304 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/dev-support/findbugsExcludeFile.xml /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/lib/service/hadoop/FileSystemAccessService.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/resources/httpfs-default.xml /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/lib/service/hadoop/TestFileSystemAccessService.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1123 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1123/)
          HDFS-3513. HttpFS should cache filesystems. (tucu) (Revision 1368304)

          Result = SUCCESS
          tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1368304
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/dev-support/findbugsExcludeFile.xml
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/lib/service/hadoop/FileSystemAccessService.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/resources/httpfs-default.xml
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/lib/service/hadoop/TestFileSystemAccessService.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #1123 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1123/ ) HDFS-3513 . HttpFS should cache filesystems. (tucu) (Revision 1368304) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1368304 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/dev-support/findbugsExcludeFile.xml /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/lib/service/hadoop/FileSystemAccessService.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/resources/httpfs-default.xml /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/lib/service/hadoop/TestFileSystemAccessService.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1155 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1155/)
          HDFS-3513. HttpFS should cache filesystems. (tucu) (Revision 1368304)

          Result = FAILURE
          tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1368304
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/dev-support/findbugsExcludeFile.xml
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/lib/service/hadoop/FileSystemAccessService.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/resources/httpfs-default.xml
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/lib/service/hadoop/TestFileSystemAccessService.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #1155 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1155/ ) HDFS-3513 . HttpFS should cache filesystems. (tucu) (Revision 1368304) Result = FAILURE tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1368304 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/dev-support/findbugsExcludeFile.xml /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/lib/service/hadoop/FileSystemAccessService.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/resources/httpfs-default.xml /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/lib/service/hadoop/TestFileSystemAccessService.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

            People

            • Assignee:
              Alejandro Abdelnur
              Reporter:
              Alejandro Abdelnur
            • Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development