Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-4824

FileInputStreamCache.close leaves dangling reference to FileInputStreamCache.cacheCleaner

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.4-alpha
    • Fix Version/s: 3.0.0, 2.1.0-beta
    • Component/s: hdfs-client
    • Labels:
      None

      Description

      FileInputStreamCache leaves around a reference to its cacheCleaner after close().

      The cacheCleaner is created like this:

      if (cacheCleaner == null) {
                cacheCleaner = new CacheCleaner();
                executor.scheduleAtFixedRate(cacheCleaner, expiryTimeMs, expiryTimeMs,
                    TimeUnit.MILLISECONDS);
              }
      

      and supposedly removed like this:

      if (cacheCleaner != null) {
        executor.remove(cacheCleaner);
      }
      

      However, ScheduledThreadPoolExecutor.remove returns a success boolean which should be checked. And I think from a quick read of that class that the return value of scheduleAtFixedRate should be used as the argument to remove.

      1. HDFS-4824.001.patch
        2 kB
        Colin Patrick McCabe
      2. HDFS-4824.002.patch
        4 kB
        Colin Patrick McCabe

        Activity

        Hide
        Colin Patrick McCabe added a comment -

        yeah, we definitely should be checking that return code. I'm still confused about whether we should be able to cancel via the runnable or not. The types seem right, but there might be some issues surrounding Runnables that wrap other Runnables... I will test.

        It's better to cancel via the future anyway, I think.

        Show
        Colin Patrick McCabe added a comment - yeah, we definitely should be checking that return code. I'm still confused about whether we should be able to cancel via the runnable or not. The types seem right, but there might be some issues surrounding Runnables that wrap other Runnables... I will test. It's better to cancel via the future anyway, I think.
        Hide
        Colin Patrick McCabe added a comment -

        OK, I figured out what is going on here. scheduleAtFixedRate is actually creating a new Runnable which wraps the one that it's passed in (cacheCleaner). So telling the executor to remove cacheCleaner itself doesn't work, since the real runnable that was scheduled is the wrapper.

        Instead, we should keep around the future which scheduleAtFixedRate returned, and call Future#Cancel on it.

        Patch is attached.

        Show
        Colin Patrick McCabe added a comment - OK, I figured out what is going on here. scheduleAtFixedRate is actually creating a new Runnable which wraps the one that it's passed in (cacheCleaner). So telling the executor to remove cacheCleaner itself doesn't work, since the real runnable that was scheduled is the wrapper. Instead, we should keep around the future which scheduleAtFixedRate returned, and call Future#Cancel on it. Patch is attached.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12583357/HDFS-4824.001.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4398//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4398//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12583357/HDFS-4824.001.patch against trunk revision . +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4398//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4398//console This message is automatically generated.
        Hide
        Todd Lipcon added a comment -

        Patch looks good. One more thought which might be unnecessarily conservative: should the cleaner hold a WeakReference to the cache? That way even if users leak objects it's less likely to get held on forever.

        Show
        Todd Lipcon added a comment - Patch looks good. One more thought which might be unnecessarily conservative: should the cleaner hold a WeakReference to the cache? That way even if users leak objects it's less likely to get held on forever.
        Hide
        Colin Patrick McCabe added a comment -

        That's an interesting idea, but we already hold strong (regular) references to stuff like the DFSClient and the PeerCache from the DFSInputStream. So if we were going to do something like this, it should be done consistently.

        Whether or not it's a good idea to start using weak references for this stuff, I'm a little unsure. At some point, the code gets a lot more complex, to support a use case we're not supposed to be supporting. I mean if we had to go through the weak reference dance each time we accessed dfsclient from dfsinputstream, it might be more than I could stand. At least that's my initial thought.

        Show
        Colin Patrick McCabe added a comment - That's an interesting idea, but we already hold strong (regular) references to stuff like the DFSClient and the PeerCache from the DFSInputStream . So if we were going to do something like this, it should be done consistently. Whether or not it's a good idea to start using weak references for this stuff, I'm a little unsure. At some point, the code gets a lot more complex, to support a use case we're not supposed to be supporting. I mean if we had to go through the weak reference dance each time we accessed dfsclient from dfsinputstream, it might be more than I could stand. At least that's my initial thought.
        Hide
        Todd Lipcon added a comment -

        I was thinking that the real issue here seems to be that we have an uncollectable set of objects where the only incoming reference is coming from the CacheCleaner object which is staying in the executor (which is a GC root). So, just breaking that one reference would mean that the whole set of objects could be GCed (since nothing else refers into that graph).

        With your fix in place, if you leak a DFSInputStream without closing, does it eventually get GCed in a full GC? Or would it stay around forever? I don't think there are other hanging references to DFSInputStreams, but maybe I'm wrong.

        Show
        Todd Lipcon added a comment - I was thinking that the real issue here seems to be that we have an uncollectable set of objects where the only incoming reference is coming from the CacheCleaner object which is staying in the executor (which is a GC root). So, just breaking that one reference would mean that the whole set of objects could be GCed (since nothing else refers into that graph). With your fix in place, if you leak a DFSInputStream without closing, does it eventually get GCed in a full GC? Or would it stay around forever? I don't think there are other hanging references to DFSInputStreams, but maybe I'm wrong.
        Hide
        Colin Patrick McCabe added a comment -

        OK. I think I can see your logic. The references I was talking about were references from the DFSInputStream to the DFSClient. But neither of those are GC roots, whereas FileInputStreamCache#executor certainly is. So there may be some value in breaking the connection from the GC root to the FileInputStreamCache, which is after all a per-DFSInputStream object.

        With your fix in place, if you leak a DFSInputStream without closing, does it eventually get GCed in a full GC? Or would it stay around forever? I don't think there are other hanging references to DFSInputStreams, but maybe I'm wrong.

        I didn't see any in my cursory inspection. It's pretty difficult to tell without dynamic instrumentation, though.

        Show
        Colin Patrick McCabe added a comment - OK. I think I can see your logic. The references I was talking about were references from the DFSInputStream to the DFSClient. But neither of those are GC roots, whereas FileInputStreamCache#executor certainly is. So there may be some value in breaking the connection from the GC root to the FileInputStreamCache , which is after all a per- DFSInputStream object. With your fix in place, if you leak a DFSInputStream without closing, does it eventually get GCed in a full GC? Or would it stay around forever? I don't think there are other hanging references to DFSInputStreams, but maybe I'm wrong. I didn't see any in my cursory inspection. It's pretty difficult to tell without dynamic instrumentation, though.
        Hide
        Colin Patrick McCabe added a comment -

        I added the weakreference stuff. It was easier than I thought.

        Show
        Colin Patrick McCabe added a comment - I added the weakreference stuff. It was easier than I thought.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12583550/HDFS-4824.002.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

        org.apache.hadoop.hdfs.TestShortCircuitLocalRead

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4407//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4407//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12583550/HDFS-4824.002.patch against trunk revision . +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestShortCircuitLocalRead +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4407//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4407//console This message is automatically generated.
        Hide
        Colin Patrick McCabe added a comment -

        Looks like someone or something deleted the socket directory in /tmp that TestShortCircuitLocalRead was using.

        Tests in error: 
          testFileLocalReadNoChecksum(org.apache.hadoop.hdfs.TestShortCircuitLocalRead): bind(2) error: No such file or directory when trying to bind to '/tmp/socks.1368743681542.2028583750/TestShortCircuitLocalRead.35104.sock'
          testFileLocalReadChecksum(org.apache.hadoop.hdfs.TestShortCircuitLocalRead): bind(2) error: No such file or directory when trying to bind to '/tmp/socks.1368743681542.2028583750/TestShortCircuitLocalRead.56667.sock'
          testSmallFileLocalRead(org.apache.hadoop.hdfs.TestShortCircuitLocalRead): bind(2) error: No such file or directory when trying to bind to '/tmp/socks.1368743681542.2028583750/TestShortCircuitLocalRead.47030.sock'
          testLocalReadLegacy(org.apache.hadoop.hdfs.TestShortCircuitLocalRead): bind(2) error: No such file or directory when trying to bind to '/tmp/socks.1368743681542.2028583750/TestShortCircuitLocalRead.50662.sock'
          testLocalReadFallback(org.apache.hadoop.hdfs.TestShortCircuitLocalRead): bind(2) error: No such file or directory when trying to bind to '/tmp/socks.1368743681542.2028583750/TestShortCircuitLocalRead.34831.sock'
          testReadFromAnOffset(org.apache.hadoop.hdfs.TestShortCircuitLocalRead): bind(2) error: No such file or directory when trying to bind to '/tmp/socks.1368743681542.2028583750/TestShortCircuitLocalRead.47719.sock'
          testLongFile(org.apache.hadoop.hdfs.TestShortCircuitLocalRead): bind(2) error: No such file or directory when trying to bind to '/tmp/socks.1368743681542.2028583750/TestShortCircuitLocalRead.58149.sock'
        

        I've never seen this one before, and it definitely doesn't reproduce here. Rogue /tmp cleaner process? I guess let's re-run.

        Show
        Colin Patrick McCabe added a comment - Looks like someone or something deleted the socket directory in /tmp that TestShortCircuitLocalRead was using. Tests in error: testFileLocalReadNoChecksum(org.apache.hadoop.hdfs.TestShortCircuitLocalRead): bind(2) error: No such file or directory when trying to bind to '/tmp/socks.1368743681542.2028583750/TestShortCircuitLocalRead.35104.sock' testFileLocalReadChecksum(org.apache.hadoop.hdfs.TestShortCircuitLocalRead): bind(2) error: No such file or directory when trying to bind to '/tmp/socks.1368743681542.2028583750/TestShortCircuitLocalRead.56667.sock' testSmallFileLocalRead(org.apache.hadoop.hdfs.TestShortCircuitLocalRead): bind(2) error: No such file or directory when trying to bind to '/tmp/socks.1368743681542.2028583750/TestShortCircuitLocalRead.47030.sock' testLocalReadLegacy(org.apache.hadoop.hdfs.TestShortCircuitLocalRead): bind(2) error: No such file or directory when trying to bind to '/tmp/socks.1368743681542.2028583750/TestShortCircuitLocalRead.50662.sock' testLocalReadFallback(org.apache.hadoop.hdfs.TestShortCircuitLocalRead): bind(2) error: No such file or directory when trying to bind to '/tmp/socks.1368743681542.2028583750/TestShortCircuitLocalRead.34831.sock' testReadFromAnOffset(org.apache.hadoop.hdfs.TestShortCircuitLocalRead): bind(2) error: No such file or directory when trying to bind to '/tmp/socks.1368743681542.2028583750/TestShortCircuitLocalRead.47719.sock' testLongFile(org.apache.hadoop.hdfs.TestShortCircuitLocalRead): bind(2) error: No such file or directory when trying to bind to '/tmp/socks.1368743681542.2028583750/TestShortCircuitLocalRead.58149.sock' I've never seen this one before, and it definitely doesn't reproduce here. Rogue /tmp cleaner process? I guess let's re-run.
        Hide
        Colin Patrick McCabe added a comment -

        re-run tests

        Show
        Colin Patrick McCabe added a comment - re-run tests
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12583591/HDFS-4824.002.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4411//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4411//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12583591/HDFS-4824.002.patch against trunk revision . +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4411//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4411//console This message is automatically generated.
        Hide
        Todd Lipcon added a comment -

        +1, looks good. I'll commit momentarily

        Show
        Todd Lipcon added a comment - +1, looks good. I'll commit momentarily
        Hide
        Todd Lipcon added a comment -

        Committed to branch-2 and trunk. Thanks, Colin

        Show
        Todd Lipcon added a comment - Committed to branch-2 and trunk. Thanks, Colin
        Hide
        Hudson added a comment -

        Integrated in Hadoop-trunk-Commit #3763 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3763/)
        HDFS-4824. FileInputStreamCache.close leaves dangling reference to FileInputStreamCache.cacheCleaner. Contributed by Colin Patrick McCabe. (Revision 1483641)

        Result = SUCCESS
        todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1483641
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/FileInputStreamCache.java
        Show
        Hudson added a comment - Integrated in Hadoop-trunk-Commit #3763 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3763/ ) HDFS-4824 . FileInputStreamCache.close leaves dangling reference to FileInputStreamCache.cacheCleaner. Contributed by Colin Patrick McCabe. (Revision 1483641) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1483641 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/FileInputStreamCache.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Yarn-trunk #212 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/212/)
        HDFS-4824. FileInputStreamCache.close leaves dangling reference to FileInputStreamCache.cacheCleaner. Contributed by Colin Patrick McCabe. (Revision 1483641)

        Result = FAILURE
        todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1483641
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/FileInputStreamCache.java
        Show
        Hudson added a comment - Integrated in Hadoop-Yarn-trunk #212 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/212/ ) HDFS-4824 . FileInputStreamCache.close leaves dangling reference to FileInputStreamCache.cacheCleaner. Contributed by Colin Patrick McCabe. (Revision 1483641) Result = FAILURE todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1483641 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/FileInputStreamCache.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #1401 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1401/)
        HDFS-4824. FileInputStreamCache.close leaves dangling reference to FileInputStreamCache.cacheCleaner. Contributed by Colin Patrick McCabe. (Revision 1483641)

        Result = FAILURE
        todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1483641
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/FileInputStreamCache.java
        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #1401 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1401/ ) HDFS-4824 . FileInputStreamCache.close leaves dangling reference to FileInputStreamCache.cacheCleaner. Contributed by Colin Patrick McCabe. (Revision 1483641) Result = FAILURE todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1483641 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/FileInputStreamCache.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1428 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1428/)
        HDFS-4824. FileInputStreamCache.close leaves dangling reference to FileInputStreamCache.cacheCleaner. Contributed by Colin Patrick McCabe. (Revision 1483641)

        Result = FAILURE
        todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1483641
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/FileInputStreamCache.java
        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #1428 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1428/ ) HDFS-4824 . FileInputStreamCache.close leaves dangling reference to FileInputStreamCache.cacheCleaner. Contributed by Colin Patrick McCabe. (Revision 1483641) Result = FAILURE todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1483641 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/FileInputStreamCache.java

          People

          • Assignee:
            Colin Patrick McCabe
            Reporter:
            Henry Robinson
          • Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development