Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-6208

DataNode caching can leak file descriptors.

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.4.1
    • Component/s: datanode
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      In the DataNode, management of mmap'd/mlock'd block files can leak file descriptors.

      1. HDFS-6208.1.patch
        7 kB
        Chris Nauroth

        Activity

        Hide
        Chris Nauroth added a comment -

        There are 2 problems:

        1. The block file stream and checksum file stream are only closed if there is some kind of I/O error. They do not get closed if everything succeeds. mmap/mlock does not rely on keeping the source file descriptor open, so we can change our code to close these.
        2. In test runs, all cached blocks remaining after a test finishes are kept mmap'd into the process. We can handle this by explicitly uncaching at shutdown time.

        I spotted these issues during runs of TestCacheDirectives on Windows, where leaked file descriptors and memory-mapped regions cause subsequent tests to fail, because locks are still held on the underlying block files in the test data directory.

        I have a patch in progress to fix this.

        Show
        Chris Nauroth added a comment - There are 2 problems: The block file stream and checksum file stream are only closed if there is some kind of I/O error. They do not get closed if everything succeeds. mmap/mlock does not rely on keeping the source file descriptor open, so we can change our code to close these. In test runs, all cached blocks remaining after a test finishes are kept mmap'd into the process. We can handle this by explicitly uncaching at shutdown time. I spotted these issues during runs of TestCacheDirectives on Windows, where leaked file descriptors and memory-mapped regions cause subsequent tests to fail, because locks are still held on the underlying block files in the test data directory. I have a patch in progress to fix this.
        Hide
        Chris Nauroth added a comment -

        Here is a patch with these changes:

        1. FsDatasetCache: Close the block file and checksum file in the happy path, not just after a failure. POSIX specs dictate that it's OK to close the file descriptor from which an mmap was obtained and then keep using the mmap, so this is fine.
        2. MappableBlock: Close the FileChannel obtained for the block file and checksum file. This diff looks big only because of indentation changes to introduce a try-finally block.
        3. TestCacheDirectives: At the end of every test, remove all cache directives and wait for 0 blocks cached. This is done so that we know the mmaps are freed, and thus any underlying file locks are released, before starting another MiniDFSCluster. I started out trying to hook this logic into the main DataNode shutdown flow, but it was going to be very brittle coordinating with all of background caching and uncaching threads. It's much more convenient to just rely on the OS cleanup at process shutdown. This is only a problem for tests, where one process starts and stops many test clusters.
        Show
        Chris Nauroth added a comment - Here is a patch with these changes: FsDatasetCache : Close the block file and checksum file in the happy path, not just after a failure. POSIX specs dictate that it's OK to close the file descriptor from which an mmap was obtained and then keep using the mmap, so this is fine. MappableBlock : Close the FileChannel obtained for the block file and checksum file. This diff looks big only because of indentation changes to introduce a try-finally block. TestCacheDirectives : At the end of every test, remove all cache directives and wait for 0 blocks cached. This is done so that we know the mmaps are freed, and thus any underlying file locks are released, before starting another MiniDFSCluster . I started out trying to hook this logic into the main DataNode shutdown flow, but it was going to be very brittle coordinating with all of background caching and uncaching threads. It's much more convenient to just rely on the OS cleanup at process shutdown. This is only a problem for tests, where one process starts and stops many test clusters.
        Hide
        Chris Nauroth added a comment -

        I ran TestCacheDirectives successfully on Mac, Linux and Windows. On Linux, I also ran manual end-to-end tests of caching and then doing zero-copy reads using a custom client app that I wrote.

        Show
        Chris Nauroth added a comment - I ran TestCacheDirectives successfully on Mac, Linux and Windows. On Linux, I also ran manual end-to-end tests of caching and then doing zero-copy reads using a custom client app that I wrote.
        Hide
        Arpit Agarwal added a comment -

        +1

        I verified the fix on Windows and Mac OS X. Thanks Chris!

        Show
        Arpit Agarwal added a comment - +1 I verified the fix on Windows and Mac OS X. Thanks Chris!
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12639427/HDFS-6208.1.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6630//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6630//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12639427/HDFS-6208.1.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6630//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6630//console This message is automatically generated.
        Hide
        Chris Nauroth added a comment -

        I committed this to trunk, branch-2 and branch-2.4. Arpit, thank you for the review.

        Show
        Chris Nauroth added a comment - I committed this to trunk, branch-2 and branch-2.4. Arpit, thank you for the review.
        Hide
        Hudson added a comment -

        SUCCESS: Integrated in Hadoop-trunk-Commit #5483 (See https://builds.apache.org/job/Hadoop-trunk-Commit/5483/)
        HDFS-6208. DataNode caching can leak file descriptors. Contributed by Chris Nauroth. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1586154)

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetCache.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlock.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCacheDirectives.java
        Show
        Hudson added a comment - SUCCESS: Integrated in Hadoop-trunk-Commit #5483 (See https://builds.apache.org/job/Hadoop-trunk-Commit/5483/ ) HDFS-6208 . DataNode caching can leak file descriptors. Contributed by Chris Nauroth. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1586154 ) /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetCache.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlock.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCacheDirectives.java
        Hide
        Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk #535 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/535/)
        HDFS-6208. DataNode caching can leak file descriptors. Contributed by Chris Nauroth. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1586154)

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetCache.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlock.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCacheDirectives.java
        Show
        Hudson added a comment - FAILURE: Integrated in Hadoop-Yarn-trunk #535 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/535/ ) HDFS-6208 . DataNode caching can leak file descriptors. Contributed by Chris Nauroth. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1586154 ) /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetCache.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlock.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCacheDirectives.java
        Hide
        Hudson added a comment -

        FAILURE: Integrated in Hadoop-Mapreduce-trunk #1753 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1753/)
        HDFS-6208. DataNode caching can leak file descriptors. Contributed by Chris Nauroth. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1586154)

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetCache.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlock.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCacheDirectives.java
        Show
        Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk #1753 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1753/ ) HDFS-6208 . DataNode caching can leak file descriptors. Contributed by Chris Nauroth. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1586154 ) /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetCache.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlock.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCacheDirectives.java
        Hide
        Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk #1728 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1728/)
        HDFS-6208. DataNode caching can leak file descriptors. Contributed by Chris Nauroth. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1586154)

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetCache.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlock.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCacheDirectives.java
        Show
        Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk #1728 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1728/ ) HDFS-6208 . DataNode caching can leak file descriptors. Contributed by Chris Nauroth. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1586154 ) /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetCache.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlock.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCacheDirectives.java
        Hide
        Colin Patrick McCabe added a comment -

        thanks for this fix, Chris

        Show
        Colin Patrick McCabe added a comment - thanks for this fix, Chris

          People

          • Assignee:
            Chris Nauroth
            Reporter:
            Chris Nauroth
          • Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development