  1. Hadoop HDFS
  2. HDFS-10763

Open files can leak permanently due to inconsistent lease update

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.7.3, 2.6.4
    • Fix Version/s: 2.6.5, 2.7.4, 3.0.0-alpha1
    • Component/s: None
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      This can happen during commitBlockSynchronization() or when a client gives up on closing a file after retries.
      In finalizeINodeFileUnderConstruction(), the lease is removed first and then the inode is turned into the closed state. But if any block is not in COMPLETE state, INodeFile#assertAllBlocksComplete() will throw an exception. As a result, the lease is removed from the lease manager, but the inode remains under construction. Since the lease manager does not have a lease for the file, no lease recovery will happen for this file. Moreover, this broken state is persisted and reconstructed through saving and loading of the fsimage. Since no replication is scheduled for the blocks of the file, this can cause data loss and also block decommissioning of datanodes.
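
The ordering problem can be reproduced in a toy model. The sketch below is only an illustration: `ToyInode`, `finalizeLeaseFirst`, and the map standing in for the lease manager are hypothetical names, not the real HDFS classes. It shows how an exception thrown after the lease removal leaves the inode under construction with no lease.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the states involved; these are not the real HDFS classes. */
class LeaseLeakSketch {
  /** Stand-in for an inode with the under-construction feature. */
  static class ToyInode {
    boolean underConstruction = true;
    final boolean allBlocksComplete;

    ToyInode(boolean allBlocksComplete) {
      this.allBlocksComplete = allBlocksComplete;
    }
  }

  /** Problematic ordering: the lease is dropped before the block check runs. */
  static void finalizeLeaseFirst(Map<String, String> leases, String path, ToyInode inode) {
    leases.remove(path);
    if (!inode.allBlocksComplete) {
      throw new IllegalStateException("block not COMPLETE for " + path);
    }
    inode.underConstruction = false;
  }

  public static void main(String[] args) {
    Map<String, String> leases = new HashMap<>();
    leases.put("/xyz/xyz", "client1");
    ToyInode inode = new ToyInode(false);
    try {
      finalizeLeaseFirst(leases, "/xyz/xyz", inode);
    } catch (IllegalStateException e) {
      // The exception leaves the two structures out of sync.
    }
    System.out.println("lease present: " + leases.containsKey("/xyz/xyz"));
    System.out.println("inode under construction: " + inode.underConstruction);
  }
}
```

In this model the file ends up with no lease while still marked under construction, which is the state described above.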

      The lease cannot be manually recovered either. It fails with

      ...AlreadyBeingCreatedException): Failed to RECOVER_LEASE /xyz/xyz for user1 on
       0.0.0.1 because the file is under construction but no leases found.
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2950)
      ...
      

      When a client retries close(), the same inconsistent state is created, but the close can succeed on a later attempt since checkLease() only looks at the inode, not the lease manager, in this case. The close behavior is different if HDFS-8999 is activated by setting dfs.namenode.file.close.num-committed-allowed to 1 (unlikely) or 2 (never).

      In principle, the under-construction feature of an inode and the lease in the lease manager should never go out of sync. The fix involves two parts.
      1) Prevent inconsistent lease updates. We can achieve this by calling removeLease() after checking the block state.
      2) Avoid reconstructing inconsistent lease states from an fsimage. 1) alone does not correct existing inconsistencies that survive through fsimages. This can be done at fsimage loading time by making sure a corresponding lease exists for each inode that has the under-construction feature.
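
Both parts of the fix can be sketched in the same kind of toy model. This is a minimal sketch under stated assumptions, not the committed patch: `ToyInode`, `finalizeCheckFirst`, and `restoreMissingLeases` are hypothetical names, and a plain map stands in for the lease manager.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the two-part fix; names are illustrative, not the HDFS API. */
class LeaseFixSketch {
  static class ToyInode {
    final String clientName;
    boolean underConstruction = true;
    final boolean allBlocksComplete;

    ToyInode(String clientName, boolean allBlocksComplete) {
      this.clientName = clientName;
      this.allBlocksComplete = allBlocksComplete;
    }
  }

  /** Part 1: check the block state first, so a failure leaves the lease intact. */
  static void finalizeCheckFirst(Map<String, String> leases, String path, ToyInode inode) {
    if (!inode.allBlocksComplete) {
      throw new IllegalStateException("block not COMPLETE for " + path);
    }
    leases.remove(path);
    inode.underConstruction = false;
  }

  /** Part 2: after loading, give every under-construction inode a lease. Returns the number added. */
  static int restoreMissingLeases(Map<String, String> leases, Map<String, ToyInode> inodesByPath) {
    int added = 0;
    for (Map.Entry<String, ToyInode> e : inodesByPath.entrySet()) {
      ToyInode inode = e.getValue();
      if (inode.underConstruction && !leases.containsKey(e.getKey())) {
        leases.put(e.getKey(), inode.clientName);
        added++;
      }
    }
    return added;
  }

  public static void main(String[] args) {
    Map<String, String> leases = new HashMap<>();
    Map<String, ToyInode> inodes = new HashMap<>();
    inodes.put("/a", new ToyInode("client1", false)); // leaked earlier: no lease
    System.out.println("leases restored: " + restoreMissingLeases(leases, inodes));
  }
}
```

With the check moved ahead of the removal, a failed close keeps the lease, so lease recovery can still run for the file.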

      1. HDFS-10763.br27.patch
        6 kB
        Kihwal Lee
      2. HDFS-10763.branch-2.7.supplement.patch
        1 kB
        Kihwal Lee
      3. HDFS-10763.branch-2.7.v2.patch
        6 kB
        Kihwal Lee
      4. HDFS-10763.patch
        6 kB
        Kihwal Lee

        Activity

        kihwal Kihwal Lee added a comment -

        Regarding 2), trunk through branch-2 (2.8) can be fixed by simply adding a lease while loading inodes. After this, the files-under-construction section won't be of much use. We can probably make the NN not save the section starting with 2.8. The loading should remain for compatibility. For 2.7 and 2.6, the leases are still path based, so leases cannot be added until the inode directory section is loaded. A simple fix for 2.6/2.7 is to build a list of inodes that are under construction while loading the inode section and then add leases later.
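
The deferred approach for path-based leases could look roughly like the sketch below. It is a toy two-phase loader, not the FSImage loader code: `ToyInode`, `collectUnderConstruction`, and `addDeferredLeases` are hypothetical names. The point is only that full paths cannot be resolved until parent links exist.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy two-phase loader; names are illustrative, not the FSImage loader API. */
class DeferredLeaseSketch {
  static class ToyInode {
    final String name;
    final String ucClient; // non-null means the file is under construction
    ToyInode parent;       // known only after the directory section is loaded

    ToyInode(String name, String ucClient) {
      this.name = name;
      this.ucClient = ucClient;
    }

    /** The root has an empty name and no parent, so children resolve to "/name". */
    String fullPath() {
      return parent == null ? name : parent.fullPath() + "/" + name;
    }
  }

  /** Phase 1: while loading the inode section, remember the under-construction inodes. */
  static List<ToyInode> collectUnderConstruction(List<ToyInode> inodeSection) {
    List<ToyInode> uc = new ArrayList<>();
    for (ToyInode inode : inodeSection) {
      if (inode.ucClient != null) {
        uc.add(inode);
      }
    }
    return uc;
  }

  /** Phase 2: once parents are linked, paths resolve and leases can be added. */
  static Map<String, String> addDeferredLeases(List<ToyInode> uc) {
    Map<String, String> leases = new HashMap<>();
    for (ToyInode inode : uc) {
      leases.put(inode.fullPath(), inode.ucClient);
    }
    return leases;
  }
}
```

A caller would run phase 1 during the inode section, link parents while loading the directory section, and only then run phase 2.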

        hadoopqa Hadoop QA added a comment -
        +1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 14s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
        +1 mvninstall 7m 7s trunk passed
        +1 compile 0m 47s trunk passed
        +1 checkstyle 0m 28s trunk passed
        +1 mvnsite 0m 55s trunk passed
        +1 mvneclipse 0m 12s trunk passed
        +1 findbugs 1m 44s trunk passed
        +1 javadoc 0m 58s trunk passed
        +1 mvninstall 0m 50s the patch passed
        +1 compile 0m 45s the patch passed
        +1 javac 0m 45s the patch passed
        -0 checkstyle 0m 27s hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 204 unchanged - 1 fixed = 205 total (was 205)
        +1 mvnsite 0m 54s the patch passed
        +1 mvneclipse 0m 10s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 1m 49s the patch passed
        +1 javadoc 0m 54s the patch passed
        +1 unit 62m 46s hadoop-hdfs in the patch passed.
        +1 asflicense 0m 21s The patch does not generate ASF License warnings.
        82m 35s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:9560f25
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12823756/HDFS-10763.patch
        JIRA Issue HDFS-10763
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 7c25b8e39cc3 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / bed69d1
        Default Java 1.8.0_101
        findbugs v3.0.0
        checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/16425/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16425/testReport/
        modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16425/console
        Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        daryn Daryn Sharp added a comment -

        +1

        kihwal Kihwal Lee added a comment -

        Thanks for the review, Daryn. I've committed this to trunk through branch-2.7. Chris Trezzo, do you want this in 2.6?

        hudson Hudson added a comment -

        SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10276 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10276/)
        HDFS-10763. Open files can leak permanently due to inconsistent lease (kihwal: rev 864f878d5912c82f3204f1582cfb7eb7c9f1a1da)

        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatPBINode.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestLeaseManager.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
        kihwal Kihwal Lee added a comment -

        It seems to have introduced a bug to branch-2.7 when there are under-construction files in a snapshot.
        I will fix it by tomorrow. If the fix is simple, I will post a supplemental patch. If not, will revert and submit a new patch for branch-2.7.

        kihwal Kihwal Lee added a comment -

        Reopening to fix branch-2.7. Apparently we can't simply eliminate the open file leak. A deleted file in a snapshot is supposed to be leaked. I think this is a bug or design flaw, but it is a topic for a separate discussion, which I will initiate soon.

        Trunk through branch-2.8 are fine, as the lease is inode ID based. Also, the sanity check in the lease manager "takes care of" leases on deleted files in a snapshot. Their leaked state is restored.

        So, this jira will only fix the uc inode leaks for existing files, not deleted files in snapshots. Fixing the latter can be done only after the snapshot feature is fixed.

        I will restore the prior snapshot-related behavior to branch-2.7.

        kihwal Kihwal Lee added a comment -

        Attaching a supplemental patch for branch-2.7. This skips restoration of lease for deleted files that are still under construction in a snapshot, just like before. Again, this behavior did not change with the initial patch for trunk through branch-2.8. It only affected branch-2.7 as the lease is path based.

        kihwal Kihwal Lee added a comment -

        As pointed out by Zhe Zhang, TestOpenFilesWithSnapshot fails in branch-2.7 without the supplemental patch.
        It also occasionally fails waiting for NN to exit safe mode even without any part of this jira. I have a suspicion that it has something to do with uc block counting for snapshot case. I will link relevant jiras when they are found/filed.

        kihwal Kihwal Lee added a comment -

        Removed an aborted jenkins run.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 21s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
        +1 mvninstall 6m 4s branch-2.7 passed
        +1 compile 1m 8s branch-2.7 passed with JDK v1.8.0_101
        +1 compile 1m 11s branch-2.7 passed with JDK v1.7.0_101
        +1 checkstyle 0m 26s branch-2.7 passed
        +1 mvnsite 1m 4s branch-2.7 passed
        +1 mvneclipse 0m 15s branch-2.7 passed
        +1 findbugs 2m 58s branch-2.7 passed
        +1 javadoc 0m 59s branch-2.7 passed with JDK v1.8.0_101
        +1 javadoc 1m 40s branch-2.7 passed with JDK v1.7.0_101
        +1 mvninstall 0m 51s the patch passed
        +1 compile 0m 57s the patch passed with JDK v1.8.0_101
        +1 javac 0m 57s the patch passed
        +1 compile 0m 59s the patch passed with JDK v1.7.0_101
        +1 javac 0m 59s the patch passed
        +1 checkstyle 0m 21s the patch passed
        +1 mvnsite 0m 56s the patch passed
        +1 mvneclipse 0m 12s the patch passed
        -1 whitespace 0m 1s The patch has 1578 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
        -1 whitespace 0m 42s The patch 78 line(s) with tabs.
        +1 findbugs 3m 13s the patch passed
        +1 javadoc 0m 58s the patch passed with JDK v1.8.0_101
        +1 javadoc 1m 41s the patch passed with JDK v1.7.0_101
        -1 unit 63m 40s hadoop-hdfs in the patch failed with JDK v1.7.0_101.
        -1 asflicense 0m 24s The patch generated 3 ASF License warnings.
        155m 3s



        Reason Tests
        JDK v1.8.0_101 Failed junit tests hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots
          hadoop.hdfs.server.blockmanagement.TestBlockManager
          hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
          hadoop.hdfs.server.datanode.TestBlockReplacement
          hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes
          hadoop.hdfs.server.namenode.TestFileTruncate
        JDK v1.7.0_101 Failed junit tests hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots
          hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
          hadoop.hdfs.server.balancer.TestBalancer
          hadoop.hdfs.server.datanode.TestBlockReplacement
          hadoop.hdfs.server.namenode.TestFileTruncate



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:c420dfe
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12824195/HDFS-10763.branch-2.7.supplement.patch
        JIRA Issue HDFS-10763
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux fccfd73e7905 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision branch-2.7 / 040a1b7
        Default Java 1.7.0_101
        Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_101 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_101
        findbugs v3.0.0
        whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/16454/artifact/patchprocess/whitespace-eol.txt
        whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/16454/artifact/patchprocess/whitespace-tabs.txt
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/16454/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_101.txt
        JDK v1.7.0_101 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16454/testReport/
        asflicense https://builds.apache.org/job/PreCommit-HDFS-Build/16454/artifact/patchprocess/patch-asflicense-problems.txt
        modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16454/console
        Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        kihwal Kihwal Lee added a comment -

        Going through the test failures. TestRenameWithSnapshots failed, but with OOM(heap). I ran the whole suite a few times with no issue.

        -------------------------------------------------------
         T E S T S
        -------------------------------------------------------
        OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
        Running org.apache.hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots
        Tests run: 36, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 201.761 sec - in org.apache.hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots
        
        Results :
        
        Tests run: 36, Failures: 0, Errors: 0, Skipped: 0
        
        kihwal Kihwal Lee added a comment -

        Other tests run fine, except TestDataNodeVolumeFailure. But it also fails without any of the changes from this jira.

        daryn Daryn Sharp added a comment -

        Minor comment is that the full path is being built twice. I'd change this:

        if (!path.startsWith("/")) {
          continue;
        }
        fsn.leaseManager.addLease(uc.getClientName(), file.getFullPathName());
        

        to this:

        if (path.startsWith("/")) {
          fsn.leaseManager.addLease(uc.getClientName(), path);
        }
        

        Otherwise +1.
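
For illustration, the suggested shape can be tried in a toy loop. This sketch assumes, as the `startsWith("/")` check suggests, that a deleted file surviving only in a snapshot does not resolve to an absolute path. `ToyFile`, `pathBuilds`, and `addLeases` are hypothetical names, and a map stands in for the lease manager.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy version of the lease-restoring loop; not the actual branch-2.7 code. */
class SnapshotLeaseSkipSketch {
  static class ToyFile {
    final String clientName;
    final String fullPath;
    int pathBuilds = 0; // counts how often the full path is constructed

    ToyFile(String clientName, String fullPath) {
      this.clientName = clientName;
      this.fullPath = fullPath;
    }

    String getFullPathName() {
      pathBuilds++;
      return fullPath;
    }
  }

  /** Builds each path once and adds a lease only for absolute paths. Returns the number added. */
  static int addLeases(Map<String, String> leases, List<ToyFile> ucFiles) {
    int added = 0;
    for (ToyFile file : ucFiles) {
      String path = file.getFullPathName();
      if (path.startsWith("/")) {
        leases.put(path, file.clientName);
        added++;
      }
    }
    return added;
  }

  public static void main(String[] args) {
    Map<String, String> leases = new HashMap<>();
    List<ToyFile> files = List.of(
        new ToyFile("client1", "/xyz/live"),
        new ToyFile("client2", "deleted-in-snapshot"));
    System.out.println("leases added: " + addLeases(leases, files));
  }
}
```

Reusing the local `path` keeps the path construction to one call per file, which is the point of the review comment.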

        kihwal Kihwal Lee added a comment -

        Ouch. I meant to reuse path, but apparently I didn't.

        kihwal Kihwal Lee added a comment -

        Reverted the original commit from branch-2.7.
        Attaching a new patch that includes everything + addressing the review comment.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 14s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
        +1 mvninstall 6m 4s branch-2.7 passed
        +1 compile 1m 2s branch-2.7 passed with JDK v1.8.0_101
        +1 compile 1m 2s branch-2.7 passed with JDK v1.7.0_101
        +1 checkstyle 0m 28s branch-2.7 passed
        +1 mvnsite 0m 59s branch-2.7 passed
        +1 mvneclipse 0m 15s branch-2.7 passed
        +1 findbugs 2m 57s branch-2.7 passed
        +1 javadoc 0m 59s branch-2.7 passed with JDK v1.8.0_101
        +1 javadoc 1m 44s branch-2.7 passed with JDK v1.7.0_101
        +1 mvninstall 0m 53s the patch passed
        +1 compile 0m 58s the patch passed with JDK v1.8.0_101
        +1 javac 0m 58s the patch passed
        +1 compile 1m 1s the patch passed with JDK v1.7.0_101
        +1 javac 1m 1s the patch passed
        -0 checkstyle 0m 26s hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 392 unchanged - 1 fixed = 393 total (was 393)
        +1 mvnsite 0m 55s the patch passed
        +1 mvneclipse 0m 12s the patch passed
        -1 whitespace 0m 1s The patch has 1997 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
        -1 whitespace 0m 47s The patch 78 line(s) with tabs.
        +1 findbugs 3m 12s the patch passed
        +1 javadoc 0m 57s the patch passed with JDK v1.8.0_101
        +1 javadoc 1m 42s the patch passed with JDK v1.7.0_101
        -1 unit 42m 39s hadoop-hdfs in the patch failed with JDK v1.7.0_101.
        -1 asflicense 0m 18s The patch generated 3 ASF License warnings.
        120m 37s



        Reason Tests
        JDK v1.8.0_101 Failed junit tests hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots
        JDK v1.7.0_101 Failed junit tests hadoop.hdfs.server.balancer.TestBalancer
          hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:c420dfe
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12824410/HDFS-10763.branch-2.7.v2.patch
        JIRA Issue HDFS-10763
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux da7d38af32d2 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision branch-2.7 / 6593851
        Default Java 1.7.0_101
        Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_101 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_101
        findbugs v3.0.0
        checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/16474/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
        whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/16474/artifact/patchprocess/whitespace-eol.txt
        whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/16474/artifact/patchprocess/whitespace-tabs.txt
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/16474/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_101.txt
        JDK v1.7.0_101 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16474/testReport/
        asflicense https://builds.apache.org/job/PreCommit-HDFS-Build/16474/artifact/patchprocess/patch-asflicense-problems.txt
        modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16474/console
        Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 14s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 6m 4s branch-2.7 passed +1 compile 1m 2s branch-2.7 passed with JDK v1.8.0_101 +1 compile 1m 2s branch-2.7 passed with JDK v1.7.0_101 +1 checkstyle 0m 28s branch-2.7 passed +1 mvnsite 0m 59s branch-2.7 passed +1 mvneclipse 0m 15s branch-2.7 passed +1 findbugs 2m 57s branch-2.7 passed +1 javadoc 0m 59s branch-2.7 passed with JDK v1.8.0_101 +1 javadoc 1m 44s branch-2.7 passed with JDK v1.7.0_101 +1 mvninstall 0m 53s the patch passed +1 compile 0m 58s the patch passed with JDK v1.8.0_101 +1 javac 0m 58s the patch passed +1 compile 1m 1s the patch passed with JDK v1.7.0_101 +1 javac 1m 1s the patch passed -0 checkstyle 0m 26s hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 392 unchanged - 1 fixed = 393 total (was 393) +1 mvnsite 0m 55s the patch passed +1 mvneclipse 0m 12s the patch passed -1 whitespace 0m 1s The patch has 1997 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply -1 whitespace 0m 47s The patch 78 line(s) with tabs. +1 findbugs 3m 12s the patch passed +1 javadoc 0m 57s the patch passed with JDK v1.8.0_101 +1 javadoc 1m 42s the patch passed with JDK v1.7.0_101 -1 unit 42m 39s hadoop-hdfs in the patch failed with JDK v1.7.0_101. -1 asflicense 0m 18s The patch generated 3 ASF License warnings. 
120m 37s Reason Tests JDK v1.8.0_101 Failed junit tests hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots JDK v1.7.0_101 Failed junit tests hadoop.hdfs.server.balancer.TestBalancer   hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots Subsystem Report/Notes Docker Image:yetus/hadoop:c420dfe JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12824410/HDFS-10763.branch-2.7.v2.patch JIRA Issue HDFS-10763 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux da7d38af32d2 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision branch-2.7 / 6593851 Default Java 1.7.0_101 Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_101 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_101 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/16474/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/16474/artifact/patchprocess/whitespace-eol.txt whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/16474/artifact/patchprocess/whitespace-tabs.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/16474/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_101.txt JDK v1.7.0_101 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16474/testReport/ asflicense https://builds.apache.org/job/PreCommit-HDFS-Build/16474/artifact/patchprocess/patch-asflicense-problems.txt modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16474/console Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
        kihwal Kihwal Lee added a comment -

        The test passes reliably when run on my box.

        -------------------------------------------------------
         T E S T S
        -------------------------------------------------------
        OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
        Running org.apache.hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots
        Tests run: 36, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 194.942 sec
         - in org.apache.hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots
        
        Results :
        
        Tests run: 36, Failures: 0, Errors: 0, Skipped: 0
        

        It failed in precommit because the JVM ran out of memory. From the log, it appears that the JVM's max heap size was smaller in the precommit run:

        INFO  util.GSet (LightWeightGSet.java:computeCapacity(356)) - 1.0% max memory 918.5 MB = 9.2 MB
        

        This is from my own test run:

        INFO  util.GSet (LightWeightGSet.java:computeCapacity(356)) - 1.0% max memory 3.6 GB = 36.4 MB
        

        We have this in hadoop-project/pom.xml, and I verified that the forked test JVMs are running with -Xmx4096m.

        <maven-surefire-plugin.argLine>-Xmx4096m -XX:MaxPermSize=768m -XX:+HeapDumpOnOutOfMemoryError</maven-surefire-plugin.argLine>
        

        I am guessing that the docker container had a lower memory limit. It looks like trunk tests are getting more memory.

        INFO  util.GSet (LightWeightGSet.java:computeCapacity(397)) - 1.0% max memory 1.8 GB = 18.2 MB
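
To compare environments without running the whole test suite, the arithmetic behind these log lines can be reproduced with a few lines of Java. This is only a sketch: the class name HeapShare and its helper methods are hypothetical and not part of Hadoop. It assumes the GSet message reports a percentage of the value returned by Runtime.getRuntime().maxMemory(), which is consistent with the numbers above (1.0% of 918.5 MB is about 9.2 MB).

```java
import java.util.Locale;

// Hypothetical helper, not part of Hadoop: prints the JVM's max heap and
// 1.0% of it, in the same spirit as the "1.0% max memory" GSet log lines.
public class HeapShare {

    // Returns the given percentage of maxBytes, in bytes.
    static double percentOf(long maxBytes, double percentage) {
        return maxBytes * percentage / 100.0;
    }

    // Formats a byte count as megabytes with one decimal place.
    static String toMegabytes(double bytes) {
        return String.format(Locale.ROOT, "%.1f MB", bytes / (1024.0 * 1024.0));
    }

    public static void main(String[] args) {
        // Max heap the running JVM will attempt to use (reflects -Xmx or the default).
        long max = Runtime.getRuntime().maxMemory();
        System.out.println("max memory = " + toMegabytes(max));
        System.out.println("1.0% of max memory = " + toMegabytes(percentOf(max, 1.0)));
    }
}
```

Running this inside the precommit container and on a developer box, with the same JVM options as the forked test JVMs, should show whether the heap limit actually differs between the two.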
        
        daryn Daryn Sharp added a comment -

        +1, the combined patch looks good. It's better than it was before.

        kihwal Kihwal Lee added a comment -

        Committed the patch to branch-2.7.

        ctrezzo Chris Trezzo added a comment -

        Kihwal Lee, do you think this is worth backporting to branch-2.6? The new combined patch appears to be a clean cherry-pick to branch-2.6, but I am not too familiar with the differences in snapshot behavior between branch-2.7 and branch-2.6.

        kihwal Kihwal Lee added a comment -

        The one thing I had to do in the latest patch for branch-2.7 was to preserve whatever the snapshot code was already doing with deleted files in snapshots. If that code leaks UC (under-construction) features, it will continue to leak them. If it doesn't, there will be no leak with the patch either. So I think it is safe for branch-2.6 as well.

        sjlee0 Sangjin Lee added a comment -

        Cherry-picked it to 2.6.5 (trivial).


          People

          • Assignee:
            kihwal Kihwal Lee
            Reporter:
            kihwal Kihwal Lee
