Hadoop HDFS
HDFS-6527

Edit log corruption due to deferred INode removal

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.5.0
    • Component/s: None
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      We have seen a SBN crashing with the following error:

      [Edit log tailer] ERROR namenode.FSEditLogLoader:
      Encountered exception on operation AddBlockOp
      [path=/xxx,
      penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
      RpcCallId=-2]
      java.io.FileNotFoundException: File does not exist: /xxx

      This was caused by the deferred removal of deleted inodes from the inode map. Since getAdditionalBlock() acquires the FSN read lock and then the write lock, a deletion can happen in between. Because the deferred inode removal happens outside the FSN write lock, getAdditionalBlock() can still get the deleted inode from the inode map even with the FSN write lock held. This allows a block to be added to a deleted file.

      As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
      OP_ADD_BLOCK. The NameNode cannot replay this sequence, so the NN fails to start up or the SBN crashes.
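
      The interleaving described above can be sketched with a small, self-contained model. Note this is not Hadoop code: the names inodeMap, directory, and the lock usage below only mimic the real FSNamesystem/FSDirectory structures for illustration.

      ```java
      import java.util.HashMap;
      import java.util.Map;
      import java.util.concurrent.locks.ReentrantReadWriteLock;

      // Simplified model of the HDFS-6527 race (hypothetical names, not Hadoop code).
      public class DeferredRemovalRace {
          static final Map<Long, String> inodeMap = new HashMap<>();  // inode id -> path
          static final Map<String, Long> directory = new HashMap<>(); // path -> inode id
          static final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();

          // delete() unlinks the file from the directory under the write lock,
          // but, as in the buggy code, defers removal from the inode map.
          static void delete(String path) {
              fsnLock.writeLock().lock();
              try {
                  directory.remove(path);
              } finally {
                  fsnLock.writeLock().unlock();
              }
              // deferred: inodeMap entry is removed later, outside the lock
          }

          // getAdditionalBlock()-like flow: resolve under the read lock,
          // drop it, then re-acquire as a write lock.
          static boolean simulate() {
              directory.put("/xxx", 1L);
              inodeMap.put(1L, "/xxx");

              fsnLock.readLock().lock();
              long id;
              try {
                  id = directory.get("/xxx"); // phase 1: resolve under read lock
              } finally {
                  fsnLock.readLock().unlock();
              }

              delete("/xxx"); // another client deletes in the lock gap

              fsnLock.writeLock().lock();
              try {
                  // phase 2: the deleted inode is still visible in the inode map,
                  // so a block would be added to a deleted file
                  return inodeMap.containsKey(id);
              } finally {
                  fsnLock.writeLock().unlock();
              }
          }

          public static void main(String[] args) {
              System.out.println("stale inode visible: " + simulate());
          }
      }
      ```

      Running this prints that the stale inode is still visible after deletion, which corresponds to the OP_ADD, OP_DELETE, OP_ADD_BLOCK sequence in the edit log.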

      1. HDFS-6527.branch-2.4.patch
        2 kB
        Kihwal Lee
      2. HDFS-6527.trunk.patch
        2 kB
        Kihwal Lee
      3. HDFS-6527.v2.patch
        6 kB
        Kihwal Lee
      4. HDFS-6527.v3.patch
        7 kB
        Kihwal Lee
      5. HDFS-6527.v4.patch
        6 kB
        Kihwal Lee
      6. HDFS-6527.v5.patch
        7 kB
        Jing Zhao
      7. HDFS-6527-addendum-test.patch
        6 kB
        Aaron T. Myers

        Activity

        Kihwal Lee added a comment -

        This bug has been there since 2.1.0-beta. It involves two client threads creating and deleting the same file.

        Daryn Sharp added a comment -

        Is it possible to avoid having the fsdir up-call to the fsn (apparently to clear leases), and instead have the fsdir remove the inodes from the map while the fsn clears the leases?

        Kihwal Lee added a comment -

        Instead of moving the inode removal inside the lock, we could have getAdditionalBlock() do an additional check after acquiring the FSN write lock. One possibility is to have it acquire the FSDirectory read lock and check the parent of the inode; if the file was deleted, the parent should be null. Or we could make checkLease() do more than just checking the clientName recorded in the inode.

        Kihwal Lee added a comment -

        The new patch simply checks the inode's parent against null. This is done in checkLease(), which is called by getAdditionalBlock() after acquiring the FSN write lock.

        Also added is a new test case that reproduces the race between delete() and getAdditionalBlock(). Without the change in checkLease(), the test case fails; its failure means getAdditionalBlock() succeeded even after delete(), which produces the problematic edit log sequence.

        Jing Zhao added a comment -

        Thanks for the fix, Kihwal Lee!

        The new patch simply checks the parent of inode against null

        Currently, if the inode is in a snapshot, its parent will not be set to null after deletion. In that case, can we run into a scenario where a block is added to a deleted file that is in a read-only snapshot? Maybe we also need to check FileWithSnapshotFeature#isCurrentFileDeleted?

        Kihwal Lee added a comment -

        Thanks for the comment, Jing Zhao. We can null out the client name while deleting files. Then lease check is guaranteed to fail.

        In INodeFile#destroyAndCollectBlocks(), we can delete the client name.

             if (sf != null) {
               sf.clearDiffs();
             }
        +
        +    // Delete client name if under construction. This destroys a half of
        +    // the lease. The other half will be removed later from LeaseManager.
        +    FileUnderConstructionFeature uc = getFileUnderConstructionFeature();
        +    if (uc != null) {
        +      uc.setClientName(null);
        +    }
           }
        

        And in FSNamesystem#checkLease(), we can have the following check instead of the parent == null check.

             String clientName = file.getFileUnderConstructionFeature().getClientName();
        +    if (clientName == null) {
        +      // clientName is removed when the file is deleted.
        +      throw new FileNotFoundException(src);
        +    }
        

        This will make lease checks fail once the "real" file is deleted, whether it is in a snapshot or not. Do you think this is reasonable?

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12650347/HDFS-6527.v2.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        -1 javac. The applied patch generated 1260 javac compiler warnings (more than the trunk's current 1259 warnings).

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7115//testReport/
        Javac warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/7115//artifact/trunk/patchprocess/diffJavacWarnings.txt
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7115//console

        This message is automatically generated.

        Kihwal Lee added a comment -

        The new v3 patch implements what I suggested above. It nulls out the client name field. Any further client actions against the file will be rejected. Also fixed the javac warning caused by the use of the deprecated delete() method in the new test case.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12650386/HDFS-6527.v3.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7118//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7118//console

        This message is automatically generated.

        Jing Zhao added a comment -

        The v3 patch may not work when the file is contained in a snapshot. The new unit test can fail if we create a snapshot on root after the file creation:

              FSDataOutputStream out = null;
              out = fs.create(filePath);
              SnapshotTestHelper.createSnapshot(fs, new Path("/"), "s1");
              Thread deleteThread = new DeleteThread(fs, filePath, true);
        

        Instead of the changes made in v3 patch, I guess the v2 patch may work with the following change:

        @@ -3018,6 +3036,13 @@ private INodeFile checkLease(String src, String holder, INode inode,
                   + (lease != null ? lease.toString()
                       : "Holder " + holder + " does not have any open files."));
             }
        +    // If parent is not there or we mark the file as deleted in its snapshot
        +    // feature, it must have been deleted.
        +    if (file.getParent() == null
        +        || (file.isWithSnapshot() && file.getFileWithSnapshotFeature()
        +            .isCurrentFileDeleted())) {
        +      throw new FileNotFoundException(src);
        +    }
             String clientName = file.getFileUnderConstructionFeature().getClientName();
             if (holder != null && !clientName.equals(holder)) {
               throw new LeaseExpiredException("Lease mismatch on " + ident +
        
        Jing Zhao added a comment -

        Besides, is it possible to hide the sleepForTesting part inside a customized BlockPlacementPolicy? The customized BlockPlacementPolicy implementation would use the same policy as BlockPlacementPolicyDefault, but make the thread sleep 1s before returning. That way we would not need to inject testing code into FSNamesystem.
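
        The suggestion is a delegate-and-delay pattern: wrap the real component in one that sleeps before delegating, so the racing thread reliably wins in a test. A generic sketch follows; Placement here is a hypothetical stand-in interface, not Hadoop's real BlockPlacementPolicy API.

        ```java
        import java.util.ArrayList;
        import java.util.Arrays;
        import java.util.List;

        // Generic delegate-and-delay sketch (hypothetical names, not Hadoop code).
        public class DelayingPlacement {
            // Stand-in for the block placement policy interface.
            interface Placement {
                List<String> choose(int replicas);
            }

            // "Default" policy: pick the first N datanodes.
            static class DefaultPlacement implements Placement {
                public List<String> choose(int replicas) {
                    return new ArrayList<>(
                        Arrays.asList("dn1", "dn2", "dn3").subList(0, replicas));
                }
            }

            // Delegates to another policy but sleeps first, widening the
            // race window so a concurrent delete() can run in between.
            static class SleepingPlacement implements Placement {
                private final Placement delegate;
                private final long delayMs;

                SleepingPlacement(Placement delegate, long delayMs) {
                    this.delegate = delegate;
                    this.delayMs = delayMs;
                }

                public List<String> choose(int replicas) {
                    try {
                        Thread.sleep(delayMs); // injected delay
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    return delegate.choose(replicas);
                }
            }

            public static List<String> chooseWithDelay(int replicas, long delayMs) {
                return new SleepingPlacement(new DefaultPlacement(), delayMs).choose(replicas);
            }

            public static void main(String[] args) {
                System.out.println(chooseWithDelay(2, 50));
            }
        }
        ```

        The advantage over injecting sleep calls into FSNamesystem is that the delay lives entirely in test configuration: the test wires in the sleeping policy, and production code is untouched.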

        Kihwal Lee added a comment -

        The v4 patch does what you suggested. Regarding the test code in FSNamesystem, delete() also needs a delay. We already have fault injection in various critical parts of the system.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12650582/HDFS-6527.v4.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7133//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7133//console

        This message is automatically generated.

        Jing Zhao added a comment -

        Thanks Kihwal! The v4 patch looks good to me. But I guess the unit test now cannot cover the non-snapshot case, since the inode will not be removed from the inode map if it is still contained in a snapshot.

        So based on your v4 patch, I added a new unit test to cover both scenarios. I also use a customized block placement policy and use Whitebox to add the deleted inode back to the inode map, so as to remove the dependency on the fault injection code.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12650667/HDFS-6527.v5.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

        org.apache.hadoop.hdfs.server.datanode.TestBPOfferService
        org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7141//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7141//console

        This message is automatically generated.

        Kihwal Lee added a comment -

        +1 for the test improvement.

        Jing Zhao added a comment -

        Thanks for reviewing the test change, Kihwal Lee! I will commit this late today if no further comments.

        Kihwal Lee added a comment -

        Changing the target version from 2.4.1 to 2.5.0 since 2.4.1 is already cut.

        Hudson added a comment -

        FAILURE: Integrated in Hadoop-trunk-Commit #5720 (See https://builds.apache.org/job/Hadoop-trunk-Commit/5720/)
        HDFS-6527. Edit log corruption due to defered INode removal. Contributed by Kihwal Lee and Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1603340)

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeleteRace.java
        Jing Zhao added a comment -

        Thanks for the fix, Kihwal Lee! I've committed this to trunk and branch-2.

        Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk #587 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/587/)
        HDFS-6527. Edit log corruption due to defered INode removal. Contributed by Kihwal Lee and Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1603340)

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeleteRace.java
        Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk #1778 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1778/)
        HDFS-6527. Edit log corruption due to defered INode removal. Contributed by Kihwal Lee and Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1603340)

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeleteRace.java
        Hudson added a comment -

        FAILURE: Integrated in Hadoop-Mapreduce-trunk #1805 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1805/)
        HDFS-6527. Edit log corruption due to defered INode removal. Contributed by Kihwal Lee and Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1603340)

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeleteRace.java
        Arun C Murthy added a comment -

        Changed fix version to 2.4.1 since it got merged in for rc1.

        Siqi Li added a comment -

        When running the unit tests in the v5 patch, I get the following errors:

        2014-06-23 13:36:09,516 ERROR hdfs.DFSClient (DFSClient.java:closeAllFilesBeingWritten(873)) - Failed to close file /testDeleteAddBlockRace
        org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /testDeleteAddBlockRace: File does not exist. Holder DFSClient_NONMAPREDUCE_1652233532_1 does not have any open files.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2941)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2762)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2706)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:585)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:394)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1547)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2008)

        at org.apache.hadoop.ipc.Client.call(Client.java:1410)
        at org.apache.hadoop.ipc.Client.call(Client.java:1363)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
        at com.sun.proxy.$Proxy17.addBlock(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:188)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
        at com.sun.proxy.$Proxy17.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1443)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1265)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:529)

        Jing Zhao added a comment -

        Thanks for the report, Siqi Li! Actually the unit test failure can be reproduced in branch-2.4.1.

        I think the patch should not be included in branch-2.4.1. In branch-2.4.1, we have the following code at the beginning of analyzeFileState:

        final INodesInPath iip = dir.getINodesInPath4Write(src);
        final INodeFile pendingFile
            = checkLease(src, fileId, clientName, iip.getLastINode());
        

        I.e., the inode is retrieved by resolving the path. Since we remove the inode from the children list while holding the fsnamesystem lock, the race condition can actually be avoided I guess.

        HDFS-6294 changed this part to load the INode from the inode map, while in FSNamesystem#deleteInternal, the inode is removed from the inode map without holding fsnamesystem lock. I guess this is the cause of the issue, and the issue is only in 2.5 and trunk.
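The lookup difference described above can be sketched in a few lines. This is a hypothetical, single-threaded simulation (the class and variable names are illustrative, not the real NameNode code): resolving by path detects a delete immediately because the children list is pruned under the write lock, while a lookup in the inode map with deferred removal can still return the deleted inode.

```java
import java.util.HashMap;
import java.util.Map;

public class StaleInodeLookup {
    // "children list": path -> inode id, pruned under the write lock
    static Map<String, Long> pathTree = new HashMap<>();
    // "inode map": inode id -> inode, pruned lazily (deferred removal)
    static Map<Long, String> inodeMap = new HashMap<>();

    public static void main(String[] args) {
        pathTree.put("/testFile", 42L);
        inodeMap.put(42L, "INodeFile(/testFile)");

        // delete: the path tree is pruned while the write lock is held,
        // but the inode-map cleanup is deferred until after the lock.
        pathTree.remove("/testFile");

        // branch-2.4.1 analyzeFileState: re-resolve the path -> deletion detected
        System.out.println("by path: " + pathTree.get("/testFile"));

        // post-HDFS-6294 lookup by fileId -> stale inode still visible,
        // so OP_ADD_BLOCK can be logged after OP_DELETE
        System.out.println("by id:   " + inodeMap.get(42L));

        // the deferred cleanup finally runs -- too late for the race above
        inodeMap.remove(42L);
    }
}
```

Running this prints `by path: null` but `by id:   INodeFile(/testFile)`, which is the window the fix closes.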

        Jing Zhao added a comment -

        Reopening the jira to wait for Kihwal Lee to comment.

        Kihwal Lee added a comment -

        Jing Zhao You are right. Since it re-resolves the path inside the write lock, it will detect the deletion. I will revert it from 2.4.1.

        Kihwal Lee added a comment -

        Reverted it from branch-2.4.1 and also updated the release note.

        Aaron T. Myers added a comment -

        Hey folks,

        I've been looking into this a bit and have come to the conclusion that we should actually include this fix in 2.4.1. The reason is that though the original addBlock scenario sort of incidentally can't happen in 2.4.0, I believe that a similar scenario can happen with a race between close and delete.

        Even though close doesn't do any sort of dropping of its lock during the duration of its RPC, the entirety of a single close operation can begin and end successfully between when the delete edit log op is logged, and when the INode is later removed in the delete call. See the attached additional test case which demonstrates the issue.

        This will result in a similarly invalid edit log op sequence wherein you'll see an OP_ADD, OP_DELETE, and then OP_CLOSE, which can't be successfully replayed by the NN since the OP_CLOSE will get a FileNotFound. I've seen this happen on two clusters now.
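A heavily simplified sketch of why that op sequence cannot be replayed (hypothetical code, not FSEditLogLoader): once OP_DELETE has removed the path from the namespace, applying OP_CLOSE has no file to close and fails with FileNotFoundException, which is exactly the replay failure that keeps the NN from starting.

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;

public class ReplayInvalidSequence {
    // toy namespace: path -> is the file still under construction?
    static Map<String, Boolean> namespace = new HashMap<>();

    static void apply(String op, String path) throws FileNotFoundException {
        switch (op) {
            case "OP_ADD":    namespace.put(path, true); break;
            case "OP_DELETE": namespace.remove(path); break;
            case "OP_CLOSE":
                if (!namespace.containsKey(path)) {
                    throw new FileNotFoundException("File does not exist: " + path);
                }
                namespace.put(path, false);
                break;
        }
    }

    public static void main(String[] args) {
        // the invalid sequence produced by the close/delete race
        String[] ops = {"OP_ADD", "OP_DELETE", "OP_CLOSE"};
        try {
            for (String op : ops) {
                apply(op, "/xxx");
            }
        } catch (FileNotFoundException e) {
            // this is the failure the SBN hits while tailing edits
            System.out.println("replay failed: " + e.getMessage());
        }
    }
}
```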

        Kihwal/Jing - if you agree with my analysis, let's reopen this JIRA so this fix can be included in 2.4.1, though without the addBlock test case, and with only the close test case.

        Jing Zhao added a comment -

        Thanks Aaron T. Myers! The analysis makes sense to me. Let's reopen the jira and fix it in 2.4.1.

        In the meanwhile, in FSNamesystem#delete, maybe we should move "dir.removeFromInodeMap(removedINodes)" into the fsnamesystem write lock? I guess this will prevent similar issue in the future.

        Aaron T. Myers added a comment -

        In the meanwhile, in FSNamesystem#delete, maybe we should move "dir.removeFromInodeMap(removedINodes)" into the fsnamesystem write lock? I guess this will prevent similar issue in the future.

        This is currently done under the write lock, but only after the write lock has been briefly dropped and reacquired, so presumably you're proposing to make all of the contents of deleteInternal happen under a single write lock? That seems reasonable to me, but to be honest I'm not sure of the history of why this deferred inode removal was done in the first place.
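The two deletion shapes under discussion can be contrasted with a small lock-pattern sketch (hypothetical names; deleteSplit mirrors the current split-lock code, deleteSingleLock is the proposed single-lock variant):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class DeleteLockShapes {
    static final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
    static final Map<Long, String> inodeMap = new HashMap<>();

    // current shape: log the delete, drop the lock, then remove from the
    // inode map under a second write-lock section
    static void deleteSplit(long id) {
        fsLock.writeLock().lock();
        try {
            // collect blocks, log OP_DELETE; inode-map removal deferred
        } finally {
            fsLock.writeLock().unlock();
        }
        // <-- window: another op can still find id in inodeMap here
        fsLock.writeLock().lock();
        try {
            inodeMap.remove(id);
        } finally {
            fsLock.writeLock().unlock();
        }
    }

    // proposed shape: everything under one continuous write lock, so no
    // other operation can observe the half-deleted state
    static void deleteSingleLock(long id) {
        fsLock.writeLock().lock();
        try {
            // collect blocks, log OP_DELETE, and remove from the inode map
            inodeMap.remove(id);
        } finally {
            fsLock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        inodeMap.put(1L, "a");
        inodeMap.put(2L, "b");
        deleteSplit(1L);
        deleteSingleLock(2L);
        System.out.println("remaining inodes: " + inodeMap.size());
    }
}
```

Both variants end with an empty inode map; the difference is only whether a window exists between the edit-log write and the map removal.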

        Jing Zhao added a comment -

        Yeah, let's merge the fix back to 2.4.1 with unit test changes first. We can revisit the deferred removal in a separate jira.

        Jing Zhao added a comment -

        Aaron T. Myers, do you already have an updated patch for 2.4.1? Otherwise I will try it.

        Aaron T. Myers added a comment -

        Hi Jing, I haven't created that patch yet, no. Please feel free to go ahead and do that. Thanks a lot.

        Jing Zhao added a comment -

        Copied from release vote mailing thread:

        That's fine by me. Like I said, assuming that rc1 does indeed include the fix in HDFS-6527, and not the revert, then rc1 should be functionally correct. What's in branch-2.4.1 doesn't currently match what's in this RC, but if that doesn't bother anyone else then I won't lose any sleep over it.
        --
        Aaron T. Myers
        Software Engineer, Cloudera
        
        > On Jun 27, 2014, at 3:04 PM, "Arun C. Murthy" <acm@hortonworks.com> wrote:
        >
        > Aaron,
        >
        > Since the amend was just to the test, I'll keep this RC as-is.
        >
        > I'll also comment on jira.
        >
        > thanks,
        > Arun
        

        Given this, I plan not to change branch-2.4.1 and to leave it as it is. What do you think, Aaron T. Myers and Arun C Murthy?

        Aaron T. Myers added a comment -

        I'd say we should still probably do it, but put it only in branch-2.4, not branch-2.4.1, in case we end up later doing a 2.4.2 release.

        Jing Zhao added a comment -

        Thanks for the quick reply, Aaron T. Myers. I will merge it to branch-2.4 then.

        Jing Zhao added a comment -

        Actually the patch has already been merged into 2.4. I will leave it there as it is now.


          People

          • Assignee:
            Kihwal Lee
            Reporter:
            Kihwal Lee
          • Votes:
            0
            Watchers:
            21
