Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-6622

Rename and AddBlock may race and produce invalid edits

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.5.0
    • Fix Version/s: 2.5.0
    • Component/s: None
    • Labels:
      None
    • Target Version/s:

      Description

      While investigating HDFS-6618, we have discovered that rename happening in the middle of getAdditionalBlock() can lead to logging of invalid edit entry.

      In getAdditionalBlock() , the path is resolved once while holding the read lock and the same resolved path will be used in the edit log in the second half of the method holding the write lock. If a rename happens in between two locks, the path may no longer exist.

      When replaying the AddBlockOp, it will fail with FileNotFound.

      1. HDFS-6622.patch
        6 kB
        Kihwal Lee
      2. HDFS-6622.v2.patch
        5 kB
        Kihwal Lee

        Activity

        Karthik Kambatla (Inactive) made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Hide
        Hudson added a comment -

        SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1827 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1827/)
        HDFS-6622. Rename and AddBlock may race and produce invalid edits (kihwal via cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1609384)

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeleteRace.java
        Show
        Hudson added a comment - SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1827 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1827/ ) HDFS-6622 . Rename and AddBlock may race and produce invalid edits (kihwal via cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1609384 ) /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeleteRace.java
        Hide
        Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk #1800 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1800/)
        HDFS-6622. Rename and AddBlock may race and produce invalid edits (kihwal via cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1609384)

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeleteRace.java
        Show
        Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk #1800 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1800/ ) HDFS-6622 . Rename and AddBlock may race and produce invalid edits (kihwal via cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1609384 ) /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeleteRace.java
        Hide
        Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk #609 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/609/)
        HDFS-6622. Rename and AddBlock may race and produce invalid edits (kihwal via cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1609384)

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeleteRace.java
        Show
        Hudson added a comment - FAILURE: Integrated in Hadoop-Yarn-trunk #609 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/609/ ) HDFS-6622 . Rename and AddBlock may race and produce invalid edits (kihwal via cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1609384 ) /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeleteRace.java
        Hide
        Hudson added a comment -

        FAILURE: Integrated in Hadoop-trunk-Commit #5853 (See https://builds.apache.org/job/Hadoop-trunk-Commit/5853/)
        HDFS-6622. Rename and AddBlock may race and produce invalid edits (kihwal via cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1609384)

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeleteRace.java
        Show
        Hudson added a comment - FAILURE: Integrated in Hadoop-trunk-Commit #5853 (See https://builds.apache.org/job/Hadoop-trunk-Commit/5853/ ) HDFS-6622 . Rename and AddBlock may race and produce invalid edits (kihwal via cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1609384 ) /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeleteRace.java
        Colin Patrick McCabe made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Fix Version/s 2.5.0 [ 12326264 ]
        Resolution Fixed [ 1 ]
        Hide
        Colin Patrick McCabe added a comment -

        I also wanted to reduce the number of times getFullPathName() is called. I will simply remove the comparison and fix the test to check the correctness of edit.

        OK. From what I can see, the path should be recomputed while under the lock (rather than simply trusting that it will stay the same since we last released the lock). That should fix things. It looks like you introduced the FileState object in order to avoid calling getFullPathName() twice while holding the lock... fair enough.

        +1

        Show
        Colin Patrick McCabe added a comment - I also wanted to reduce the number of times getFullPathName() is called. I will simply remove the comparison and fix the test to check the correctness of edit. OK. From what I can see, the path should be recomputed while under the lock (rather than simply trusting that it will stay the same since we last released the lock). That should fix things. It looks like you introduced the FileState object in order to avoid calling getFullPathName() twice while holding the lock... fair enough. +1
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12654832/HDFS-6622.v2.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        -1 javac. The patch appears to cause the build to fail.

        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7305//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12654832/HDFS-6622.v2.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. -1 javac . The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7305//console This message is automatically generated.
        Kihwal Lee made changes -
        Assignee Kihwal Lee [ kihwal ]
        Kihwal Lee made changes -
        Attachment HDFS-6622.v2.patch [ 12654832 ]
        Hide
        Kihwal Lee added a comment -

        The reason I added the strict check was to prevent incorrect operations based on potentially incorrect result from getFullPathName(). If the inode's parent is not null (stale), but one of the ancestor's parent is null, it will assume that inode is directly under "/". This could happen with the delayed inode removal. But since we are going to remove inodes from inodeMap while holding FSNamesystem write lock, this should not happen. So what you suggest will be sufficient.

        I also wanted to reduce the number of times getFullPathName() is called. I will simply remove the comparison and fix the test to check the correctness of edit.

        Show
        Kihwal Lee added a comment - The reason I added the strict check was to prevent incorrect operations based on potentially incorrect result from getFullPathName(). If the inode's parent is not null (stale), but one of the ancestor's parent is null, it will assume that inode is directly under "/". This could happen with the delayed inode removal. But since we are going to remove inodes from inodeMap while holding FSNamesystem write lock, this should not happen. So what you suggest will be sufficient. I also wanted to reduce the number of times getFullPathName() is called. I will simply remove the comparison and fix the test to check the correctness of edit.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12653705/HDFS-6622.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7275//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/7275//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7275//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12653705/HDFS-6622.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7275//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/7275//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7275//console This message is automatically generated.
        Hide
        Colin Patrick McCabe added a comment -

        I think the best solution is to recompute the path once we take the lock the second time. If we take the path directly from the inode, we know that it will be correct.

        diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
        index cc522e4..808734a 100644
        --- a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
        +++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
        @@ -2805,6 +2805,7 @@ LocatedBlock getAdditionalBlock(String src, long fileId, String clientName,
               LocatedBlock[] onRetryBlock = new LocatedBlock[1];
               final INodeFile pendingFile =
                   analyzeFileState(src, fileId, clientName, previous, onRetryBlock);
        +      src = pendingFile.getFullPathName();
         
               if (onRetryBlock[0] != null) {
                 if (onRetryBlock[0].getLocations().length > 0) {
        
        Show
        Colin Patrick McCabe added a comment - I think the best solution is to recompute the path once we take the lock the second time. If we take the path directly from the inode, we know that it will be correct. diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java index cc522e4..808734a 100644 --- a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java +++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java @@ -2805,6 +2805,7 @@ LocatedBlock getAdditionalBlock( String src, long fileId, String clientName, LocatedBlock[] onRetryBlock = new LocatedBlock[1]; final INodeFile pendingFile = analyzeFileState(src, fileId, clientName, previous, onRetryBlock); + src = pendingFile.getFullPathName(); if (onRetryBlock[0] != null ) { if (onRetryBlock[0].getLocations().length > 0) {
        Hide
        Colin Patrick McCabe added a comment -

        The fix here has a problem. If the file is renamed while an addBlocks is going on, it will fail with throw new IOException("Cannot locate the original file. Original: "+ src + " , current: " + fileState.path);.

        Show
        Colin Patrick McCabe added a comment - The fix here has a problem. If the file is renamed while an addBlocks is going on, it will fail with throw new IOException("Cannot locate the original file. Original: "+ src + " , current: " + fileState.path); .
        Hide
        Kihwal Lee added a comment -

        Yes, I meant HDFS-6618. Corrected the description.

        Show
        Kihwal Lee added a comment - Yes, I meant HDFS-6618 . Corrected the description.
        Kihwal Lee made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Kihwal Lee made changes -
        Attachment HDFS-6622.patch [ 12653705 ]
        Hide
        Kihwal Lee added a comment -

        Fix + test case. The test fails without the fix.

        Show
        Kihwal Lee added a comment - Fix + test case. The test fails without the fix.
        Kihwal Lee made changes -
        Field Original Value New Value
        Description While investigating HDFS-6681, we have discovered that rename happening in the middle of {{getAdditionalBlock()}} can lead to logging of invalid edit entry.

        In {{getAdditionalBlock()}} , the path is resolved once while holding the read lock and the same resolved path will be used in the edit log in the second half of the method holding the write lock. If a rename happens in between two locks, the path may no longer exist.

        When replaying the {{AddBlockOp}}, it will fail with FileNotFound.
        While investigating HDFS-6618, we have discovered that rename happening in the middle of {{getAdditionalBlock()}} can lead to logging of invalid edit entry.

        In {{getAdditionalBlock()}} , the path is resolved once while holding the read lock and the same resolved path will be used in the edit log in the second half of the method holding the write lock. If a rename happens in between two locks, the path may no longer exist.

        When replaying the {{AddBlockOp}}, it will fail with FileNotFound.
        Hide
        Aaron T. Myers added a comment -

        Hey Kihwal, I think you might mean HDFS-6618 in the description?

        Show
        Aaron T. Myers added a comment - Hey Kihwal, I think you might mean HDFS-6618 in the description?
        Kihwal Lee created issue -

          People

          • Assignee:
            Kihwal Lee
            Reporter:
            Kihwal Lee
          • Votes:
            0 Vote for this issue
            Watchers:
            13 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development