Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-3864

NN does not update internal file mtime for OP_CLOSE when reading from the edit log

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0-alpha
    • Fix Version/s: 2.0.2-alpha
    • Component/s: namenode
    • Labels:
      None

      Description

      When logging an OP_CLOSE to the edit log, the NN writes out an updated file mtime and atime. However, when reading in an OP_CLOSE from the edit log, the NN does not apply these values to the in-memory FS data structure. Because of this, a file's mtime or atime may appear to go back in time after an NN restart, or an HA failover.

      Most of the time this will be harmless and folks won't notice, but in the event one of these files is being used in the distributed cache of an MR job when an HA failover occurs, the job might notice that the mtime of a cache file has changed, which in MR2 will cause the job to fail with an exception like the following:

      java.io.IOException: Resource hdfs://ha-nn-uri/user/jenkins/.staging/job_1341364439849_0513/libjars/snappy-java-1.0.3.2.jar changed on src filesystem (expected 1342137814599, was 1342137814473
      	at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
      	at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
      	at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
      	at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:396)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
      	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
      	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
      	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
      	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      	at java.lang.Thread.run(Thread.java:662)
      

      Credit to Sujay Rau for discovering this issue.

      1. HDFS-3864.patch
        4 kB
        Aaron T. Myers
      2. HDFS-3864.patch
        4 kB
        Aaron T. Myers

        Activity

        Hide
        Aaron T. Myers added a comment - - edited

        Here's a patch which addresses the issue. Fortunately, the fix is quite simple - just apply the values that we read in from the edit log.

        In addition to the automated test provided in the patch, I also tested this manually on an HA cluster and confirmed that MR jobs no longer experience the "distributed cache object changed" errors which caused this issue to be discovered.

        Show
        Aaron T. Myers added a comment - - edited Here's a patch which addresses the issue. Fortunately, the fix is quite simple - just apply the values that we read in from the edit log. In addition to the automated test provided in the patch, I also tested this manually on an HA cluster and confirmed that MR jobs no longer experience the "distributed cache object changed" errors which caused this issue to be discovered.
        Hide
        Todd Lipcon added a comment -

        +1, looks good. One thing: do you really need a 5 second sleep here, or could you do with some small number of milliseconds? I'd think a 10ms sleep should be sufficient to always fail without the bug fix, so I don't see any reason to have a long-running test.

        Show
        Todd Lipcon added a comment - +1, looks good. One thing: do you really need a 5 second sleep here, or could you do with some small number of milliseconds? I'd think a 10ms sleep should be sufficient to always fail without the bug fix, so I don't see any reason to have a long-running test.
        Hide
        Aaron T. Myers added a comment -

        Thanks a lot for the quick review, Todd.

        Here's an updated patch which lowers the sleep time to 10 milliseconds.

        Show
        Aaron T. Myers added a comment - Thanks a lot for the quick review, Todd. Here's an updated patch which lowers the sleep time to 10 milliseconds.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12542840/HDFS-3864.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

        org.apache.hadoop.hdfs.TestHftpDelegationToken

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3113//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/3113//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3113//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12542840/HDFS-3864.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestHftpDelegationToken +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3113//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/3113//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3113//console This message is automatically generated.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12542846/HDFS-3864.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

        org.apache.hadoop.hdfs.TestHftpDelegationToken
        org.apache.hadoop.hdfs.web.TestWebHDFS
        org.apache.hadoop.hdfs.server.datanode.TestBPOfferService

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3114//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/3114//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3114//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12542846/HDFS-3864.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestHftpDelegationToken org.apache.hadoop.hdfs.web.TestWebHDFS org.apache.hadoop.hdfs.server.datanode.TestBPOfferService +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3114//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/3114//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3114//console This message is automatically generated.
        Hide
        Aaron T. Myers added a comment -

        The findbugs warning is unrelated and I'm confident that the test failures are unrelated as well.

        I'm going to commit this patch momentarily.

        Show
        Aaron T. Myers added a comment - The findbugs warning is unrelated and I'm confident that the test failures are unrelated as well. I'm going to commit this patch momentarily.
        Hide
        Aaron T. Myers added a comment -

        I've just committed this to trunk and branch-2. Thanks a lot for the review, Todd.

        Show
        Aaron T. Myers added a comment - I've just committed this to trunk and branch-2. Thanks a lot for the review, Todd.
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #2717 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2717/)
        HDFS-3864. NN does not update internal file mtime for OP_CLOSE when reading from the edit log. Contributed by Aaron T. Myers. (Revision 1378413)

        Result = SUCCESS
        atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378413
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestModTime.java
        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-trunk-Commit #2717 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2717/ ) HDFS-3864 . NN does not update internal file mtime for OP_CLOSE when reading from the edit log. Contributed by Aaron T. Myers. (Revision 1378413) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378413 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestModTime.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #2654 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2654/)
        HDFS-3864. NN does not update internal file mtime for OP_CLOSE when reading from the edit log. Contributed by Aaron T. Myers. (Revision 1378413)

        Result = SUCCESS
        atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378413
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestModTime.java
        Show
        Hudson added a comment - Integrated in Hadoop-Common-trunk-Commit #2654 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2654/ ) HDFS-3864 . NN does not update internal file mtime for OP_CLOSE when reading from the edit log. Contributed by Aaron T. Myers. (Revision 1378413) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378413 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestModTime.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #2683 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2683/)
        HDFS-3864. NN does not update internal file mtime for OP_CLOSE when reading from the edit log. Contributed by Aaron T. Myers. (Revision 1378413)

        Result = FAILURE
        atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378413
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestModTime.java
        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #2683 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2683/ ) HDFS-3864 . NN does not update internal file mtime for OP_CLOSE when reading from the edit log. Contributed by Aaron T. Myers. (Revision 1378413) Result = FAILURE atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378413 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestModTime.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #1149 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1149/)
        HDFS-3864. NN does not update internal file mtime for OP_CLOSE when reading from the edit log. Contributed by Aaron T. Myers. (Revision 1378413)

        Result = FAILURE
        atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378413
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestModTime.java
        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #1149 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1149/ ) HDFS-3864 . NN does not update internal file mtime for OP_CLOSE when reading from the edit log. Contributed by Aaron T. Myers. (Revision 1378413) Result = FAILURE atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378413 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestModTime.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1180 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1180/)
        HDFS-3864. NN does not update internal file mtime for OP_CLOSE when reading from the edit log. Contributed by Aaron T. Myers. (Revision 1378413)

        Result = SUCCESS
        atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378413
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestModTime.java
        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #1180 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1180/ ) HDFS-3864 . NN does not update internal file mtime for OP_CLOSE when reading from the edit log. Contributed by Aaron T. Myers. (Revision 1378413) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378413 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestModTime.java

          People

          • Assignee:
            Aaron T. Myers
            Reporter:
            Aaron T. Myers
          • Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development