Hadoop HDFS
  HDFS-3864

NN does not update internal file mtime for OP_CLOSE when reading from the edit log

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0-alpha
    • Fix Version/s: 2.0.2-alpha
    • Component/s: namenode
    • Labels: None

      Description

      When logging an OP_CLOSE to the edit log, the NN writes out an updated file mtime and atime. However, when reading an OP_CLOSE back from the edit log, the NN does not apply these values to the in-memory FS data structure. As a result, a file's mtime or atime may appear to go back in time after an NN restart or an HA failover.
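
      The replay behavior described above can be sketched with a small, self-contained model. This is only an illustration under stated assumptions: `OpCloseReplaySketch`, `FileNode`, `CloseOp`, `replayCloseBuggy` and `replayCloseFixed` are hypothetical names invented for this sketch, not the actual NameNode classes or the contents of the attached patch. The two timestamps are borrowed from the exception message in this report purely as sample values.

      ```java
      import java.util.HashMap;
      import java.util.Map;

      /** Hypothetical, simplified model of edit-log replay; not real NameNode code. */
      public class OpCloseReplaySketch {

          /** Stand-in for the in-memory record of a file. */
          static final class FileNode {
              long mtime;
              long atime;
              boolean underConstruction = true;
          }

          /** Stand-in for an OP_CLOSE record, which carries mtime and atime. */
          static final class CloseOp {
              final String path;
              final long mtime;
              final long atime;

              CloseOp(String path, long mtime, long atime) {
                  this.path = path;
                  this.mtime = mtime;
                  this.atime = atime;
              }
          }

          /** Stand-in for the namespace: path -> file record. */
          private final Map<String, FileNode> files = new HashMap<>();

          /** Models replaying the op that created the file. */
          FileNode create(String path, long time) {
              FileNode node = new FileNode();
              node.mtime = time;
              node.atime = time;
              files.put(path, node);
              return node;
          }

          FileNode get(String path) {
              return files.get(path);
          }

          /** Models the reported bug: the file is closed, the logged times are ignored. */
          void replayCloseBuggy(CloseOp op) {
              FileNode node = files.get(op.path);
              node.underConstruction = false;
          }

          /** Models the intended behavior: the logged times are applied on replay. */
          void replayCloseFixed(CloseOp op) {
              FileNode node = files.get(op.path);
              node.mtime = op.mtime;
              node.atime = op.atime;
              node.underConstruction = false;
          }

          public static void main(String[] args) {
              long createTime = 1342137814473L; // sample value only
              long closeTime = 1342137814599L;  // sample value only
              CloseOp close = new CloseOp("/example/file", closeTime, closeTime);

              OpCloseReplaySketch buggy = new OpCloseReplaySketch();
              buggy.create("/example/file", createTime);
              buggy.replayCloseBuggy(close);

              OpCloseReplaySketch fixed = new OpCloseReplaySketch();
              fixed.create("/example/file", createTime);
              fixed.replayCloseFixed(close);

              System.out.println("buggy replay mtime: " + buggy.get("/example/file").mtime);
              System.out.println("fixed replay mtime: " + fixed.get("/example/file").mtime);
          }
      }
      ```

      In this model, an NN that wrote the OP_CLOSE would report the close time, while an NN that only replayed it with the buggy path would still report the creation time, which is how the mtime can appear to move backwards.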

      Most of the time this will be harmless and folks won't notice. However, if one of these files is in the distributed cache of an MR job when an HA failover occurs, the job might notice that the mtime of the cache file has changed. In MR2 this causes the job to fail with an exception like the following:

      java.io.IOException: Resource hdfs://ha-nn-uri/user/jenkins/.staging/job_1341364439849_0513/libjars/snappy-java-1.0.3.2.jar changed on src filesystem (expected 1342137814599, was 1342137814473
      	at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
      	at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
      	at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
      	at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:396)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
      	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
      	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
      	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
      	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      	at java.lang.Thread.run(Thread.java:662)
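
      The failure mode in the trace above comes down to a strict equality check between the mtime recorded for a cache resource and the mtime the filesystem reports at download time. A minimal sketch of such a check follows. It is an assumption-laden illustration, not the actual `FSDownload` source: the class `CacheTimestampCheckSketch`, the method `verifyUnchanged`, and the resource URI are hypothetical, and the message text is only loosely modeled on the exception shown above.

      ```java
      import java.io.IOException;

      /** Hypothetical sketch of a cache-resource timestamp check; not the real FSDownload. */
      public class CacheTimestampCheckSketch {

          /**
           * Throws if the mtime reported now differs from the mtime recorded
           * when the resource was registered with the job.
           */
          static void verifyUnchanged(String resource, long expectedMtime, long actualMtime)
                  throws IOException {
              if (expectedMtime != actualMtime) {
                  throw new IOException("Resource " + resource
                          + " changed on src filesystem (expected " + expectedMtime
                          + ", was " + actualMtime + ")");
              }
          }

          public static void main(String[] args) {
              String jar = "hdfs://example-nn/tmp/example.jar"; // hypothetical URI
              long recorded = 1342137814599L;      // mtime seen at job submission
              long afterFailover = 1342137814473L; // stale mtime reported after failover

              try {
                  verifyUnchanged(jar, recorded, recorded);
                  System.out.println("same mtime: resource accepted");
                  verifyUnchanged(jar, recorded, afterFailover);
                  System.out.println("unreachable when the mtimes differ");
              } catch (IOException e) {
                  System.out.println("rejected: " + e.getMessage());
              }
          }
      }
      ```

      Because the comparison is exact, even a difference of a few milliseconds in the reported mtime is enough to reject a file whose contents never changed.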
      

      Credit to Sujay Rau for discovering this issue.

      1. HDFS-3864.patch
        4 kB
        Aaron T. Myers
      2. HDFS-3864.patch
        4 kB
        Aaron T. Myers

        Activity

        Field: Original Value -> New Value

        Arun C Murthy made changes -
        Status: Resolved [ 5 ] -> Closed [ 6 ]
        Aaron T. Myers made changes -
        Status: Patch Available [ 10002 ] -> Resolved [ 5 ]
        Hadoop Flags: (none) -> Reviewed [ 10343 ]
        Fix Version/s: (none) -> 2.0.2-alpha [ 12322472 ]
        Resolution: (none) -> Fixed [ 1 ]
        Aaron T. Myers made changes -
        Attachment: (none) -> HDFS-3864.patch [ 12542846 ]
        Aaron T. Myers made changes -
        Status: Open [ 1 ] -> Patch Available [ 10002 ]
        Aaron T. Myers made changes -
        Attachment: (none) -> HDFS-3864.patch [ 12542840 ]
        Aaron T. Myers created issue -

          People

          • Assignee: Aaron T. Myers
          • Reporter: Aaron T. Myers
          • Votes: 0
          • Watchers: 8
