Uploaded image for project: 'Hadoop Common'
  1. Hadoop Common
  2. HADOOP-12603

TestSymlinkLocalFSFileContext#testSetTimesSymlinkToDir occasionally fail

    Details

    • Hadoop Flags:
      Reviewed

      Description

      I have observed this test failure a few times in the past. When it fails, the expected access time (of the file link) is always 1000 less than the actual access time.

      Error Message

      expected:<1448478654000> but was:<1448478655000>
      

      Stacktrace

      java.lang.AssertionError: expected:<1448478654000> but was:<1448478655000>
      	at org.junit.Assert.fail(Assert.java:88)
      	at org.junit.Assert.failNotEquals(Assert.java:743)
      	at org.junit.Assert.assertEquals(Assert.java:118)
      	at org.junit.Assert.assertEquals(Assert.java:555)
      	at org.junit.Assert.assertEquals(Assert.java:542)
      	at org.apache.hadoop.fs.SymlinkBaseTest.testSetTimesSymlinkToDir(SymlinkBaseTest.java:1391)
      	at org.apache.hadoop.fs.TestSymlinkLocalFS.testSetTimesSymlinkToDir(TestSymlinkLocalFS.java:233)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
      	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
      	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
      	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
      	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
      

      Standard Output

      2015-11-25 19:10:55,231 WARN  fs.FileUtil (FileUtil.java:symLink(813)) - Command 'ln -s /testptch/hadoop/hadoop-common-project/hadoop-common/target/test/data/4/vae1ng5t75/test1/file /testptch/hadoop/hadoop-common-project/hadoop-common/target/test/data/4/vae1ng5t75/test2/linkToFile' failed 1 with: ln: failed to create symbolic link '/testptch/hadoop/hadoop-common-project/hadoop-common/target/test/data/4/vae1ng5t75/test2/linkToFile': No such file or directory
      
      2015-11-25 19:10:56,212 WARN  fs.FileUtil (FileUtil.java:symLink(813)) - Command 'ln -s /testptch/hadoop/hadoop-common-project/hadoop-common/target/test/data/4/vae1ng5t75/test1/file /testptch/hadoop/hadoop-common-project/hadoop-common/target/test/data/4/vae1ng5t75/test1/linkToFile' failed 1 with: ln: failed to create symbolic link '/testptch/hadoop/hadoop-common-project/hadoop-common/target/test/data/4/vae1ng5t75/test1/linkToFile': File exists
      
      
      1. HADOOP-12603.001.patch
        2 kB
        Wei-Chiu Chuang
      2. HADOOP-12603.002.patch
        2 kB
        Wei-Chiu Chuang

        Issue Links

          Activity

          Hide
          alanburlison Alan Burlison added a comment -

          It's even worse on Solaris:

          testSetTimesSymlinkToDir(org.apache.hadoop.fs.TestSymlinkLocalFSFileContext)  Time elapsed: 0.073 sec  <<< FAILURE!
          java.lang.AssertionError: expected:<1447788288000> but was:<3000>
                  at org.junit.Assert.fail(Assert.java:88)
                  at org.junit.Assert.failNotEquals(Assert.java:743)
                  at org.junit.Assert.assertEquals(Assert.java:118)
                  at org.junit.Assert.assertEquals(Assert.java:555)
                  at org.junit.Assert.assertEquals(Assert.java:542)
                  at org.apache.hadoop.fs.SymlinkBaseTest.testSetTimesSymlinkToDir(SymlinkBaseTest.java:1391)
                  at org.apache.hadoop.fs.TestSymlinkLocalFS.testSetTimesSymlinkToDir(TestSymlinkLocalFS.java:233)
                  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
                  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                  at java.lang.reflect.Method.invoke(Method.java:606)
                  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
                  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
                  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
                  at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
                  at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
          

          I haven't managed to make any headway in identifying the cause yet.

          Show
          alanburlison Alan Burlison added a comment - It's even worse on Solaris: testSetTimesSymlinkToDir(org.apache.hadoop.fs.TestSymlinkLocalFSFileContext) Time elapsed: 0.073 sec <<< FAILURE! java.lang.AssertionError: expected:<1447788288000> but was:<3000> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.fs.SymlinkBaseTest.testSetTimesSymlinkToDir(SymlinkBaseTest.java:1391) at org.apache.hadoop.fs.TestSymlinkLocalFS.testSetTimesSymlinkToDir(TestSymlinkLocalFS.java:233) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) I haven't managed to make any headway in identifying the cause yet.
          Hide
          jojochuang Wei-Chiu Chuang added a comment -

          The warnings in the stdout is unrelated.

          Show
          jojochuang Wei-Chiu Chuang added a comment - The warnings in the stdout is unrelated.
          Hide
          jojochuang Wei-Chiu Chuang added a comment -

          If I insert a two second sleep in testSetTimesSymlinkToDir() in SymlinkBaseTest.java:

              long at = wrapper.getFileLinkStatus(link).getAccessTime();
              // the local file system may not support millisecond timestamps
          
              try {
              Thread.sleep(2000);
              } catch (Exception e){}
              wrapper.setTimes(link, 2000L, 3000L);
          
              assertEquals(at, wrapper.getFileLinkStatus(link).getAccessTime());
          

          Then this test code will always fail, usually the actual timestamp is two seconds after the expected timestamp. This implies that (in Linux) setting timestamp of the destination file also updates access time of the symbolic link, and therefore breaks the assumption of the test case.

          Show
          jojochuang Wei-Chiu Chuang added a comment - If I insert a two second sleep in testSetTimesSymlinkToDir() in SymlinkBaseTest.java: long at = wrapper.getFileLinkStatus(link).getAccessTime(); // the local file system may not support millisecond timestamps try { Thread .sleep(2000); } catch (Exception e){} wrapper.setTimes(link, 2000L, 3000L); assertEquals(at, wrapper.getFileLinkStatus(link).getAccessTime()); Then this test code will always fail, usually the actual timestamp is two seconds after the expected timestamp. This implies that (in Linux) setting timestamp of the destination file also updates access time of the symbolic link, and therefore breaks the assumption of the test case.
          Hide
          jojochuang Wei-Chiu Chuang added a comment -

          Java API does not seem to specify if the access time of the symlink is updated in this case. I think it could be platform dependent, or maybe version dependent.

          https://docs.oracle.com/javase/7/docs/api/java/nio/file/attribute/BasicFileAttributeView.html#setTimes%28java.nio.file.attribute.FileTime,%20java.nio.file.attribute.FileTime,%20java.nio.file.attribute.FileTime%29

          POSIX has clear definition in this case: http://pubs.opengroup.org/onlinepubs/9699919799/functions/futimens.html But it's not clear if Java implementation uses it.

          In this case, I suggest removing the checking of atime, due to the ambiguity. I will keep looking though.

          Show
          jojochuang Wei-Chiu Chuang added a comment - Java API does not seem to specify if the access time of the symlink is updated in this case. I think it could be platform dependent, or maybe version dependent. https://docs.oracle.com/javase/7/docs/api/java/nio/file/attribute/BasicFileAttributeView.html#setTimes%28java.nio.file.attribute.FileTime,%20java.nio.file.attribute.FileTime,%20java.nio.file.attribute.FileTime%29 POSIX has clear definition in this case: http://pubs.opengroup.org/onlinepubs/9699919799/functions/futimens.html But it's not clear if Java implementation uses it. In this case, I suggest removing the checking of atime, due to the ambiguity. I will keep looking though.
          Hide
          jzhuge John Zhuge added a comment -

          Did the assertion always occur when verifying atime (not mtime)?
          Is dir "/testptch/hadoop/hadoop-common-project/hadoop-common/target/test/data/4/vae1ng5t75/test1/" on NFS mount?
          If so, run "grep testptch /proc/mounts" on Linux; also find out the NFS server type.
          NFS mount has "noatime" option to improve performance.

          Show
          jzhuge John Zhuge added a comment - Did the assertion always occur when verifying atime (not mtime)? Is dir "/testptch/hadoop/hadoop-common-project/hadoop-common/target/test/data/4/vae1ng5t75/test1/" on NFS mount? If so, run "grep testptch /proc/mounts" on Linux; also find out the NFS server type. NFS mount has "noatime" option to improve performance.
          Hide
          stevel@apache.org Steve Loughran added a comment -
          1. on windows the FS timestamp is ~2s. So we should have a range of a slightly more than this. Chris Nauroth —this is still the case, right.
          2. even with a 1s granularity, there's always the risk of a race where the clock rolls; a 1/1000 event in my estimate
          3. if things are mounted then the time can be generated at the far end and a few seconds out. Or, if your clock is in a different TZ from the server, many hours out, which can cause no end of fun with automatic GC of old files in production systems, I can assure you.

          If you run ant -diagnostics you can see a check of temp dir timestamp alignment, or more precisely, the timestamp diff between the time a file was created and the time immediately afterwards.

          Show
          stevel@apache.org Steve Loughran added a comment - on windows the FS timestamp is ~2s. So we should have a range of a slightly more than this. Chris Nauroth —this is still the case, right. even with a 1s granularity, there's always the risk of a race where the clock rolls; a 1/1000 event in my estimate if things are mounted then the time can be generated at the far end and a few seconds out. Or, if your clock is in a different TZ from the server, many hours out, which can cause no end of fun with automatic GC of old files in production systems, I can assure you. If you run ant -diagnostics you can see a check of temp dir timestamp alignment, or more precisely, the timestamp diff between the time a file was created and the time immediately afterwards.
          Hide
          stevel@apache.org Steve Loughran added a comment -

          ..given this is checking local FS behaviour, and that's an implementation detail we can't address, why not just kill these tests. They don't show regressions in our code, merely demonstrate that different filesystems behave differently.

          if they are to be retained, then they should go into the AbstractFSContract tests, and LocalFSContract can determine the expected timestamp range on symlinks, the way it already determines FS permission model and case sensitivity.

          Show
          stevel@apache.org Steve Loughran added a comment - ..given this is checking local FS behaviour, and that's an implementation detail we can't address, why not just kill these tests. They don't show regressions in our code, merely demonstrate that different filesystems behave differently. if they are to be retained, then they should go into the AbstractFSContract tests, and LocalFSContract can determine the expected timestamp range on symlinks, the way it already determines FS permission model and case sensitivity.
          Hide
          jojochuang Wei-Chiu Chuang added a comment -

          Thanks for the comments. This test fails much more frequent than 1 in 1000 (around 1 in 4~10 on my local machine). Possibly setTimes() operation takes more than 1 millisecond. But you are right, the time difference is non zero and could be longer in different scenarios.

          I am more inclined to removing the assertion. But I would like to hear comments from Chris before taking actions.

          Show
          jojochuang Wei-Chiu Chuang added a comment - Thanks for the comments. This test fails much more frequent than 1 in 1000 (around 1 in 4~10 on my local machine). Possibly setTimes() operation takes more than 1 millisecond. But you are right, the time difference is non zero and could be longer in different scenarios. I am more inclined to removing the assertion. But I would like to hear comments from Chris before taking actions.
          Hide
          alanburlison Alan Burlison added a comment -

          At the OS level, neither Linux nor Solaris modify the times on a symlink when updating, only the times of the target file are updated. The java.nio.file.attribute.BasicFileAttributeView does the same, setting the time via a symlink sets the tike of the file not the symlink.

          Show
          alanburlison Alan Burlison added a comment - At the OS level, neither Linux nor Solaris modify the times on a symlink when updating, only the times of the target file are updated. The java.nio.file.attribute.BasicFileAttributeView does the same, setting the time via a symlink sets the tike of the file not the symlink.
          Hide
          alanburlison Alan Burlison added a comment -

          +1 for removing the tests.

          Show
          alanburlison Alan Burlison added a comment - +1 for removing the tests.
          Hide
          cnauroth Chris Nauroth added a comment -

          These tests were introduced in HADOOP-12045. I'm linking the issues.

          The failure on Solaris is the same as the way it fails on Windows (if you remove the assumeTrue that currently skips it on Windows). The root cause in this case is lack of a native stat implementation on the platform's Hadoop native library. This then causes the FileStatus to be populated with the atime from the target instead of the symlink. For more details, refer to this comment of mine from HADOOP-12045, where decided to skip these tests on Windows:

          https://issues.apache.org/jira/browse/HADOOP-12045?focusedCommentId=14606655&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14606655

          The tests are intended to check the capability introduced in HADOOP-12045: changing atime on the local file system. Removing the tests would leave us in a state where that enhancement was no longer tested regularly. However, it seems as though the mere act of checking status of the symlink risks updating the atime, depending on timing.

          Instead of removing the tests, I suggest that we relax the assertion so that instead of strict equality, it checks that current atime is greater than or equal. This would still serve the intent of the test, which is to check that setTimes alters the target instead of the link. The previous line sets atime to 3000, which is way in the past, so we never expect to be anytime near that in the link's atime.

          Show
          cnauroth Chris Nauroth added a comment - These tests were introduced in HADOOP-12045 . I'm linking the issues. The failure on Solaris is the same as the way it fails on Windows (if you remove the assumeTrue that currently skips it on Windows). The root cause in this case is lack of a native stat implementation on the platform's Hadoop native library. This then causes the FileStatus to be populated with the atime from the target instead of the symlink. For more details, refer to this comment of mine from HADOOP-12045 , where decided to skip these tests on Windows: https://issues.apache.org/jira/browse/HADOOP-12045?focusedCommentId=14606655&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14606655 The tests are intended to check the capability introduced in HADOOP-12045 : changing atime on the local file system. Removing the tests would leave us in a state where that enhancement was no longer tested regularly. However, it seems as though the mere act of checking status of the symlink risks updating the atime, depending on timing. Instead of removing the tests, I suggest that we relax the assertion so that instead of strict equality, it checks that current atime is greater than or equal. This would still serve the intent of the test, which is to check that setTimes alters the target instead of the link. The previous line sets atime to 3000, which is way in the past, so we never expect to be anytime near that in the link's atime.
          Hide
          alanburlison Alan Burlison added a comment -

          The root cause in this case is lack of a native stat implementation on the platform's Hadoop native library. This then causes the FileStatus to be populated with the atime from the target instead of the symlink.

          NIO has the ability to select either the symlink or the target of the symlink in many of its APIs, e,g, http://docs.oracle.com/javase/8/docs/api/java/nio/file/Files.html#getFileAttributeView-java.nio.file.Path-java.lang.Class-java.nio.file.LinkOption...- That might be an option rather than Native code.

          Show
          alanburlison Alan Burlison added a comment - The root cause in this case is lack of a native stat implementation on the platform's Hadoop native library. This then causes the FileStatus to be populated with the atime from the target instead of the symlink. NIO has the ability to select either the symlink or the target of the symlink in many of its APIs, e,g, http://docs.oracle.com/javase/8/docs/api/java/nio/file/Files.html#getFileAttributeView-java.nio.file.Path-java.lang.Class-java.nio.file.LinkOption...- That might be an option rather than Native code.
          Hide
          jojochuang Wei-Chiu Chuang added a comment -

          Thanks Chris Nauroth.
          I am attaching rev01, basically relaxing the assertion condition, and ignore the test if the platform is Windows or Solaris.

          The other solution is to use Java's get/setFileAttributeView class to update the symlink's atime (with the NOFOLLOW_LINKS option). But considering side effects (e.g. symlink to symlink) it can still be error-prone for corner cases.

          I wonder is there any general guidelines (POSIX?) that specify what happens to the symlink when the the mtime/atime of target is updated. Or, do we care about it at all? Reading HADOOP-12045, I feel the semantics of atime of symlink should be backward-compatible, and it has always been the case that atime of symlink is always updated when setTimes is called.

          Show
          jojochuang Wei-Chiu Chuang added a comment - Thanks Chris Nauroth . I am attaching rev01, basically relaxing the assertion condition, and ignore the test if the platform is Windows or Solaris. The other solution is to use Java's get/setFileAttributeView class to update the symlink's atime (with the NOFOLLOW_LINKS option). But considering side effects (e.g. symlink to symlink) it can still be error-prone for corner cases. I wonder is there any general guidelines (POSIX?) that specify what happens to the symlink when the the mtime/atime of target is updated. Or, do we care about it at all? Reading HADOOP-12045 , I feel the semantics of atime of symlink should be backward-compatible, and it has always been the case that atime of symlink is always updated when setTimes is called.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 8m 8s trunk passed
          +1 compile 8m 34s trunk passed with JDK v1.8.0_66
          +1 compile 9m 13s trunk passed with JDK v1.7.0_85
          +1 checkstyle 0m 17s trunk passed
          +1 mvnsite 1m 5s trunk passed
          +1 mvneclipse 0m 14s trunk passed
          +1 findbugs 1m 52s trunk passed
          +1 javadoc 0m 55s trunk passed with JDK v1.8.0_66
          +1 javadoc 1m 8s trunk passed with JDK v1.7.0_85
          +1 mvninstall 1m 40s the patch passed
          +1 compile 8m 33s the patch passed with JDK v1.8.0_66
          +1 javac 8m 33s the patch passed
          +1 compile 9m 13s the patch passed with JDK v1.7.0_85
          +1 javac 9m 13s the patch passed
          +1 checkstyle 0m 16s the patch passed
          +1 mvnsite 1m 6s the patch passed
          +1 mvneclipse 0m 14s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 2m 3s the patch passed
          +1 javadoc 0m 58s the patch passed with JDK v1.8.0_66
          +1 javadoc 1m 6s the patch passed with JDK v1.7.0_85
          -1 unit 7m 46s hadoop-common in the patch failed with JDK v1.8.0_66.
          -1 unit 7m 44s hadoop-common in the patch failed with JDK v1.7.0_85.
          +1 asflicense 0m 24s Patch does not generate ASF License warnings.
          73m 38s



          Reason Tests
          JDK v1.8.0_66 Failed junit tests hadoop.security.ssl.TestReloadingX509TrustManager
          JDK v1.7.0_85 Failed junit tests hadoop.metrics2.impl.TestGangliaMetrics



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12774746/HADOOP-12603.001.patch
          JIRA Issue HADOOP-12603
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 0667e3b85a2d 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / c37c3f4
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HADOOP-Build/8165/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.8.0_66.txt
          unit https://builds.apache.org/job/PreCommit-HADOOP-Build/8165/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.7.0_85.txt
          unit test logs https://builds.apache.org/job/PreCommit-HADOOP-Build/8165/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HADOOP-Build/8165/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.7.0_85.txt
          JDK v1.7.0_85 Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/8165/testReport/
          modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
          Max memory used 75MB
          Powered by Apache Yetus http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/8165/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 0s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. +1 mvninstall 8m 8s trunk passed +1 compile 8m 34s trunk passed with JDK v1.8.0_66 +1 compile 9m 13s trunk passed with JDK v1.7.0_85 +1 checkstyle 0m 17s trunk passed +1 mvnsite 1m 5s trunk passed +1 mvneclipse 0m 14s trunk passed +1 findbugs 1m 52s trunk passed +1 javadoc 0m 55s trunk passed with JDK v1.8.0_66 +1 javadoc 1m 8s trunk passed with JDK v1.7.0_85 +1 mvninstall 1m 40s the patch passed +1 compile 8m 33s the patch passed with JDK v1.8.0_66 +1 javac 8m 33s the patch passed +1 compile 9m 13s the patch passed with JDK v1.7.0_85 +1 javac 9m 13s the patch passed +1 checkstyle 0m 16s the patch passed +1 mvnsite 1m 6s the patch passed +1 mvneclipse 0m 14s the patch passed +1 whitespace 0m 0s Patch has no whitespace issues. +1 findbugs 2m 3s the patch passed +1 javadoc 0m 58s the patch passed with JDK v1.8.0_66 +1 javadoc 1m 6s the patch passed with JDK v1.7.0_85 -1 unit 7m 46s hadoop-common in the patch failed with JDK v1.8.0_66. -1 unit 7m 44s hadoop-common in the patch failed with JDK v1.7.0_85. +1 asflicense 0m 24s Patch does not generate ASF License warnings. 73m 38s Reason Tests JDK v1.8.0_66 Failed junit tests hadoop.security.ssl.TestReloadingX509TrustManager JDK v1.7.0_85 Failed junit tests hadoop.metrics2.impl.TestGangliaMetrics Subsystem Report/Notes Docker Image:yetus/hadoop:0ca8df7 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12774746/HADOOP-12603.001.patch JIRA Issue HADOOP-12603 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 0667e3b85a2d 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / c37c3f4 findbugs v3.0.0 unit https://builds.apache.org/job/PreCommit-HADOOP-Build/8165/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.8.0_66.txt unit https://builds.apache.org/job/PreCommit-HADOOP-Build/8165/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.7.0_85.txt unit test logs https://builds.apache.org/job/PreCommit-HADOOP-Build/8165/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HADOOP-Build/8165/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.7.0_85.txt JDK v1.7.0_85 Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/8165/testReport/ modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common Max memory used 75MB Powered by Apache Yetus http://yetus.apache.org Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/8165/console This message was automatically generated.
          Hide
          alanburlison Alan Burlison added a comment -

          I've written a C test case and it works the same on Solaris and Linux - updating the mtime/atime affects the target file unless AT_SYMLINK_NOFOLLOW is specified. As far as I can tell, the NIO classes behave the same way. I'm still not clear why the test case is failing, is it possible to provide a self-contained Java program that illustrates the problem, because I'm having difficulty replicating the problem outside of the test harness.

          Show
          alanburlison Alan Burlison added a comment - I've written a C test case and it works the same on Solaris and Linux - updating the mtime/atime affects the target file unless AT_SYMLINK_NOFOLLOW is specified. As far as I can tell, the NIO classes behave the same way. I'm still not clear why the test case is failing, is it possible to provide a self-contained Java program that illustrates the problem, because I'm having difficulty replicating the problem outside of the test harness.
          Hide
          cnauroth Chris Nauroth added a comment -

          HADOOP-11935 is the issue that tracks filling in the gap on the stat syscall. As you pointed out, we're also considering NIO instead of JNI.

          Show
          cnauroth Chris Nauroth added a comment - HADOOP-11935 is the issue that tracks filling in the gap on the stat syscall. As you pointed out, we're also considering NIO instead of JNI.
          Hide
          cnauroth Chris Nauroth added a comment -

          The failure reported here is on a check of atime for a directory (not a file). Did your test try a directory or just a file? The theory proposed in my last comment is that the act of accessing the directory through the symlink to check its current atime for the assertion risks updating the atime once again. If the granularity of the atime tracking is 1 second, then that would explain why the failure is so rare and why it manifests as showing an atime off by a full second.

          Show
          cnauroth Chris Nauroth added a comment - The failure reported here is on a check of atime for a directory (not a file). Did your test try a directory or just a file? The theory proposed in my last comment is that the act of accessing the directory through the symlink to check its current atime for the assertion risks updating the atime once again. If the granularity of the atime tracking is 1 second, then that would explain why the failure is so rare and why it manifests as showing an atime off by a full second.
          Hide
          cnauroth Chris Nauroth added a comment -
              assumeTrue(!Shell.WINDOWS || !Shell.SOLARIS);
          

          I think this was supposed to be AND instead of OR. Right now, it always evaluates true, because it can't be two different OSes at the same time. Also, there are 3 total test cases inside TestSymlinkLocalFS that are skipped on Windows. Please update all 3 to skip on Solaris too.

          Thank you.

          Show
          cnauroth Chris Nauroth added a comment - assumeTrue(!Shell.WINDOWS || !Shell.SOLARIS); I think this was supposed to be AND instead of OR. Right now, it always evaluates true , because it can't be two different OSes at the same time. Also, there are 3 total test cases inside TestSymlinkLocalFS that are skipped on Windows. Please update all 3 to skip on Solaris too. Thank you.
          Hide
          alanburlison Alan Burlison added a comment -

          Ahah, that's the bit of the jigsaw I was missing - yes I only checked files. However I think your supposition is correct - accessing a dir is accessing a dir, even if it's done through a symlink so atime will be updated, and POSIX says this about the timestamp resolution of the utimes() syscall:

          accuracy is only to the microsecond, not nanosecond, and rounding towards the nearest second may occur.
          

          I think it's therefore safest to assume rounding will happen. If you use the futimens() or utimensat() syscall then you should get nanosecond-accurate times but to be maximally portable I think you need to assume the JVM will use utimes() and will therefore only be microsecond-accurate at best, and may round up to the nearest second.

          I think rather than special-casing the test for specific platforms it should just be modified to check for the POSIX semantics, as anything else is non-portable.

          Show
          alanburlison Alan Burlison added a comment - Ahah, that's the bit of the jigsaw I was missing - yes I only checked files. However I think your supposition is correct - accessing a dir is accessing a dir, even if it's done through a symlink so atime will be updated, and POSIX says this about the timestamp resolution of the utimes() syscall: accuracy is only to the microsecond, not nanosecond, and rounding towards the nearest second may occur. I think it's therefore safest to assume rounding will happen. If you use the futimens() or utimensat() syscall then you should get nanosecond-accurate times but to be maximally portable I think you need to assume the JVM will use utimes() and will therefore only be microsecond-accurate at best, and may round up to the nearest second. I think rather than special-casing the test for specific platforms it should just be modified to check for the POSIX semantics, as anything else is non-portable.
          Hide
          jojochuang Wei-Chiu Chuang added a comment -

          Rev02: skip one other test cases for Solaris as well.

          From HADOOP-12583, TestSymlinkLocalFS.testDanglingLink, TestSymlinkLocalFS.testSetTimesSymlinkToDir and TestSymlinkLocalFS.testSetTimesSymlinkToFile also fail on Solaris. Two of which are related to symlink atime.

          TestSymlinkLocalFS.testDanglingLink is not related to atime so this is maybe due to some other issues with Solaris.

          It's interesting testSetTimesDanglingLink did not fail on Solaris.

          Show
          jojochuang Wei-Chiu Chuang added a comment - Rev02: skip one other test cases for Solaris as well. From HADOOP-12583 , TestSymlinkLocalFS.testDanglingLink, TestSymlinkLocalFS.testSetTimesSymlinkToDir and TestSymlinkLocalFS.testSetTimesSymlinkToFile also fail on Solaris. Two of which are related to symlink atime. TestSymlinkLocalFS.testDanglingLink is not related to atime so this is maybe due to some other issues with Solaris. It's interesting testSetTimesDanglingLink did not fail on Solaris.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 9m 54s trunk passed
          +1 compile 13m 10s trunk passed with JDK v1.8.0_66
          +1 compile 11m 30s trunk passed with JDK v1.7.0_85
          +1 checkstyle 0m 20s trunk passed
          +1 mvnsite 1m 19s trunk passed
          +1 mvneclipse 0m 17s trunk passed
          +1 findbugs 2m 25s trunk passed
          +1 javadoc 1m 19s trunk passed with JDK v1.8.0_66
          +1 javadoc 1m 21s trunk passed with JDK v1.7.0_85
          +1 mvninstall 1m 56s the patch passed
          +1 compile 12m 41s the patch passed with JDK v1.8.0_66
          +1 javac 12m 41s the patch passed
          +1 compile 11m 26s the patch passed with JDK v1.7.0_85
          +1 javac 11m 26s the patch passed
          +1 checkstyle 0m 19s the patch passed
          +1 mvnsite 1m 16s the patch passed
          +1 mvneclipse 0m 17s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 2m 33s the patch passed
          +1 javadoc 1m 16s the patch passed with JDK v1.8.0_66
          +1 javadoc 1m 23s the patch passed with JDK v1.7.0_85
          -1 unit 9m 25s hadoop-common in the patch failed with JDK v1.8.0_66.
          -1 unit 8m 54s hadoop-common in the patch failed with JDK v1.7.0_85.
          +1 asflicense 0m 29s Patch does not generate ASF License warnings.
          94m 56s



          Reason Tests
          JDK v1.8.0_66 Failed junit tests hadoop.fs.shell.find.TestFind
            hadoop.ipc.TestIPC
          JDK v1.7.0_85 Failed junit tests hadoop.ipc.TestDecayRpcScheduler



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12774881/HADOOP-12603.002.patch
          JIRA Issue HADOOP-12603
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 824a28801171 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 43acf9a
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HADOOP-Build/8167/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.8.0_66.txt
          unit https://builds.apache.org/job/PreCommit-HADOOP-Build/8167/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.7.0_85.txt
          unit test logs https://builds.apache.org/job/PreCommit-HADOOP-Build/8167/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HADOOP-Build/8167/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.7.0_85.txt
          JDK v1.7.0_85 Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/8167/testReport/
          modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
          Max memory used 76MB
          Powered by Apache Yetus http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/8167/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 0s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. +1 mvninstall 9m 54s trunk passed +1 compile 13m 10s trunk passed with JDK v1.8.0_66 +1 compile 11m 30s trunk passed with JDK v1.7.0_85 +1 checkstyle 0m 20s trunk passed +1 mvnsite 1m 19s trunk passed +1 mvneclipse 0m 17s trunk passed +1 findbugs 2m 25s trunk passed +1 javadoc 1m 19s trunk passed with JDK v1.8.0_66 +1 javadoc 1m 21s trunk passed with JDK v1.7.0_85 +1 mvninstall 1m 56s the patch passed +1 compile 12m 41s the patch passed with JDK v1.8.0_66 +1 javac 12m 41s the patch passed +1 compile 11m 26s the patch passed with JDK v1.7.0_85 +1 javac 11m 26s the patch passed +1 checkstyle 0m 19s the patch passed +1 mvnsite 1m 16s the patch passed +1 mvneclipse 0m 17s the patch passed +1 whitespace 0m 0s Patch has no whitespace issues. +1 findbugs 2m 33s the patch passed +1 javadoc 1m 16s the patch passed with JDK v1.8.0_66 +1 javadoc 1m 23s the patch passed with JDK v1.7.0_85 -1 unit 9m 25s hadoop-common in the patch failed with JDK v1.8.0_66. -1 unit 8m 54s hadoop-common in the patch failed with JDK v1.7.0_85. +1 asflicense 0m 29s Patch does not generate ASF License warnings. 94m 56s Reason Tests JDK v1.8.0_66 Failed junit tests hadoop.fs.shell.find.TestFind   hadoop.ipc.TestIPC JDK v1.7.0_85 Failed junit tests hadoop.ipc.TestDecayRpcScheduler Subsystem Report/Notes Docker Image:yetus/hadoop:0ca8df7 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12774881/HADOOP-12603.002.patch JIRA Issue HADOOP-12603 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 824a28801171 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 43acf9a findbugs v3.0.0 unit https://builds.apache.org/job/PreCommit-HADOOP-Build/8167/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.8.0_66.txt unit https://builds.apache.org/job/PreCommit-HADOOP-Build/8167/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.7.0_85.txt unit test logs https://builds.apache.org/job/PreCommit-HADOOP-Build/8167/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HADOOP-Build/8167/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.7.0_85.txt JDK v1.7.0_85 Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/8167/testReport/ modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common Max memory used 76MB Powered by Apache Yetus http://yetus.apache.org Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/8167/console This message was automatically generated.
          Hide
          alanburlison Alan Burlison added a comment -

          I'm concerned that we are just disabling tests rather than understanding what the correct semantics should be. There are also ownership test failures on Solaris related to symlinks so there are clearly more general issues at play here, rather than just time-related ones.

          What are the assumptions that are being made about Linux behaviour by these tests?

          Show
          alanburlison Alan Burlison added a comment - I'm concerned that we are just disabling tests rather than understanding what the correct semantics should be. There are also ownership test failures on Solaris related to symlinks so there are clearly more general issues at play here, rather than just time-related ones. What are the assumptions that are being made about Linux behaviour by these tests?
          Hide
          alanburlison Alan Burlison added a comment -

          If I look on a Centos machine I see that / is mounted with the following options:

          /dev/mapper/centos-root on / type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
          

          The mount(8) manpage says this about relatime:

                 relatime
                        Update inode access times relative to  modify  or  change  time.
                        Access time is only updated if the previous access time was earlier
                        than the current modify or change time. (Similar  to  noatime,
                        but  doesn't break mutt or other applications that need to
                        know if a file has been read since the last time  it  was  modified.)
          
                        Since Linux 2.6.30, the kernel defaults to the behavior provided
                        by this option (unless noatime was  specified), and the strictatime
                        option  is  required  to  obtain traditional semantics. In
                        addition, since Linux 2.6.30, the file's  last  access  time  is
                        always  updated  if  it  is more than 1 day old.
          

          That's not POSIX-conformant behaviour, and as it says is the default from 2.6.30 onwards. Solaris has a noatime mount option which is the same as Linux's noatime, but no direct equivalent to relatime, although on UFS it has dfratime which is similar. ZFS, which is now the default Solaris filesystem, only has noatime. BSD only appears to have noatime as well. Linux also has nodiratime which isn't available on either Solaris or BSD. Windows varies depending on filesystem type but NTFS may defer updates to atime for up to an hour and you can also disable atime updates entirely. On FAT the resolution of atime is 1 day.

          Leaving aside the issue of POSIX correctness, unless you implement a probe that understands the semantics of atime updates across all relevant systems and that probes for the mount options of the filesystem you are testing on, any access time behaviour tests are almost certainly going to be unreliable.

          Rather than trying to bodge up these tests, I believe it's better to remove them entirely.

          More here: https://en.wikibooks.org/wiki/C_Programming/POSIX_Reference/sys/stat.h/stat#Criticism_of_atime

          Show
          alanburlison Alan Burlison added a comment - If I look on a Centos machine I see that / is mounted with the following options: /dev/mapper/centos-root on / type xfs (rw,relatime,seclabel,attr2,inode64,noquota) The mount(8) manpage says this about relatime: relatime Update inode access times relative to modify or change time. Access time is only updated if the previous access time was earlier than the current modify or change time. (Similar to noatime, but doesn't break mutt or other applications that need to know if a file has been read since the last time it was modified.) Since Linux 2.6.30, the kernel defaults to the behavior provided by this option (unless noatime was specified), and the strictatime option is required to obtain traditional semantics. In addition, since Linux 2.6.30, the file's last access time is always updated if it is more than 1 day old. That's not POSIX-conformant behaviour, and as it says is the default from 2.6.30 onwards. Solaris has a noatime mount option which is the same as Linux's noatime, but no direct equivalent to relatime, although on UFS it has dfratime which is similar. ZFS, which is now the default Solaris filesystem, only has noatime. BSD only appears to have noatime as well. Linux also has nodiratime which isn't available on either Solaris or BSD. Windows varies depending on filesystem type but NTFS may defer updates to atime for up to an hour and you can also disable atime updates entirely. On FAT the resolution of atime is 1 day. Leaving aside the issue of POSIX correctness, unless you implement a probe that understands the semantics of atime updates across all relevant systems and that probes for the mount options of the filesystem you are testing on, any access time behaviour tests are almost certainly going to be unreliable. Rather than trying to bodge up these tests, I believe it's better to remove them entirely. More here: https://en.wikibooks.org/wiki/C_Programming/POSIX_Reference/sys/stat.h/stat#Criticism_of_atime
          Hide
          jzhuge John Zhuge added a comment -

          Besides these actime mount options at OS/kernel layer, the filesystems below (especially network filesystems) favor performance over atime compliance.

          Have we ruled out any problem at JAVA library or JVM layer?

          Show
          jzhuge John Zhuge added a comment - Besides these actime mount options at OS/kernel layer, the filesystems below (especially network filesystems) favor performance over atime compliance. Have we ruled out any problem at JAVA library or JVM layer?
          Hide
          alanburlison Alan Burlison added a comment -

          I traced the JVM and it doesn't seem to be doing anything other than normal stat()-type operations to get file times so I don't think there's a problem there. I think the tests are based on one subset of observed behaviour rather than an understanding of the different ways in which atime may behave. As I said, I believe they should just be removed. Testing ctime and mtime makes sense but atime? No.

          Show
          alanburlison Alan Burlison added a comment - I traced the JVM and it doesn't seem to be doing anything other than normal stat()-type operations to get file times so I don't think there's a problem there. I think the tests are based on one subset of observed behaviour rather than an understanding of the different ways in which atime may behave. As I said, I believe they should just be removed. Testing ctime and mtime makes sense but atime? No.
          Hide
          cnauroth Chris Nauroth added a comment -

          The goal of HADOOP-12045 was to enable setting atime explicitly on the local file system using the FileSystem#setTimes API. It did not get involved at all with the automatic access time update semantics of any particular local file system, nor did the tests include any assumptions that atime is enabled on the local file system. I would agree that our tests cannot make assumptions like that about the local file system.

          The tests in question just need to be able to set an atime value explicitly, and then verify that the atime was changed on the target instead of the link. If we can get cross-platform agreement on this basic level of functionality, then I expect we can keep the tests. There won't be any need for complexity like checking the mount table. I think we just have a corner case bug in that the test's call to FileSystem#getFileLinkStatus might inadvertently update atime of the link once again. The proposed patch solves this by relaxing the assertion to use >=.

          HADOOP-12045 was implemented as a pass-through to BasicFileAttributeView#setTimes, which then appears to be a pass-through to futimes in UnixNativeDispatcher.c in the OpenJDK source. On Windows, it's a pass-through to SetFileTime via WindowsNativeDispatcher.c.

          The Linux man page for futimes says that the function is not specified by a standard and is only implemented for Linux and BSDs. Alan Burlison, do you know if Solaris has futimes? I couldn't find a definitive source on that. As long as BasicFileAttributeView#setTimes somehow works correctly onn Solaris, then I expect we're fine.

          The tests need to be skipped right now on Windows and Solaris due to a difference in native code implementation supported by the Linux side compared to other platforms. I described this in more detail in my comments on HADOOP-12045. HADOOP-11935 is an open issue that would fix this problem by using appropriate NIO APIs to get the stat/lstat behavior we need on those platforms. I don't think we're indiscriminately skipping tests without trying to understand semantics. We are skipping them temporarily until after resolution of HADOOP-11935.

          To summarize, I am +1 for committing patch v02, which retains the tests but corrects for a corner case. Follow-ups are tracked in separate JIRAs. I will hold off on committing though in case there is further discussion.

          Show
          cnauroth Chris Nauroth added a comment - The goal of HADOOP-12045 was to enable setting atime explicitly on the local file system using the FileSystem#setTimes API. It did not get involved at all with the automatic access time update semantics of any particular local file system, nor did the tests include any assumptions that atime is enabled on the local file system. I would agree that our tests cannot make assumptions like that about the local file system. The tests in question just need to be able to set an atime value explicitly, and then verify that the atime was changed on the target instead of the link. If we can get cross-platform agreement on this basic level of functionality, then I expect we can keep the tests. There won't be any need for complexity like checking the mount table. I think we just have a corner case bug in that the test's call to FileSystem#getFileLinkStatus might inadvertently update atime of the link once again. The proposed patch solves this by relaxing the assertion to use >=. HADOOP-12045 was implemented as a pass-through to BasicFileAttributeView#setTimes , which then appears to be a pass-through to futimes in UnixNativeDispatcher.c in the OpenJDK source. On Windows, it's a pass-through to SetFileTime via WindowsNativeDispatcher.c. The Linux man page for futimes says that the function is not specified by a standard and is only implemented for Linux and BSDs. Alan Burlison , do you know if Solaris has futimes ? I couldn't find a definitive source on that. As long as BasicFileAttributeView#setTimes somehow works correctly onn Solaris, then I expect we're fine. The tests need to be skipped right now on Windows and Solaris due to a difference in native code implementation supported by the Linux side compared to other platforms. I described this in more detail in my comments on HADOOP-12045 . HADOOP-11935 is an open issue that would fix this problem by using appropriate NIO APIs to get the stat / lstat behavior we need on those platforms. I don't think we're indiscriminately skipping tests without trying to understand semantics. We are skipping them temporarily until after resolution of HADOOP-11935 . To summarize, I am +1 for committing patch v02, which retains the tests but corrects for a corner case. Follow-ups are tracked in separate JIRAs. I will hold off on committing though in case there is further discussion.
          Hide
          stevel@apache.org Steve Loughran added a comment -

          do we need to care about atime at all, given we explicitly recommend turning atime off on filesystems used for mounting HDFS. Or is it that hdfs itself does atimes, and we want filesystems to be roughly consistent?

          In which case: does anyone have any data on how often atimes are checked in real applications?

          Show
          stevel@apache.org Steve Loughran added a comment - do we need to care about atime at all, given we explicitly recommend turning atime off on filesystems used for mounting HDFS. Or is it that hdfs itself does atimes, and we want filesystems to be roughly consistent? In which case: does anyone have any data on how often atimes are checked in real applications?
          Hide
          alanburlison Alan Burlison added a comment -

          Chris Nauroth thanks for the detailed explanation of the background, that has clarified the issues.

          No, Solaris does not have futimes() but it does have both futimens() and utimensat() which are both POSIX functions, and are both available on Linux. The closes match for futimes() is futimens(), the only difference being that futimes() takes a struct timeval as a parameter and futimens() takes a struct timespec. If you are looking at a JNI solution then futimens() looks like it is the better option.

          However I think NIO will work just fine on Solaris for updating file times, if I trace the operation of java.nio.file.attribute.BasicFileAttributeView.setTimes() I see this:

          /2:     openat(AT_FDCWD, "/tmp/l", O_RDONLY)            = 47
          /2:     uucopy(0xFFFF80FFBCDFC970, 0xFFFF80FFBCDFC910, 32) = 0
          /2:     utimensat(47, NULL, 0xFFFF80FFBCDFC930, 0)      = 0
          /2:     close(47)                                       = 0
          

          So it is already using utimensat().

          Show
          alanburlison Alan Burlison added a comment - Chris Nauroth thanks for the detailed explanation of the background, that has clarified the issues. No, Solaris does not have futimes() but it does have both futimens() and utimensat() which are both POSIX functions, and are both available on Linux. The closes match for futimes() is futimens() , the only difference being that futimes() takes a struct timeval as a parameter and futimens() takes a struct timespec . If you are looking at a JNI solution then futimens() looks like it is the better option. However I think NIO will work just fine on Solaris for updating file times, if I trace the operation of java.nio.file.attribute.BasicFileAttributeView.setTimes() I see this: /2: openat(AT_FDCWD, "/tmp/l" , O_RDONLY) = 47 /2: uucopy(0xFFFF80FFBCDFC970, 0xFFFF80FFBCDFC910, 32) = 0 /2: utimensat(47, NULL, 0xFFFF80FFBCDFC930, 0) = 0 /2: close(47) = 0 So it is already using utimensat() .
          Hide
          alanburlison Alan Burlison added a comment -

          If atimes is usually turned of on filesystems backing HDFS then there seems little point about worrying about it.

          Show
          alanburlison Alan Burlison added a comment - If atimes is usually turned of on filesystems backing HDFS then there seems little point about worrying about it.
          Hide
          cnauroth Chris Nauroth added a comment -

          Alan Burlison, thank you for the clarification on what is available on Solaris. It's another compelling argument for us to use NIO instead of JNI wherever feasible.

          Show
          cnauroth Chris Nauroth added a comment - Alan Burlison , thank you for the clarification on what is available on Solaris. It's another compelling argument for us to use NIO instead of JNI wherever feasible.
          Hide
          cnauroth Chris Nauroth added a comment -

          do we need to care about atime at all, given we explicitly recommend turning atime off on filesystems used for mounting HDFS.

          We also need to keep in mind that Hadoop code can trigger local file system interactions on client machines, external to the Hadoop cluster itself. The standard recommendation to disable atime is less relevant on these client machines. In some environments, these machines might not be governed by the same system administrators that operate the cluster nodes. It might not be feasible for them to enforce a noatime policy everywhere.

          One use case for setting atime on the local file system is hdfs dfs -copyToLocal -p. The expected semantics are to retain the atime stored in HDFS when the file lands on the local file system. Prior to HADOOP-12045, that wouldn't have worked correctly.

          Or is it that hdfs itself does atimes, and we want filesystems to be roughly consistent?

          Yes, I do also think it's good to make the semantics consistent across our file system implementations as much as possible, unless there are compelling technical reasons that prevent it.

          I admit that setting local file system atime through the Hadoop code probably isn't a very critical thing. However, I see it as a nice-to-have based on the copyToLocal use case and the consistency argument. It was enough to motivate me to review and commit HADOOP-12045.

          Show
          cnauroth Chris Nauroth added a comment - do we need to care about atime at all, given we explicitly recommend turning atime off on filesystems used for mounting HDFS. We also need to keep in mind that Hadoop code can trigger local file system interactions on client machines, external to the Hadoop cluster itself. The standard recommendation to disable atime is less relevant on these client machines. In some environments, these machines might not be governed by the same system administrators that operate the cluster nodes. It might not be feasible for them to enforce a noatime policy everywhere. One use case for setting atime on the local file system is hdfs dfs -copyToLocal -p . The expected semantics are to retain the atime stored in HDFS when the file lands on the local file system. Prior to HADOOP-12045 , that wouldn't have worked correctly. Or is it that hdfs itself does atimes, and we want filesystems to be roughly consistent? Yes, I do also think it's good to make the semantics consistent across our file system implementations as much as possible, unless there are compelling technical reasons that prevent it. I admit that setting local file system atime through the Hadoop code probably isn't a very critical thing. However, I see it as a nice-to-have based on the copyToLocal use case and the consistency argument. It was enough to motivate me to review and commit HADOOP-12045 .
          Hide
          stevel@apache.org Steve Loughran added a comment -

          This test is really that the file:// FS handles atime too, especially on windows. It just looks like atime is so inconsistent for symlinks that we should really think about abandoning the test.

          certainly, if I tried to run the test on my dev linux VM, it would fail with noatime.

          If it makes you feel better, I've discovered yesterday that HDFS rename() sets the modtime, but POSIX doesn't...

          Show
          stevel@apache.org Steve Loughran added a comment - This test is really that the file:// FS handles atime too, especially on windows. It just looks like atime is so inconsistent for symlinks that we should really think about abandoning the test. certainly, if I tried to run the test on my dev linux VM, it would fail with noatime. If it makes you feel better, I've discovered yesterday that HDFS rename() sets the modtime, but POSIX doesn't...
          Hide
          cnauroth Chris Nauroth added a comment -

          certainly, if I tried to run the test on my dev linux VM, it would fail with noatime.

          That's not the behavior I'm seeing. I ran the test in a xfs mount point using noatime, and it still passed. I hacked the test to skip deleting the data at the end so I could inspect it manually, and it matched my expectations.

          > pwd
          /home/cnauroth/git/hadoop/hadoop-common-project/hadoop-common
          
          > mount | grep '/home'
          /dev/sdb1 on /home type xfs (rw,noatime,attr2,inode64,noquota)
          
          > tree target/test/data/
          target/test/data/
          └── 4exGmvu2Je
              ├── test1
              │   ├── dir
              │   └── linkToDir -> /home/cnauroth/git/hadoop/hadoop-common-project/hadoop-common/target/test/data/4exGmvu2Je/test1/dir
              └── test2
          
          5 directories, 0 files
          
          > stat -c 'mtime=%Y atime=%X' target/test/data/4exGmvu2Je/test1/dir
          mtime=2 atime=3
          
          > stat -c 'mtime=%Y atime=%X' target/test/data/4exGmvu2Je/test1/linkToDir
          mtime=1449253036 atime=1449253036
          
          > stat -L -c 'mtime=%Y atime=%X' target/test/data/4exGmvu2Je/test1/linkToDir
          mtime=2 atime=3
          

          I can understand the concern about consistency across the whole world of OSes and file systems though. Given that this is not a critical feature, I'm willing to compromise and delete the tests, and let the lack of test coverage slide. Let me know if that's how you'd like to proceed, and we can ask Wei-Chiu Chuang to update the patch.

          If it makes you feel better, I've discovered yesterday that HDFS rename() sets the modtime, but POSIX doesn't...

          Two wrongs make a right.

          Show
          cnauroth Chris Nauroth added a comment - certainly, if I tried to run the test on my dev linux VM, it would fail with noatime. That's not the behavior I'm seeing. I ran the test in a xfs mount point using noatime, and it still passed. I hacked the test to skip deleting the data at the end so I could inspect it manually, and it matched my expectations. > pwd /home/cnauroth/git/hadoop/hadoop-common-project/hadoop-common > mount | grep '/home' /dev/sdb1 on /home type xfs (rw,noatime,attr2,inode64,noquota) > tree target/test/data/ target/test/data/ └── 4exGmvu2Je ├── test1 │   ├── dir │   └── linkToDir -> /home/cnauroth/git/hadoop/hadoop-common-project/hadoop-common/target/test/data/4exGmvu2Je/test1/dir └── test2 5 directories, 0 files > stat -c 'mtime=%Y atime=%X' target/test/data/4exGmvu2Je/test1/dir mtime=2 atime=3 > stat -c 'mtime=%Y atime=%X' target/test/data/4exGmvu2Je/test1/linkToDir mtime=1449253036 atime=1449253036 > stat -L -c 'mtime=%Y atime=%X' target/test/data/4exGmvu2Je/test1/linkToDir mtime=2 atime=3 I can understand the concern about consistency across the whole world of OSes and file systems though. Given that this is not a critical feature, I'm willing to compromise and delete the tests, and let the lack of test coverage slide. Let me know if that's how you'd like to proceed, and we can ask Wei-Chiu Chuang to update the patch. If it makes you feel better, I've discovered yesterday that HDFS rename() sets the modtime, but POSIX doesn't... Two wrongs make a right.
          Hide
          cmccabe Colin P. McCabe added a comment -

          We really should have left out atime from the Hadoop FS interface. It turns all reads into writes.

          Yes, rename() should change the mtime of the parent directory, not the thing being renamed. I guess we got that wrong in HDFS?

          Symlinks technically do have an atime separate from the file they point to, at least on UNIX. I'm not sure about Windows (which only kinda sorta has symlinks in the first place).

          Probably better to @Ignore this test for now.

          Show
          cmccabe Colin P. McCabe added a comment - We really should have left out atime from the Hadoop FS interface. It turns all reads into writes. Yes, rename() should change the mtime of the parent directory, not the thing being renamed. I guess we got that wrong in HDFS? Symlinks technically do have an atime separate from the file they point to, at least on UNIX. I'm not sure about Windows (which only kinda sorta has symlinks in the first place). Probably better to @Ignore this test for now.
          Hide
          stevel@apache.org Steve Loughran added a comment -

          where are we with this? I think we can either commit the latest patch or go with an @ignore. I'd prefer the fix as it still keeps the test running on systems where things behave the way the test expects

          Show
          stevel@apache.org Steve Loughran added a comment - where are we with this? I think we can either commit the latest patch or go with an @ignore. I'd prefer the fix as it still keeps the test running on systems where things behave the way the test expects
          Hide
          cmccabe Colin P. McCabe added a comment -

          I don't have a strong opinion, but I am +1 for the latest patch. An @Ignore would also be acceptable.

          Show
          cmccabe Colin P. McCabe added a comment - I don't have a strong opinion, but I am +1 for the latest patch. An @Ignore would also be acceptable.
          Hide
          cnauroth Chris Nauroth added a comment -

          where are we with this?

          Here is a summary of my position, with quotes pulled from my prior comments.

          Here is where I stated my +1 for patch v02:

          To summarize, I am +1 for committing patch v02, which retains the tests but corrects for a corner case. Follow-ups are tracked in separate JIRAs. I will hold off on committing though in case there is further discussion.

          It sounded like there wasn't consensus though, so I never committed it. I also stated that I was willing to compromise if others feel strongly that the tests need to be deleted:

          I can understand the concern about consistency across the whole world of OSes and file systems though. Given that this is not a critical feature, I'm willing to compromise and delete the tests, and let the lack of test coverage slide. Let me know if that's how you'd like to proceed, and we can ask Wei-Chiu Chuang to update the patch.

          Steve Loughran, what are your thoughts? Proceed with patch v02 or ask for a patch that deletes the tests?

          Show
          cnauroth Chris Nauroth added a comment - where are we with this? Here is a summary of my position, with quotes pulled from my prior comments. Here is where I stated my +1 for patch v02: To summarize, I am +1 for committing patch v02, which retains the tests but corrects for a corner case. Follow-ups are tracked in separate JIRAs. I will hold off on committing though in case there is further discussion. It sounded like there wasn't consensus though, so I never committed it. I also stated that I was willing to compromise if others feel strongly that the tests need to be deleted: I can understand the concern about consistency across the whole world of OSes and file systems though. Given that this is not a critical feature, I'm willing to compromise and delete the tests, and let the lack of test coverage slide. Let me know if that's how you'd like to proceed, and we can ask Wei-Chiu Chuang to update the patch. Steve Loughran , what are your thoughts? Proceed with patch v02 or ask for a patch that deletes the tests?
          Hide
          stevel@apache.org Steve Loughran added a comment -

          well, I don't see the value of this test. But I'm happy for the patch to go in if you are —we can always revisit that decision if it continues to play up.

          +1 then

          Show
          stevel@apache.org Steve Loughran added a comment - well, I don't see the value of this test. But I'm happy for the patch to go in if you are —we can always revisit that decision if it continues to play up. +1 then
          Hide
          cnauroth Chris Nauroth added a comment -

          well, I don't see the value of this test.

          The value I see is that it verifies RawLocalFileSystem is implemented correctly to support hdfs dfs -copyToLocal -p. Again quoting one of my earlier comments:

          One use case for setting atime on the local file system is hdfs dfs -copyToLocal -p. The expected semantics are to retain the atime stored in HDFS when the file lands on the local file system. Prior to HADOOP-12045, that wouldn't have worked correctly.

          Thanks, everyone. I'll plan on committing this later.

          Show
          cnauroth Chris Nauroth added a comment - well, I don't see the value of this test. The value I see is that it verifies RawLocalFileSystem is implemented correctly to support hdfs dfs -copyToLocal -p . Again quoting one of my earlier comments: One use case for setting atime on the local file system is hdfs dfs -copyToLocal -p . The expected semantics are to retain the atime stored in HDFS when the file lands on the local file system. Prior to HADOOP-12045 , that wouldn't have worked correctly. Thanks, everyone. I'll plan on committing this later.
          Hide
          jojochuang Wei-Chiu Chuang added a comment -

          Thanks everyone for the discussion and the decision. Learned a lot in this process!

          Show
          jojochuang Wei-Chiu Chuang added a comment - Thanks everyone for the discussion and the decision. Learned a lot in this process!
          Hide
          cnauroth Chris Nauroth added a comment -

          I have committed this to trunk, branch-2 and branch-2.8. Thank you to everyone who participated.

          Show
          cnauroth Chris Nauroth added a comment - I have committed this to trunk, branch-2 and branch-2.8. Thank you to everyone who participated.
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #9098 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9098/)
          HADOOP-12603. TestSymlinkLocalFSFileContext#testSetTimesSymlinkToDir (cnauroth: rev fbb5868deb6a714c548887f3b42119b256d9d69b)

          • hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestSymlinkLocalFS.java
          • hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/SymlinkBaseTest.java
          • hadoop-common-project/hadoop-common/CHANGES.txt
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-trunk-Commit #9098 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9098/ ) HADOOP-12603 . TestSymlinkLocalFSFileContext#testSetTimesSymlinkToDir (cnauroth: rev fbb5868deb6a714c548887f3b42119b256d9d69b) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestSymlinkLocalFS.java hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/SymlinkBaseTest.java hadoop-common-project/hadoop-common/CHANGES.txt

            People

            • Assignee:
              jojochuang Wei-Chiu Chuang
              Reporter:
              jojochuang Wei-Chiu Chuang
            • Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development