Groovy / GROOVY-6457

File.eachFileMatch is inconsistent with File.eachFile and incurs extra stat() syscalls

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.1
    • Fix Version/s: 2.2.2
    • Component/s: groovy-jdk
    • Labels: None

    Description

      ResourceGroovyMethods#eachFileMatch visits only the entries in a directory that are regular files or directories, whereas ResourceGroovyMethods#eachFile visits every directory entry, including entries that are neither (e.g., FIFOs and sockets). eachFileMatch should behave consistently with eachFile.
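
      To illustrate the inconsistency outside Groovy, here is a minimal Java sketch against java.io.File, whose listFiles(), isFile() and isDirectory() are the calls at issue. The class EntryKinds and its helpers are hypothetical. countAllEntries keeps every entry listFiles() returns, as eachFile does; countFilesAndDirs keeps only entries for which isDirectory() or isFile() is true, which is the filtering described above for eachFileMatch. Creating a FIFO from Java is awkward, so the sketch assumes a POSIX filesystem and uses a dangling symbolic link instead: isFile() and isDirectory() follow links, so that entry is likewise reported as neither a file nor a directory.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class EntryKinds {
    // Every entry the directory listing returns: what eachFile visits.
    static int countAllEntries(File dir) {
        File[] entries = dir.listFiles();
        return entries == null ? 0 : entries.length;
    }

    // Only entries File reports as a directory or a regular file: the
    // filter the report describes for eachFileMatch with everything matched.
    static int countFilesAndDirs(File dir) {
        File[] entries = dir.listFiles();
        if (entries == null) return 0;
        int n = 0;
        for (File f : entries) {
            if (f.isDirectory() || f.isFile()) n++;
        }
        return n;
    }

    // Builds a throwaway directory with one file, one subdirectory and one
    // dangling symlink, then returns {allEntries, filesAndDirs}.
    static int[] demo() {
        try {
            Path dir = Files.createTempDirectory("eachfile-demo");
            Files.createFile(dir.resolve("foo1"));
            Files.createDirectory(dir.resolve("sub"));
            // isFile() and isDirectory() follow the link, so both return false here.
            Files.createSymbolicLink(dir.resolve("dangling"), dir.resolve("missing-target"));
            int[] counts = { countAllEntries(dir.toFile()), countFilesAndDirs(dir.toFile()) };
            // Files.delete removes the link itself, not its (missing) target.
            for (String name : new String[] { "foo1", "sub", "dangling" }) {
                Files.delete(dir.resolve(name));
            }
            Files.delete(dir);
            return counts;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] counts = demo();
        System.out.println("all entries:    " + counts[0]);
        System.out.println("files and dirs: " + counts[1]);
    }
}
```

      On a POSIX system this should report 3 entries in total but only 2 files and directories, the same kind of gap as the 9 versus 8 shown below.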

      Here is a demonstration showing that the two methods visit a different number of entries when matching everything (eachFileMatch skips the FIFO):

      [ecd@qk /tmp/tmp12345]$ ls -alF
      total 36
      drwxr-xr-x    2 ecd   wheel    512 Nov 27 20:46 ./
      drwxrwxrwt  269 root  wheel  32256 Nov 27 20:20 ../
      -rw-r--r--    1 ecd   wheel      0 Nov 27 20:13 .hidden
      -rw-r--r--    1 ecd   wheel      0 Nov 27 20:13 bar
      prw-r--r--    1 ecd   wheel      0 Nov 27 20:16 fifo|
      -rw-r--r--    1 ecd   wheel      0 Nov 27 20:13 foo1
      -rw-r--r--    1 ecd   wheel      0 Nov 27 20:13 foo2
      -rw-r--r--    1 ecd   wheel      0 Nov 27 20:13 foo3
      -rw-r--r--    1 ecd   wheel      0 Nov 27 20:13 foo4
      -rw-r--r--    1 ecd   wheel      0 Nov 27 20:13 foo5
      lrwxr-xr-x    1 ecd   wheel      4 Nov 27 20:46 symlink@ -> foo3
      [ecd@qk /tmp/tmp12345]$ groovysh
      Groovy Shell (2.2.1, JVM: 1.7.0_25)
      Type 'help' or '\h' for help.
      ------------------------------------------------------------------------------------------------------------------------------------------------------
      groovy:000> i = 0
      ===> 0
      groovy:000> new File(".").eachFile { ++i }
      ===> null
      groovy:000> i
      ===> 9
      groovy:000> j = 0
      ===> 0
      groovy:000> new File(".").eachFileMatch(~/.*/) { ++j }
      ===> null
      groovy:000> j
      ===> 8
      

      Furthermore, because eachFileMatch has no short-circuit for the FileType.ANY case, it calls File.isFile() and File.isDirectory() on every directory entry, and each of these calls issues a stat() syscall. This can have a significant performance impact, particularly for directories with thousands of files on slow NFS mounts.
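
      One way to avoid the extra calls is to test for FileType.ANY before touching the filesystem. The Java sketch below models that idea; it is not Groovy's actual source. The nested FileType enum is a hypothetical stand-in for groovy.io.FileType, the probe counter exists only for illustration, and the name filter uses a full regex match, assumed here to approximate how a Pattern name filter is applied.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class ShortCircuitMatch {
    // Hypothetical stand-in for groovy.io.FileType.
    enum FileType { ANY, FILES, DIRECTORIES }

    // Number of isFile()/isDirectory() calls made; on a POSIX JVM each typically costs a stat().
    static int probes = 0;

    static boolean isDirectory(File f) { probes++; return f.isDirectory(); }
    static boolean isFile(File f) { probes++; return f.isFile(); }

    // ANY is checked first, so no per-entry probe is needed in that case.
    static boolean typeMatches(File f, FileType type) {
        return type == FileType.ANY
                || (type != FileType.FILES && isDirectory(f))
                || (type != FileType.DIRECTORIES && isFile(f));
    }

    // Returns the names that pass both the type filter and a full-match regex.
    static List<String> eachFileMatch(File dir, FileType type, Pattern nameFilter) {
        List<String> names = new ArrayList<>();
        File[] entries = dir.listFiles();
        if (entries == null) return names;
        for (File f : entries) {
            if (typeMatches(f, type) && nameFilter.matcher(f.getName()).matches()) {
                names.add(f.getName());
            }
        }
        return names;
    }

    // Runs the filter over three regular files and returns {matches, probes}.
    static int[] demo(FileType type) {
        try {
            Path dir = Files.createTempDirectory("match-demo");
            String[] names = { "foo1", "foo2", "bar" };
            for (String n : names) Files.createFile(dir.resolve(n));
            probes = 0;
            int matched = eachFileMatch(dir.toFile(), type, Pattern.compile(".*")).size();
            int[] result = { matched, probes };
            for (String n : names) Files.delete(dir.resolve(n));
            Files.delete(dir);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (FileType t : FileType.values()) {
            int[] r = demo(t);
            System.out.println(t + ": matched=" + r[0] + " probes=" + r[1]);
        }
    }
}
```

      With three regular files, ANY matches all three with zero probes, while FILES and DIRECTORIES each need one probe per entry (matching 3 and 0 entries respectively). Because ANY no longer consults isFile() or isDirectory(), it also stops dropping entries that are neither, which makes it consistent with eachFile.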

      The extra stat() syscalls can be observed with a second demo that uses the same directory as above:

      class FileTests {
      
          def basic() { // 0 stats
              for (File f : new File("/tmp/tmp12345").listFiles()) { }
          }
      
          def groovyEachFile() {
              new File("/tmp/tmp12345").eachFile { }
          }
      
          def groovyEachFileMatch() {
              new File("/tmp/tmp12345").eachFileMatch(~/.*/) { }
          }
      
          def groovyEachFileRecurse() {
              new File("/tmp/tmp12345").eachFileRecurse { }
          }
      
          public static void main(String[] args) {
              FileTests fileTests = new FileTests();
              FileTests.getMethod(args[0]).invoke(fileTests);
          }
      }
      

      Running each method under truss (or strace/dtrace) and counting the stat calls that reference the test directory gives:

      [ecd@qk ~]$ (for m in basic groovyEachFile groovyEachFileMatch groovyEachFileRecurse; do echo $m $(truss groovy FileTests.groovy $m 2>&1 | grep tmp12345 | grep -c stat); done) | column -t
      basic                  0
      groovyEachFile         2
      groovyEachFileMatch    20
      groovyEachFileRecurse  11
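
      These counts are consistent with a simple model, sketched below in Java under stated assumptions rather than taken from Groovy's code: each method first validates its argument with exists() and isDirectory() (2 stats); eachFile then passes entries through untouched; the pre-fix eachFileMatch calls isDirectory() and then isFile() on each of the 9 entries, none of which is a directory; and eachFileRecurse calls isDirectory() once per entry to decide whether to descend. That gives 0, 2, 2 + 2 * 9 = 20 and 2 + 9 = 11. StatCountModel and its methods are hypothetical, and nine regular files stand in for the report's entries, since under this model every non-directory entry costs the same number of probes.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;

class StatCountModel {
    // Counts File metadata probes; on a POSIX JVM each one typically costs a stat().
    static int probes = 0;

    static boolean exists(File f) { probes++; return f.exists(); }
    static boolean isDirectory(File f) { probes++; return f.isDirectory(); }
    static boolean isFile(File f) { probes++; return f.isFile(); }

    // Assumed up-front argument check: exists() then isDirectory() on the directory itself.
    static void checkDir(File dir) {
        if (!exists(dir) || !isDirectory(dir)) {
            throw new IllegalArgumentException("not a directory: " + dir);
        }
    }

    // Reading the listing does not stat the individual entries.
    static File[] list(File dir) {
        File[] entries = dir.listFiles();
        return entries == null ? new File[0] : entries;
    }

    static int basic(File dir) { return list(dir).length; }

    // eachFile: directory check, then every entry is handed over unexamined.
    static int eachFile(File dir) {
        checkDir(dir);
        return list(dir).length;
    }

    // Pre-fix eachFileMatch with FileType.ANY: isDirectory(), then isFile() for non-directories.
    static int eachFileMatchAny(File dir) {
        checkDir(dir);
        int kept = 0;
        for (File f : list(dir)) {
            if (isDirectory(f) || isFile(f)) kept++;
        }
        return kept;
    }

    // eachFileRecurse: one isDirectory() per entry to decide whether to descend.
    static int eachFileRecurse(File dir) {
        checkDir(dir);
        int visited = 0;
        for (File f : list(dir)) {
            visited++;
            if (isDirectory(f)) visited += eachFileRecurse(f);
        }
        return visited;
    }

    // Resets the counter, runs one modelled method and returns the probes it made.
    static int measure(Consumer<File> method, File dir) {
        probes = 0;
        method.accept(dir);
        return probes;
    }

    // Nine regular files stand in for the nine entries of the report's directory.
    static int[] demo() {
        try {
            Path dir = Files.createTempDirectory("stat-model");
            for (int i = 1; i <= 9; i++) Files.createFile(dir.resolve("f" + i));
            File d = dir.toFile();
            int[] counts = {
                measure(StatCountModel::basic, d),
                measure(StatCountModel::eachFile, d),
                measure(StatCountModel::eachFileMatchAny, d),
                measure(StatCountModel::eachFileRecurse, d)
            };
            for (int i = 1; i <= 9; i++) Files.delete(dir.resolve("f" + i));
            Files.delete(dir);
            return counts;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] c = demo();
        System.out.println("basic " + c[0] + ", eachFile " + c[1]
                + ", eachFileMatch " + c[2] + ", eachFileRecurse " + c[3]);
    }
}
```

      The program prints basic 0, eachFile 2, eachFileMatch 20, eachFileRecurse 11. These are modelled probe counts, not a trace; a match with the truss numbers only shows the assumed call pattern is plausible.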
      

          People

            Assignee: Paul King (paulk)
            Reporter: Eric Dahl (ericdahl)
            Votes: 0
            Watchers: 2
