Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-4890

FileInputFormatTest#testExcludeFiles fails on Windows OS

    Details

      Description

      Running the mentioned test leads to an exception:

      Illegal char <:> at index 2: /C:/dev/cygwin64/tmp/junit3838395086498044255/another_file.bin
      java.nio.file.InvalidPathException: Illegal char <:> at index 2: /C:/dev/cygwin64/tmp/junit3838395086498044255/anot                                                                                                                     her_file.bin
              at sun.nio.fs.WindowsPathParser.normalize(WindowsPathParser.java:182)
              at sun.nio.fs.WindowsPathParser.parse(WindowsPathParser.java:153)
              at sun.nio.fs.WindowsPathParser.parse(WindowsPathParser.java:77)
              at sun.nio.fs.WindowsPath.parse(WindowsPath.java:94)
              at sun.nio.fs.WindowsFileSystem.getPath(WindowsFileSystem.java:255)
              at java.nio.file.Paths.get(Paths.java:84)
              at org.apache.flink.api.common.io.GlobFilePathFilter.filterPath(GlobFilePathFilter.java:95)
              at org.apache.flink.api.common.io.FileInputFormat.acceptFile(FileInputFormat.java:644)
              at org.apache.flink.api.common.io.FileInputFormat.addFilesInDir(FileInputFormat.java:600)
              at org.apache.flink.api.common.io.FileInputFormat.createInputSplits(FileInputFormat.java:476)
              at org.apache.flink.api.common.io.FileInputFormatTest.testReadMultiplePatterns(FileInputFormatTest.java:362)
      

      The problem is that we are given a flink Path, which is then converted to a String and given to the nio FIleSystem. The passed path is thus /C:/..., which nio can't work with.

        Issue Links

          Activity

          Hide
          Boris_Osipov Boris Osipov added a comment -

          I've reproduced this issue on my environment.
          Also, I found that AvroOutputFormatTest#testCompression fails on Windows by the same reason.

          I'd like to fix both problems. There are two ways to do it:
          1) Expand Flink Path with a new method for converting windows path to nio path(just remove the leading slash)
          2) Handle and convert windows path's in GlobFilePathFilter and in other places

          What should I do? Could you please assign issue to me?

          Show
          Boris_Osipov Boris Osipov added a comment - I've reproduced this issue on my environment. Also, I found that AvroOutputFormatTest#testCompression fails on Windows by the same reason. I'd like to fix both problems. There are two ways to do it: 1) Expand Flink Path with a new method for converting windows path to nio path(just remove the leading slash) 2) Handle and convert windows path's in GlobFilePathFilter and in other places What should I do? Could you please assign issue to me?
          Hide
          Boris_Osipov Boris Osipov added a comment -

          It's related to FLINK-3677

          Show
          Boris_Osipov Boris Osipov added a comment - It's related to FLINK-3677
          Hide
          StephanEwen Stephan Ewen added a comment -

          Aside from the issues on local windows machines:

          I am wondering if the complete approach of the GlobFilePathFilter is not fundamentally flawed.
          It always works based what the machine's standard OS FileSystem type is. So, a windows machine running a Flink JobManager trying to filter HDFS files (HDFS being more unix like) can simply not work.

          What do others think?

          Show
          StephanEwen Stephan Ewen added a comment - Aside from the issues on local windows machines: I am wondering if the complete approach of the GlobFilePathFilter is not fundamentally flawed. It always works based what the machine's standard OS FileSystem type is. So, a windows machine running a Flink JobManager trying to filter HDFS files (HDFS being more unix like) can simply not work. What do others think?
          Hide
          Zentol Chesnay Schepler added a comment - - edited

          It does kind of work.

          When matching paths we ignore the schema of the path; so given an include pattern "/*/text" a path like "hdfs:/my/text" would be accepted. (since it is reduced to "/my/text" which matches the pattern)

          However, if the pattern would be "hdfs:/*/text" then the matching would fail; but not just on windows but on every OS.

          If we were to fix this however, and include the schema in the matching process, then it would indeed fail by default on Windows for anything but local paths.

          Show
          Zentol Chesnay Schepler added a comment - - edited It does kind of work. When matching paths we ignore the schema of the path; so given an include pattern "/*/text" a path like "hdfs:/my/text" would be accepted. (since it is reduced to "/my/text" which matches the pattern) However, if the pattern would be "hdfs:/*/text" then the matching would fail; but not just on windows but on every OS. If we were to fix this however, and include the schema in the matching process, then it would indeed fail by default on Windows for anything but local paths.
          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user StephanEwen opened a pull request:

          https://github.com/apache/flink/pull/3064

          FLINK-4890 [core] Make GlobFilePathFilter work on Windows

          When extracting the file path from the Flink `Path` object, this de-slashifies the path, if it is a slashed windows path.

          This is analogous to the logic in `Path.toString()`.

          The commit contains a minor cleanup in `FilePathFilter` to improve comments and fix serializability warnings.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/StephanEwen/incubator-flink glob_win

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3064.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3064


          commit 2d099cc2f3e18b081ba59ca863a562be636cab7a
          Author: Stephan Ewen <sewen@apache.org>
          Date: 2017-01-05T13:44:00Z

          FLINK-4890 [core] Make GlobFilePathFilter work on Windows


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user StephanEwen opened a pull request: https://github.com/apache/flink/pull/3064 FLINK-4890 [core] Make GlobFilePathFilter work on Windows When extracting the file path from the Flink `Path` object, this de-slashifies the path, if it is a slashed windows path. This is analogous to the logic in `Path.toString()`. The commit contains a minor cleanup in `FilePathFilter` to improve comments and fix serializability warnings. You can merge this pull request into a Git repository by running: $ git pull https://github.com/StephanEwen/incubator-flink glob_win Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3064.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3064 commit 2d099cc2f3e18b081ba59ca863a562be636cab7a Author: Stephan Ewen <sewen@apache.org> Date: 2017-01-05T13:44:00Z FLINK-4890 [core] Make GlobFilePathFilter work on Windows
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user zentol commented on the issue:

          https://github.com/apache/flink/pull/3064

          I can confirm that the test no longer fails on Windows.

          Show
          githubbot ASF GitHub Bot added a comment - Github user zentol commented on the issue: https://github.com/apache/flink/pull/3064 I can confirm that the test no longer fails on Windows.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user StephanEwen commented on the issue:

          https://github.com/apache/flink/pull/3064

          Okay, will merge this then.

          BTW: I think the include/exclude logic is faulty. If a path matches no include pattern, it is still accepted.

          Show
          githubbot ASF GitHub Bot added a comment - Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/3064 Okay, will merge this then. BTW: I think the include/exclude logic is faulty. If a path matches no include pattern, it is still accepted.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user zentol commented on the issue:

          https://github.com/apache/flink/pull/3064

          If no include filter matches then filterPath should return true, in which case it is excluded.

          Show
          githubbot ASF GitHub Bot added a comment - Github user zentol commented on the issue: https://github.com/apache/flink/pull/3064 If no include filter matches then filterPath should return true, in which case it is excluded.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user zentol commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3064#discussion_r94788199

          — Diff: flink-core/src/main/java/org/apache/flink/core/fs/Path.java —
          @@ -274,24 +274,6 @@ private String normalizePath(String path) {
          }

          /**

          • * Checks if the provided path string contains a windows drive letter.
          • *
          • * @param path
          • * the path to check
          • * @param slashed
          • * <code>true</code> to indicate the first character of the string is a slash, <code>false</code> otherwise
          • * @return <code>true</code> if the path string contains a windows drive letter, <code>false</code> otherwise
          • */
          • private boolean hasWindowsDrive(String path, boolean slashed) {
              • End diff –

          apparently this breaks binary compatibility.

          Show
          githubbot ASF GitHub Bot added a comment - Github user zentol commented on a diff in the pull request: https://github.com/apache/flink/pull/3064#discussion_r94788199 — Diff: flink-core/src/main/java/org/apache/flink/core/fs/Path.java — @@ -274,24 +274,6 @@ private String normalizePath(String path) { } /** * Checks if the provided path string contains a windows drive letter. * * @param path * the path to check * @param slashed * <code>true</code> to indicate the first character of the string is a slash, <code>false</code> otherwise * @return <code>true</code> if the path string contains a windows drive letter, <code>false</code> otherwise */ private boolean hasWindowsDrive(String path, boolean slashed) { End diff – apparently this breaks binary compatibility.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user StephanEwen commented on the issue:

          https://github.com/apache/flink/pull/3064

          You are right, the logic is the other way around than a predicate like in `FilterFunction`.

          Show
          githubbot ASF GitHub Bot added a comment - Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/3064 You are right, the logic is the other way around than a predicate like in `FilterFunction`.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user StephanEwen commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3064#discussion_r94791293

          — Diff: flink-core/src/main/java/org/apache/flink/core/fs/Path.java —
          @@ -274,24 +274,6 @@ private String normalizePath(String path) {
          }

          /**

          • * Checks if the provided path string contains a windows drive letter.
          • *
          • * @param path
          • * the path to check
          • * @param slashed
          • * <code>true</code> to indicate the first character of the string is a slash, <code>false</code> otherwise
          • * @return <code>true</code> if the path string contains a windows drive letter, <code>false</code> otherwise
          • */
          • private boolean hasWindowsDrive(String path, boolean slashed) {
              • End diff –

          It does not really break it (previously private methods should not be considered), but the plugin complains anyways.
          Will slightly rewrite this to make the plugin happy.

          Show
          githubbot ASF GitHub Bot added a comment - Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/3064#discussion_r94791293 — Diff: flink-core/src/main/java/org/apache/flink/core/fs/Path.java — @@ -274,24 +274,6 @@ private String normalizePath(String path) { } /** * Checks if the provided path string contains a windows drive letter. * * @param path * the path to check * @param slashed * <code>true</code> to indicate the first character of the string is a slash, <code>false</code> otherwise * @return <code>true</code> if the path string contains a windows drive letter, <code>false</code> otherwise */ private boolean hasWindowsDrive(String path, boolean slashed) { End diff – It does not really break it (previously private methods should not be considered), but the plugin complains anyways. Will slightly rewrite this to make the plugin happy.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user StephanEwen commented on the issue:

          https://github.com/apache/flink/pull/3064

          Manually merging...

          Show
          githubbot ASF GitHub Bot added a comment - Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/3064 Manually merging...
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user StephanEwen closed the pull request at:

          https://github.com/apache/flink/pull/3064

          Show
          githubbot ASF GitHub Bot added a comment - Github user StephanEwen closed the pull request at: https://github.com/apache/flink/pull/3064
          Hide
          StephanEwen Stephan Ewen added a comment -

          Fixed in

          • 1.2.0 via fb48c3b4cbc5a186cb7b812c8d05833c5852b385
          • 1.3.0 via 15bd1f128f3d2ae1cad68e89355f9c5045b8e830
          Show
          StephanEwen Stephan Ewen added a comment - Fixed in 1.2.0 via fb48c3b4cbc5a186cb7b812c8d05833c5852b385 1.3.0 via 15bd1f128f3d2ae1cad68e89355f9c5045b8e830

            People

            • Assignee:
              StephanEwen Stephan Ewen
              Reporter:
              Zentol Chesnay Schepler
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development