Hadoop Common
  1. Hadoop Common
  2. HADOOP-6787

Factor out glob pattern code from FileContext and Filesystem

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.22.0
    • Component/s: fs
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Tags:
      glob pattern

      Description

      Refactor the glob pattern code out of FileContext and FileSystem into a package private GlobFilter and the reusable GlobPattern class (InterfaceAudience.Private)

      Also fix the handling of ^ outside character class ([...]) reported in HADOOP-6618 and make the glob pattern code less restrictive (not throwing on some valid glob patterns.) and more POSIX standard compliant (support [!...]).

      1. hadoop-6787-trunk-v1.patch
        19 kB
        Luke Lu
      2. hadoop-6787-trunk-v2.patch
        19 kB
        Luke Lu
      3. hadoop-6787-y20s-v1.patch
        14 kB
        Luke Lu

        Issue Links

          Activity

          Hide
          Luke Lu added a comment -

          Patches for y20s branch.

          Show
          Luke Lu added a comment - Patches for y20s branch.
          Hide
          Luke Lu added a comment -

          Ported to trunk

          Show
          Luke Lu added a comment - Ported to trunk
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12446049/hadoop-6787-trunk-v1.patch
          against trunk revision 949658.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/71/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/71/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/71/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/71/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12446049/hadoop-6787-trunk-v1.patch against trunk revision 949658. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/71/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/71/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/71/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/71/console This message is automatically generated.
          Hide
          Eli Collins added a comment -

          Looks good.

          Nit: not your change but public GlobFilter methods need javadocs, spaces in between the end of a method and start of the next.

          Show
          Eli Collins added a comment - Looks good. Nit: not your change but public GlobFilter methods need javadocs, spaces in between the end of a method and start of the next.
          Hide
          Luke Lu added a comment -

          Thanks for the review, Eli. Not sure I completely follow your comment. The GlobFilter class is not a public class. The visibility is package only except for the accept method from the PathFilter interface which has javadoc.

          Show
          Luke Lu added a comment - Thanks for the review, Eli. Not sure I completely follow your comment. The GlobFilter class is not a public class. The visibility is package only except for the accept method from the PathFilter interface which has javadoc.
          Hide
          Eli Collins added a comment -

          You're right, you don't need full javadocs for those methods, just suggesting a comment explaining what the public methods do would be useful, but feel free to defer.

          +1 for the patch.

          Show
          Eli Collins added a comment - You're right, you don't need full javadocs for those methods, just suggesting a comment explaining what the public methods do would be useful, but feel free to defer. +1 for the patch.
          Hide
          Luke Lu added a comment -

          Thanks for the +1!

          Most IDEs would indicate (in my case, with a green circle with I in it) an implementing method. Clicking or hovering on the indicator would show you the javadoc of the interface method. I personally prefer keeping the javadocs in interfaces/abstract classes and not duplicating them in implementing classes. I hope that it's not a bug in Hudson's javadoc tool to +1 my patch as well

          Show
          Luke Lu added a comment - Thanks for the +1! Most IDEs would indicate (in my case, with a green circle with I in it) an implementing method. Clicking or hovering on the indicator would show you the javadoc of the interface method. I personally prefer keeping the javadocs in interfaces/abstract classes and not duplicating them in implementing classes. I hope that it's not a bug in Hudson's javadoc tool to +1 my patch as well
          Hide
          Eli Collins added a comment -

          Thanks Luke. I will commit this soon.

          Show
          Eli Collins added a comment - Thanks Luke. I will commit this soon.
          Hide
          Jakob Homan added a comment -

          Currently the GlobFilter classes are private to their outer classes and this patch makes the new, remaining class public. Is this necessary, or can we ratchet down the visibility to protected? What's the intended audience and stability annotations? I'd rather not make this a supported class if we don't have to and it seems pretty internal to HDFS.

          Show
          Jakob Homan added a comment - Currently the GlobFilter classes are private to their outer classes and this patch makes the new, remaining class public. Is this necessary, or can we ratchet down the visibility to protected? What's the intended audience and stability annotations? I'd rather not make this a supported class if we don't have to and it seems pretty internal to HDFS.
          Hide
          Jakob Homan added a comment -

          And of course by protected, I meant package private... doh.

          Show
          Jakob Homan added a comment - And of course by protected, I meant package private... doh.
          Hide
          Jakob Homan added a comment -

          For the love of Pete... and I meant GlobPattern. If the metrics filter configuration you mentioned above isn't within the fs package, I imagine this is the best we'll do for limiting access to this new class.

          Show
          Jakob Homan added a comment - For the love of Pete... and I meant GlobPattern. If the metrics filter configuration you mentioned above isn't within the fs package, I imagine this is the best we'll do for limiting access to this new class.
          Hide
          Eli Collins added a comment -

          I agree Jakob, GlobPattern should be package private like GlobFilter. The annotations should be the same as other package private hadoop.fs classes. Globbing is not internal to HDFS it's purely client side, used by FsShell for any file system.

          Show
          Eli Collins added a comment - I agree Jakob, GlobPattern should be package private like GlobFilter. The annotations should be the same as other package private hadoop.fs classes. Globbing is not internal to HDFS it's purely client side, used by FsShell for any file system.
          Hide
          Luke Lu added a comment -

          The main incentive for the refactor is to make the GlobPattern reusable by other components (including but not limited to metrics configuration) in other packages (e.g., org.apache.hadoop.metrics). I proposed in private to make it reside in org.apache.hadoop.util as it's really a general utility (a simplified and less powerful (than regex to shoot people's foot easily) pattern matching mechanism, as found in many other languages.), and met with strong resistance. So it stays in the fs package, where it comes from.

          It seems to me that It should remain as a java public class, as it's needed by other packages, but be annotated as audience private.

          Show
          Luke Lu added a comment - The main incentive for the refactor is to make the GlobPattern reusable by other components (including but not limited to metrics configuration) in other packages (e.g., org.apache.hadoop.metrics). I proposed in private to make it reside in org.apache.hadoop.util as it's really a general utility (a simplified and less powerful (than regex to shoot people's foot easily) pattern matching mechanism, as found in many other languages.), and met with strong resistance. So it stays in the fs package, where it comes from. It seems to me that It should remain as a java public class, as it's needed by other packages, but be annotated as audience private.
          Hide
          Eli Collins added a comment -

          Agree that it's better left in o.a.h.fs. GlobPattern should be package protection though since it's only used by o.a.h.fs right?

          Show
          Eli Collins added a comment - Agree that it's better left in o.a.h.fs. GlobPattern should be package protection though since it's only used by o.a.h.fs right?
          Hide
          Luke Lu added a comment -

          I want to reuse GlobPattern in o.a.h.metrics as well. Annotate it with InterfaceAudience.Private and InterfaceStability.Evolving is probably most appropriate.

          Show
          Luke Lu added a comment - I want to reuse GlobPattern in o.a.h.metrics as well. Annotate it with InterfaceAudience.Private and InterfaceStability.Evolving is probably most appropriate.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          @Luke, looks like that this is not only a code refactoring but also bug fixes (or behavior changes.) Could you describe the changes in details?

          Show
          Tsz Wo Nicholas Sze added a comment - @Luke, looks like that this is not only a code refactoring but also bug fixes (or behavior changes.) Could you describe the changes in details?
          Hide
          Luke Lu added a comment -

          Yes, it does include some fixes. Summary of changes:

          • Fixed ^ usage outside [...] (HADOOP-6618)
          • Made the pattern more POSIX compliant: support [!...] as well as [^...]
          • Made it to generate less garbage while matching by using non-capturing group.
          • Removed some unnecessary checks that would throw on valid POSIX glob patterns.
          • Added more comprehensive unit tests for the glob pattern matching code.
          Show
          Luke Lu added a comment - Yes, it does include some fixes. Summary of changes: Fixed ^ usage outside [...] ( HADOOP-6618 ) Made the pattern more POSIX compliant: support [!...] as well as [^...] Made it to generate less garbage while matching by using non-capturing group. Removed some unnecessary checks that would throw on valid POSIX glob patterns. Added more comprehensive unit tests for the glob pattern matching code.
          Hide
          Luke Lu added a comment -

          In short, it's strictly less restrictive and more standard compliant than the old code. The TestGlobPaths test in HDFS passes unmodified in my environment with patched common package as well (it didn't until I wrapped the PatternSyntaxException with IOException that start with "Illegal file pattern" in the GlobFilter.)

          Show
          Luke Lu added a comment - In short, it's strictly less restrictive and more standard compliant than the old code. The TestGlobPaths test in HDFS passes unmodified in my environment with patched common package as well (it didn't until I wrapped the PatternSyntaxException with IOException that start with "Illegal file pattern" in the GlobFilter.)
          Hide
          Tsz Wo Nicholas Sze added a comment -

          The changes sound reasonable.

          Show
          Tsz Wo Nicholas Sze added a comment - The changes sound reasonable.
          Hide
          Eli Collins added a comment -

          I think the splitting up into GlobPattern and GlobFilter makes sense. A change like this probably should be split into two, one that's just code motion and one that changes the behavior. My bad, I should have pointed that out. Do we need to do this as two separate jiras?

          Show
          Eli Collins added a comment - I think the splitting up into GlobPattern and GlobFilter makes sense. A change like this probably should be split into two, one that's just code motion and one that changes the behavior. My bad, I should have pointed that out. Do we need to do this as two separate jiras?
          Hide
          Jakob Homan added a comment -

          I want to reuse GlobPattern in o.a.h.metrics as well. Annotate it with InterfaceAudience.Private and InterfaceStability.Evolving is probably most appropriate.

          Sounds good. Thanks.

          Show
          Jakob Homan added a comment - I want to reuse GlobPattern in o.a.h.metrics as well. Annotate it with InterfaceAudience.Private and InterfaceStability.Evolving is probably most appropriate. Sounds good. Thanks.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          > ... Do we need to do this as two separate jiras?

          One jira probably is okay unless there is a need to split it to two.

          We should emphasize that this is a bug fix and an incompatible change. Code refactoring is a minor improvement.

          Show
          Tsz Wo Nicholas Sze added a comment - > ... Do we need to do this as two separate jiras? One jira probably is okay unless there is a need to split it to two. We should emphasize that this is a bug fix and an incompatible change. Code refactoring is a minor improvement.
          Hide
          Luke Lu added a comment -

          v2 trunk patch added annotation InterfaceAudience.Private and InterfaceStability.Evolving to the GlobPattern class.

          Show
          Luke Lu added a comment - v2 trunk patch added annotation InterfaceAudience.Private and InterfaceStability.Evolving to the GlobPattern class.
          Hide
          Eli Collins added a comment -

          +1 lgtm. Confirmed that we only need the annotations for public classes. Mind updating the jira type and summary to reflect the bug fixes?

          Show
          Eli Collins added a comment - +1 lgtm. Confirmed that we only need the annotations for public classes. Mind updating the jira type and summary to reflect the bug fixes?
          Hide
          Luke Lu added a comment -

          Done.

          Show
          Luke Lu added a comment - Done.
          Hide
          Eli Collins added a comment -

          Thanks Luke. Kicking hudson with the new patch.

          Show
          Eli Collins added a comment - Thanks Luke. Kicking hudson with the new patch.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12446176/hadoop-6787-trunk-v2.patch
          against trunk revision 951081.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/560/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/560/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/560/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/560/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12446176/hadoop-6787-trunk-v2.patch against trunk revision 951081. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/560/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/560/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/560/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/560/console This message is automatically generated.
          Hide
          Eli Collins added a comment -

          I've commited this. Thanks Luke!

          Show
          Eli Collins added a comment - I've commited this. Thanks Luke!
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #276 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/276/)
          HADOOP-6787. Factor out glob pattern code from FileContext and FileSystem. Contributed by Luke Lu.

          Show
          Hudson added a comment - Integrated in Hadoop-Common-trunk-Commit #276 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/276/ ) HADOOP-6787 . Factor out glob pattern code from FileContext and FileSystem. Contributed by Luke Lu.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          @Eli, do you think that this is an incompatible change?

          Show
          Tsz Wo Nicholas Sze added a comment - @Eli, do you think that this is an incompatible change?
          Hide
          Eli Collins added a comment -

          Refactoring out the internal classes doesn't seem incompatible. I suppose the bug fixes could affect a program eg that expected a bogus exception on a valid POSIX glob pattern, so I'll put the flag back.

          Show
          Eli Collins added a comment - Refactoring out the internal classes doesn't seem incompatible. I suppose the bug fixes could affect a program eg that expected a bogus exception on a valid POSIX glob pattern, so I'll put the flag back.
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk #356 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk/356/)
          HADOOP-6787. Factor out glob pattern code from FileContext and FileSystem. Contributed by Luke Lu.

          Show
          Hudson added a comment - Integrated in Hadoop-Common-trunk #356 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk/356/ ) HADOOP-6787 . Factor out glob pattern code from FileContext and FileSystem. Contributed by Luke Lu.

            People

            • Assignee:
              Luke Lu
              Reporter:
              Luke Lu
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development