Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-1358

Utils.OutputLogFilter incorrectly filters for _logs

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      OutputLogFilter checks if the path contains _logs. This would incorrectly filter out all contents of a directory called server_logs, for example. Instead it should check for a path component exactly equal to _logs

      1. mapreduce-1358.txt
        3 kB
        Todd Lipcon
      2. mapreduce-1358.txt
        3 kB
        Todd Lipcon
      3. mapreduce-1358.txt
        3 kB
        Todd Lipcon

        Activity

        Todd Lipcon created issue -
        Hide
        Todd Lipcon added a comment -

        Simple patch plus test case

        Show
        Todd Lipcon added a comment - Simple patch plus test case
        Todd Lipcon made changes -
        Field Original Value New Value
        Attachment mapreduce-1358.txt [ 12429496 ]
        Todd Lipcon made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12429496/mapreduce-1358.txt
        against trunk revision 896265.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/248/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/248/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/248/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/248/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12429496/mapreduce-1358.txt against trunk revision 896265. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/248/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/248/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/248/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/248/console This message is automatically generated.
        Hide
        Amar Kamat added a comment -

        Wouldn't the following change be sufficient?

        return !(path.toString().contains("_logs"));
        

        be

        return !("_logs".equals(path.name()));
        
        Show
        Amar Kamat added a comment - Wouldn't the following change be sufficient? return !(path.toString().contains( "_logs" )); be return !( "_logs" .equals(path.name()));
        Hide
        Todd Lipcon added a comment -

        Amar: I don't think so - that wouldn't filter a file like /foo/_logs/bar - right? I don't think there's any contract that PathFilter is called in a pre-order traversal, so filtering /foo/_logs doesn't guarantee that /foo/_logs/bar will get automatically filtered.

        Show
        Todd Lipcon added a comment - Amar: I don't think so - that wouldn't filter a file like /foo/_logs/bar - right? I don't think there's any contract that PathFilter is called in a pre-order traversal, so filtering /foo/_logs doesn't guarantee that /foo/_logs/bar will get automatically filtered.
        Todd Lipcon made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Hide
        Todd Lipcon added a comment -

        Resubmitting same patch - I don't think the test timeout was related, but just to be safe.

        Show
        Todd Lipcon added a comment - Resubmitting same patch - I don't think the test timeout was related, but just to be safe.
        Todd Lipcon made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hide
        Amar Kamat added a comment -

        Todd, I think as per the documentation, Utils.OutputLogFilter() is supposed to filter out job-log-dir under the job-output-dir as opposed to detect if the path belongs to some log folder. The example given in the documentation is

        Path[] fileList = FileUtil.stat2Paths(fs.listStatus(outDir, new OutputFilesFilter()));
        

        So, if passed a value /foo/_logs/bar then it essentially means that bar is a directory in the job output folder /foo/_logs/.

        Show
        Amar Kamat added a comment - Todd, I think as per the documentation, Utils.OutputLogFilter() is supposed to filter out job-log-dir under the job-output-dir as opposed to detect if the path belongs to some log folder. The example given in the documentation is Path[] fileList = FileUtil.stat2Paths(fs.listStatus(outDir, new OutputFilesFilter())); So, if passed a value /foo/_logs/bar then it essentially means that bar is a directory in the job output folder /foo/_logs/ .
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12429496/mapreduce-1358.txt
        against trunk revision 899501.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/390/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/390/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/390/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/390/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12429496/mapreduce-1358.txt against trunk revision 899501. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/390/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/390/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/390/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/390/console This message is automatically generated.
        Hide
        Todd Lipcon added a comment -

        OK, you convinced me. New patch uses your suggested implementation.

        Show
        Todd Lipcon added a comment - OK, you convinced me. New patch uses your suggested implementation.
        Todd Lipcon made changes -
        Attachment mapreduce-1358.txt [ 12434621 ]
        Todd Lipcon made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Todd Lipcon made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hide
        Amar Kamat added a comment -

        The patch looks good. Should we add a unit test for Utils.OutputLogFilter too?

        Show
        Amar Kamat added a comment - The patch looks good. Should we add a unit test for Utils.OutputLogFilter too?
        Hide
        Todd Lipcon added a comment -

        The existing test covers both classes since one delegates to the other, no? I figured it was simple enough to cover them with a single test (previously they were covered by no tests!)

        Show
        Todd Lipcon added a comment - The existing test covers both classes since one delegates to the other, no? I figured it was simple enough to cover them with a single test (previously they were covered by no tests!)
        Hide
        Amar Kamat added a comment -

        The reason why I asked for separate testcases for Utils.OutputFilesFilter and Utils.OutputLogFilter is that there is an implicit assumption that OutputFilesFilter extends OutputLogFilter. Both the classes are free to change their implementation of accept(Path) hence we need testcases for both. I agree with Todd that it looks redundant as of now but will safeguard us from future mistakes. I had an offline discussion with Todd and he agreed with me on adding separate testcases for both the utilities.

        Show
        Amar Kamat added a comment - The reason why I asked for separate testcases for Utils.OutputFilesFilter and Utils.OutputLogFilter is that there is an implicit assumption that OutputFilesFilter extends OutputLogFilter. Both the classes are free to change their implementation of accept(Path) hence we need testcases for both. I agree with Todd that it looks redundant as of now but will safeguard us from future mistakes. I had an offline discussion with Todd and he agreed with me on adding separate testcases for both the utilities.
        Todd Lipcon made changes -
        Attachment mapreduce-1358.txt [ 12434708 ]
        Todd Lipcon made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Todd Lipcon made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12434708/mapreduce-1358.txt
        against trunk revision 906228.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/432/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/432/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/432/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/432/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12434708/mapreduce-1358.txt against trunk revision 906228. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/432/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/432/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/432/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/432/console This message is automatically generated.
        Hide
        Amar Kamat added a comment -

        +1. Looks good to me.

        Show
        Amar Kamat added a comment - +1. Looks good to me.
        Hide
        Chris Douglas added a comment -

        +1

        I committed this. Thanks, Todd!

        Thanks Amar for iterating on this, as well.

        Show
        Chris Douglas added a comment - +1 I committed this. Thanks, Todd! Thanks Amar for iterating on this, as well.
        Chris Douglas made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags [Reviewed]
        Fix Version/s 0.22.0 [ 12314184 ]
        Resolution Fixed [ 1 ]
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #239 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/239/)
        . Avoid false positives in OutputLogFilter. Contributed by Todd Lipcon

        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #239 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/239/ ) . Avoid false positives in OutputLogFilter. Contributed by Todd Lipcon
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #234 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/234/)
        . Avoid false positives in OutputLogFilter. Contributed by Todd Lipcon

        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #234 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/234/ ) . Avoid false positives in OutputLogFilter. Contributed by Todd Lipcon
        Tom White made changes -
        Fix Version/s 0.21.0 [ 12314045 ]
        Fix Version/s 0.22.0 [ 12314184 ]
        Tom White made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Transition Time In Source Status Execution Times Last Executer Last Execution Date
        Patch Available Patch Available Open Open
        28d 18h 24m 3 Todd Lipcon 03/Feb/10 20:49
        Open Open Patch Available Patch Available
        36m 15s 4 Todd Lipcon 03/Feb/10 20:49
        Patch Available Patch Available Resolved Resolved
        9d 15h 7m 1 Chris Douglas 13/Feb/10 11:57
        Resolved Resolved Closed Closed
        192d 9h 22m 1 Tom White 24/Aug/10 22:19

          People

          • Assignee:
            Todd Lipcon
            Reporter:
            Todd Lipcon
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development