Hadoop Common
  1. Hadoop Common
  2. HADOOP-5087

Regex for Cmd parsing contains an error

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Environment:
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Hide
      What is new in HADOOP-5087:

      - Fixed the regex to correctly parse the chukwa agent ADD command.
      - Added a test case to validate the chukwa agent ADD command.

      Show
      What is new in HADOOP-5087 : - Fixed the regex to correctly parse the chukwa agent ADD command. - Added a test case to validate the chukwa agent ADD command.
    1. fixedregex.patch
      9 kB
      Ari Rabkin
    2. fixftaregex.patch
      1 kB
      Ari Rabkin
    3. HADOOP-5087.patch
      6 kB
      Jerome Boulon
    4. HADOOP-5087-2.patch
      9 kB
      Jerome Boulon
    5. reluctantregex.patch
      9 kB
      Ari Rabkin

      Issue Links

        Activity

        Hide
        Jerome Boulon added a comment -
        • Fix the regex to correctly parse the ADD command
        • Add a test case to validate the ADD command
        Show
        Jerome Boulon added a comment - Fix the regex to correctly parse the ADD command Add a test case to validate the ADD command
        Hide
        Jerome Boulon added a comment -

        Ps: The ChukwaTestAdaptor could be useful for testing the AdaptorManager implementation
        https://issues.apache.org/jira/browse/HADOOP-4893?focusedCommentId=12663916#action_12663916

        Show
        Jerome Boulon added a comment - Ps: The ChukwaTestAdaptor could be useful for testing the AdaptorManager implementation https://issues.apache.org/jira/browse/HADOOP-4893?focusedCommentId=12663916#action_12663916
        Hide
        Ari Rabkin added a comment -

        Does this regex work if an adaptor has no parameters? Do we have a test case to cover this?

        Show
        Ari Rabkin added a comment - Does this regex work if an adaptor has no parameters? Do we have a test case to cover this?
        Hide
        Jerome Boulon added a comment -

        Yes, HADOOP-5087-2.patch now contains 2 additional test case:

        • "ADD org.apache.hadoop.chukwa.datacollection.adaptor.ChukwaTestAdaptor chukwaTestAdaptorType 0 114027"
        • "ADD org.apache.hadoop.chukwa.datacollection.adaptor.ChukwaTestAdaptor chukwaTestAdaptorType 114027"
        Show
        Jerome Boulon added a comment - Yes, HADOOP-5087 -2.patch now contains 2 additional test case: "ADD org.apache.hadoop.chukwa.datacollection.adaptor.ChukwaTestAdaptor chukwaTestAdaptorType 0 114027" "ADD org.apache.hadoop.chukwa.datacollection.adaptor.ChukwaTestAdaptor chukwaTestAdaptorType 114027"
        Hide
        Ari Rabkin added a comment -

        Awesome. +1

        Show
        Ari Rabkin added a comment - Awesome. +1
        Hide
        Jerome Boulon added a comment -

        Junit failed because of HADOOP-5138

        Show
        Jerome Boulon added a comment - Junit failed because of HADOOP-5138
        Hide
        Ari Rabkin added a comment -

        I think this is the right one – I fiddled for a while and checked a bunch of the test cases.

        Show
        Ari Rabkin added a comment - I think this is the right one – I fiddled for a while and checked a bunch of the test cases.
        Hide
        Ari Rabkin added a comment -

        This patch removes trailing spaces from adaptor parameters; I'm pretty sure this is the Right Thing.

        Show
        Ari Rabkin added a comment - This patch removes trailing spaces from adaptor parameters; I'm pretty sure this is the Right Thing.
        Hide
        Jerome Boulon added a comment -

        The definition for the ADD command is:

        // words should contain (space delimited):
        // 0) command ("add")
        // 1) AdaptorClassname
        // 2) dataType (e.g. "hadoop_log")
        // 3) params <optional>
        // (e.g. for files, this is filename,
        // but can be arbitrarily many space
        // delimited agent specific params )
        // 4) offset

        How can you remove trailing spaces from adaptor parameters, this is adator specific and the adaptor should take care of that and this should not be automatically by the processCommand: HADOOP-5087-2.patch is doing that

        Current tests cases are failing for 2 reasons:
        -> space on the filename and the adaptor should be fixed
        -> A test case send some chunks to the queue but do not clean up after itself and the shutdown method on the agent is not doing any sort of cleanup since in the real world the agent is calling System.exit(0). The solution is to move that test in a separate test case. Since we are forking, it will be fine.

        Show
        Jerome Boulon added a comment - The definition for the ADD command is: // words should contain (space delimited): // 0) command ("add") // 1) AdaptorClassname // 2) dataType (e.g. "hadoop_log") // 3) params <optional> // (e.g. for files, this is filename, // but can be arbitrarily many space // delimited agent specific params ) // 4) offset How can you remove trailing spaces from adaptor parameters, this is adator specific and the adaptor should take care of that and this should not be automatically by the processCommand: HADOOP-5087 -2.patch is doing that Current tests cases are failing for 2 reasons: -> space on the filename and the adaptor should be fixed -> A test case send some chunks to the queue but do not clean up after itself and the shutdown method on the agent is not doing any sort of cleanup since in the real world the agent is calling System.exit(0). The solution is to move that test in a separate test case. Since we are forking, it will be fine.
        Hide
        Ari Rabkin added a comment -

        Currently, no adaptors assume that their parameters can end with spaces. So we can change that part of the spec without breaking things. And I think it's generally more confusing than useful; if an adaptor needs parameters to end with spaces, they can quote their parameters.

        Show
        Ari Rabkin added a comment - Currently, no adaptors assume that their parameters can end with spaces. So we can change that part of the spec without breaking things. And I think it's generally more confusing than useful; if an adaptor needs parameters to end with spaces, they can quote their parameters.
        Hide
        Jerome Boulon added a comment -

        The idea of HADOOP-4947 was to have a more flexible parsing for chukwa commands.
        Moving to regex was a good idea but the current regex to match the previous parsing (6-7 simple statements) seems to be very complicated and will be difficult to extend in the future.

        So, I'm asking if in order to keep it simple, shouldn't we revert back to something similar to the initial parsing?

        Show
        Jerome Boulon added a comment - The idea of HADOOP-4947 was to have a more flexible parsing for chukwa commands. Moving to regex was a good idea but the current regex to match the previous parsing (6-7 simple statements) seems to be very complicated and will be difficult to extend in the future. So, I'm asking if in order to keep it simple, shouldn't we revert back to something similar to the initial parsing?
        Hide
        Ari Rabkin added a comment -

        I wouldn't revert. The previous code was very complex and difficult to extend, too. And had a number of quirks, or bugs, depending on what you think the proper behavior was. I think this is actually simpler. I vote keep.

        Show
        Ari Rabkin added a comment - I wouldn't revert. The previous code was very complex and difficult to extend, too. And had a number of quirks, or bugs, depending on what you think the proper behavior was. I think this is actually simpler. I vote keep.
        Hide
        Mac Yang added a comment -

        Regex is very powerful and could provide an elegant solution to the right problem. However, it's not the easiest thing to read and maintain.

        A typical answer to regex maintainability issue is to have detailed comment on the regex. O'Reilly has an article on how to maintain regex which I thought was quite useful (http://www.perl.com/pub/a/2004/01/16/regexps.html). I think we should do something like that if we want to take the regex approach.

        Show
        Mac Yang added a comment - Regex is very powerful and could provide an elegant solution to the right problem. However, it's not the easiest thing to read and maintain. A typical answer to regex maintainability issue is to have detailed comment on the regex. O'Reilly has an article on how to maintain regex which I thought was quite useful ( http://www.perl.com/pub/a/2004/01/16/regexps.html ). I think we should do something like that if we want to take the regex approach.
        Hide
        Ari Rabkin added a comment -

        Comments are good. It should be easy to split the regex into pieces with comments and I'm happy to do it. But we should decide exactly what the behavior we want is, in the case where you have multiple spaces between an Adaptor's parameters and the starting offset. Which spaces belong to the parameter, and which are discarded?
        That is, suppose have something that looks like:
        add ...FileTailingAdaptor... foo 10
        Is the filename "foo" or "foo " or?

        This is basically a matter of taste. I vote for the former; I think Jerome preferrs the latter. Other opinions?

        Show
        Ari Rabkin added a comment - Comments are good. It should be easy to split the regex into pieces with comments and I'm happy to do it. But we should decide exactly what the behavior we want is, in the case where you have multiple spaces between an Adaptor's parameters and the starting offset. Which spaces belong to the parameter, and which are discarded? That is, suppose have something that looks like: add ...FileTailingAdaptor... foo 10 Is the filename "foo" or "foo " or? This is basically a matter of taste. I vote for the former; I think Jerome preferrs the latter. Other opinions?
        Hide
        Eric Yang added a comment -

        +1 on "foo".

        Show
        Eric Yang added a comment - +1 on "foo".
        Hide
        Mac Yang added a comment -

        another +1 for "foo"

        Show
        Mac Yang added a comment - another +1 for "foo"
        Hide
        Ari Rabkin added a comment -

        You asked for it, you got it. Now with comments, and reluctant matching.

        Show
        Ari Rabkin added a comment - You asked for it, you got it. Now with comments, and reluctant matching.
        Hide
        Ari Rabkin added a comment -

        This has been languishing for a while and I'd like to resolve it; I don't want to commit anything without a little more consensus, though.

        Thoughts on the most recent patch? [Is there a way to trigger hudson to re-check a patch?]

        Show
        Ari Rabkin added a comment - This has been languishing for a while and I'd like to resolve it; I don't want to commit anything without a little more consensus, though. Thoughts on the most recent patch? [Is there a way to trigger hudson to re-check a patch?]
        Hide
        Ari Rabkin added a comment -

        FTA doesn't parse its argument correctly.

        Show
        Ari Rabkin added a comment - FTA doesn't parse its argument correctly.
        Hide
        Ari Rabkin added a comment -

        patch should only affect file tailing adaptor

        Show
        Ari Rabkin added a comment - patch should only affect file tailing adaptor
        Hide
        Ari Rabkin added a comment -

        existing tests cover this case, none added

        Show
        Ari Rabkin added a comment - existing tests cover this case, none added
        Hide
        Eric Yang added a comment -

        +1 on fixftaregex.patch

        Show
        Eric Yang added a comment - +1 on fixftaregex.patch
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12401259/fixftaregex.patch
        against trunk revision 749318.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/39/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/39/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/39/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/39/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12401259/fixftaregex.patch against trunk revision 749318. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/39/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/39/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/39/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/39/console This message is automatically generated.
        Hide
        Hudson added a comment -
        Show
        Hudson added a comment - Integrated in Hadoop-trunk #778 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/778/ )

          People

          • Assignee:
            Ari Rabkin
            Reporter:
            Jerome Boulon
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development