Hadoop Common
  HADOOP-7947

Validate XMLs if a relevant tool is available, when using scripts

    Details

    • Type: Wish
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.7.0
    • Fix Version/s: None
    • Component/s: scripts
    • Labels:

      Description

      Given that we are locked down to using only XML for configuration and most of the administrators need to manage it by themselves (unless a tool that manages for you is used), it would be good to also validate the provided config XML (*-site.xml) files with a tool like xmllint or maybe Xerces somehow, when running a command or (at least) when starting up daemons.

      We should use this only if a relevant tool is available, and optionally be silent if the env. requests.

      1. HADOOP-7947.001.patch
        5 kB
        Kengo Seki
      2. HADOOP-7947.002.patch
        20 kB
        Kengo Seki
      3. HADOOP-7947.003.patch
        21 kB
        Kengo Seki
      4. HADOOP-7947.004.patch
        20 kB
        Kengo Seki

        Activity

        Allen Wittenauer added a comment -

        It'd be trivial to add a sub-command that does the analysis rather than doing it on every run.

        hadoop conftest (optional list of files, otherwise default to HADOOP_CONF_DIR/*xml)

        Kengo Seki added a comment -

        I tried to implement Allen Wittenauer’s suggestion.
        Test cases will be attached later.

        If no argument is given, all XML files in $HADOOP_CONF_DIR are verified.

        [sekikn@mobile hadoop-3.0.0-SNAPSHOT]$ export HADOOP_PREFIX=/Users/sekikn/hadoop-3.0.0-SNAPSHOT
        [sekikn@mobile hadoop-3.0.0-SNAPSHOT]$ bin/hadoop conftest
        /Users/sekikn/hadoop-3.0.0-SNAPSHOT/etc/hadoop/capacity-scheduler.xml: valid
        /Users/sekikn/hadoop-3.0.0-SNAPSHOT/etc/hadoop/core-site.xml: valid
        /Users/sekikn/hadoop-3.0.0-SNAPSHOT/etc/hadoop/hadoop-policy.xml: valid
        /Users/sekikn/hadoop-3.0.0-SNAPSHOT/etc/hadoop/hdfs-site.xml: valid
        /Users/sekikn/hadoop-3.0.0-SNAPSHOT/etc/hadoop/httpfs-site.xml: valid
        /Users/sekikn/hadoop-3.0.0-SNAPSHOT/etc/hadoop/kms-acls.xml: valid
        /Users/sekikn/hadoop-3.0.0-SNAPSHOT/etc/hadoop/kms-site.xml: valid
        /Users/sekikn/hadoop-3.0.0-SNAPSHOT/etc/hadoop/yarn-site.xml: valid
        OK
        [sekikn@mobile hadoop-3.0.0-SNAPSHOT]$ echo $?
        0
        

        If some arguments are given, those files are verified.

        [sekikn@mobile hadoop-3.0.0-SNAPSHOT]$ bin/hadoop conftest etc/hadoop/log4j.properties 
        [Fatal Error] log4j.properties:1:1: Content is not allowed in prolog.
        /Users/sekikn/hadoop-3.0.0-SNAPSHOT/etc/hadoop/log4j.properties: INVALID
        Invalid file exists
        [sekikn@mobile hadoop-3.0.0-SNAPSHOT]$ echo $?
        1
        

        I’d be glad if someone could review this patch.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12691536/HADOOP-7947.001.patch
        against trunk revision ef3c3a8.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common:

        org.apache.hadoop.ha.TestZKFailoverControllerStress

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5389//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5389//console

        This message is automatically generated.

        Allen Wittenauer added a comment -

        Cool! But a few things:

        • It feels 'off' to me to actually read ${HADOOP_CONF_DIR} from the shell environment. But given that Hadoop reads the xml files from the classpath, I suppose this is really the only alternative. We should probably look at the other parts of Hadoop that do this (read HADOOP_CONF_DIR... I know YARN does) and make sure we're doing something similar. (e.g., is there a global constant of HADOOP_CONF_DIR somewhere? Should there be?)

        • We should probably support -conf here from the generic options parser bits. This would make it trivial to support a "rolling" configuration dir-type process, without having to muck with the shell environment while also being consistent with the rest of Hadoop.
        • If it's actually invalid XML (rather than not XML at all), does it show you which property was broken?
        Kengo Seki added a comment -

        Allen Wittenauer, thank you for your comment! And sorry for the late reply.

        • In addition to YARN, httpfs also gets HADOOP_CONF_DIR from the shell environment (via system properties).
          So I think this is not a special case. Other environment variables, such as HADOOP_HOME, HADOOP_JAAS_DEBUG, and HADOOP_PREFIX, are also referenced from multiple classes.
          I couldn’t find a global constant for HADOOP_CONF_DIR (one shared by multiple projects, not only YARN).
        • Thank you for your advice. I’ll refer to GenericOptionsParser.
        • Currently, it reports an error only when the XML is not well-formed. I’ll add a feature to show where it is broken.
        Allen Wittenauer added a comment -

        Cancelling patch due to comments.

        Kengo Seki added a comment -

        Attaching revised patch.

        In the first patch, I reused o.a.h.conf.Configuration’s logic for validation, but DOM doesn’t keep elements’ positions.
        So I tried SAX with an XML schema for validation, but it was too strict for this purpose (e.g. the sub-elements of <configuration> can be anything, but we’d like to detect when one is not a <property>).
        Finally, I used StAX and wrote the validation logic from scratch.

        This function detects the following cases:

        • Not well-formed XML
        • XML is well-formed, but
          • top-level element is not <configuration>
          • sub-element of <configuration> is not <property>
          • the same attributes and/or elements are defined more than once within a <property>
          • <name> or <value> does not exist in <property>
          • <name> is empty (empty <value> can be valid in some properties, so it isn’t detected)
          • duplicated <property>s with the same <name> value exist
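Under the assumption that the structural rules are exactly as listed, the checks can be approximated as follows. This is a hypothetical Python sketch using ElementTree rather than the patch's Java StAX code; unlike the patch, it cannot report line numbers (DOM-style parsing loses positions), and the attribute/element duplication check is simplified to duplicated <name> elements.

```python
import xml.etree.ElementTree as ET


def check_conf(xml_text):
    """Return a list of problem descriptions for a *-site.xml-style document."""
    problems = []
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return ["not well-formed XML: %s" % err]

    if root.tag != "configuration":
        problems.append("top-level element is not <configuration>")

    seen_names = set()
    for child in root:
        if child.tag != "property":
            problems.append("element not <property>: <%s>" % child.tag)
            continue
        names = child.findall("name")
        values = child.findall("value")
        if not names:
            problems.append("<property> has no <name>")
        elif len(names) > 1:
            problems.append("<property> has duplicated <name>s")
        elif not (names[0].text or "").strip():
            # empty <value> can be legitimate, so only <name> is checked
            problems.append("<property> has an empty <name>")
        else:
            name = names[0].text.strip()
            if name in seen_names:
                problems.append("duplicated <property>s for %s" % name)
            seen_names.add(name)
        if not values:
            problems.append("<property> has no <value>")
    return problems
```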

        Execution examples:

        [sekikn@mobile hadoop-3.0.0-SNAPSHOT]$ env HADOOP_PREFIX=. bin/hadoop conftest
        /Users/sekikn/hadoop/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/./etc/hadoop/capacity-scheduler.xml: valid
        /Users/sekikn/hadoop/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/./etc/hadoop/core-site.xml: valid
        /Users/sekikn/hadoop/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/./etc/hadoop/hadoop-policy.xml: valid
        /Users/sekikn/hadoop/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/./etc/hadoop/hdfs-site.xml: valid
        /Users/sekikn/hadoop/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/./etc/hadoop/httpfs-site.xml: valid
        /Users/sekikn/hadoop/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/./etc/hadoop/kms-acls.xml: valid
        /Users/sekikn/hadoop/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/./etc/hadoop/kms-site.xml: valid
        /Users/sekikn/hadoop/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/./etc/hadoop/yarn-site.xml: valid
        OK
        [sekikn@mobile hadoop-3.0.0-SNAPSHOT]$ echo $?
        0
        [sekikn@mobile hadoop-3.0.0-SNAPSHOT]$ cat ~/bad-manners.xml 
        <?xml version="1.0" encoding="UTF-8"?>
        <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
        <configuration>
        
          <non-property/>
        
          <property/>
        
          <property>
            <name/>
            <value/>
          </property>
        
          <property>
            <name>a</name>
            <value>b</value>
          </property>
        
          <property name="c">
            <name>c</name>
            <value>d</value>
          </property>
        
          <property>
            <name>e</name>
            <value>f</value>
          </property>
        
          <property>
            <name>e</name>
            <value>g</value>
          </property>
        
        </configuration>
        [sekikn@mobile hadoop-3.0.0-SNAPSHOT]$ env HADOOP_PREFIX=. bin/hadoop conftest -conf etc/hadoop/core-site.xml -conf ~/bad-manners.xml 
        /Users/sekikn/hadoop/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/etc/hadoop/core-site.xml: valid
        /Users/sekikn/bad-manners.xml:
        	Line 5: element not <property>
        	Line 7: <property> has no <name>
        	Line 7: <property> has no <value>
        	Line 9: <property> has an empty <name>
        	Line 19: <property> has duplicated <name>s
        	Line 24, 29: duplicated <property>s for e
        Invalid file exists
        [sekikn@mobile hadoop-3.0.0-SNAPSHOT]$ echo $?
        1
        
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12702004/HADOOP-7947.002.patch
        against trunk revision 5d0bae5.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings.

        -1 release audit. The applied patch generated 1 release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5819//testReport/
        Release audit warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/5819//artifact/patchprocess/patchReleaseAuditProblems.txt
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/5819//artifact/patchprocess/newPatchFindbugsWarningshadoop-common.html
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5819//console

        This message is automatically generated.

        Kengo Seki added a comment -

        Fixed the findbugs and release audit warnings.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12702268/HADOOP-7947.003.patch
        against trunk revision e17e5ba.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5836//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5836//console

        This message is automatically generated.

        Allen Wittenauer added a comment -

        Playing around with this, it mostly worked as expected. It's very cool to have the feature!

        I do have a few nits though:

        $ bin/hadoop conftest --help
        OK
        $ bin/hadoop conftest -h
        OK
        $ bin/hadoop conftest -help
        OK
        

        Help, how do I get help?

        $ bin/hadoop conftest -conf ~/HADOOP/conf2/
        /Users/aw/HADOOP/conf2 is not a file
        

        Oh, so -conf doesn't work like the rest of Hadoop. I really want to pass it a complete directory so that I can do things like:

        hadoop conftest --conf newconf
        if [[ $? -eq 0 ]]; then
          sbin/stop-dfs.sh
          mv conf oldconf.$$
          mv newconf conf
          sbin/start-dfs.sh
        else
          echo "Yo, your config don't work!"
          exit 1
        fi
        

        I really feel as though it should probably just call GenericOptionsParser straight up rather than emulate it. This way it will also pick up any other features that it has or gets added in the future.

        Awesome work so far though! Really looking forward to this one!

        Kengo Seki added a comment -

        Thank you for your trial and comment, Allen Wittenauer!
        Help messages are always needed, of course.

        Let me discuss whether we should use GenericOptionsParser directly.
        If we use GenericOptionsParser directly and specify a directory as the argument of the -conf option, instantiation will fail, because GenericOptionsParser expects the argument to be a path to a file and tries to load it.

        I’d like to support a directory argument for the -conf option, since your use case is very reasonable, so I think it is better not to use GenericOptionsParser directly.
        What do you think?

        Kengo Seki added a comment -

        I have another idea. We could add a new constructor like 'public GenericOptionsParser(String[] args, boolean flag)' so as to still utilize GenericOptionsParser, and make it avoid calling processGeneralOptions() in parseGeneralOptions() when the second argument is false. But I'm not sure whether this idea is good or bad...

        Allen Wittenauer added a comment -

        I'd definitely want a second, third, ... opinion on that as well. Hopefully the other folks watching will chime in...

        I'm inclined to say yes, that's probably the better direction but that's clearly a different JIRA.

        Kengo Seki added a comment -

        I agree with Allen Wittenauer and would like to:

        • create a new JIRA to add a new constructor to GenericOptionsParser for the purpose of only parsing the command line (not modifying the configuration object)
        • resume this JIRA after the above is finished

        Any advice?

        Tsuyoshi Ozawa added a comment -

        Kengo Seki Allen Wittenauer thank you for taking on this interesting work! IMHO, we shouldn't do all the work here. That is, we should add the "conftest" command here only partially.

        1. For now, I don't think we must support conftest --conf at first. It would be useful to add only "bin/hadoop conftest" here.
        2. Let's create a new JIRA and discuss and brush up the conftest command there, as Allen mentioned - supporting the --conf option there.

        I think this is the straightforward way. What do you think?

        Allen Wittenauer added a comment -

        The more I think about this, the more I think we may be overthinking it. I need to verify, but I suspect we already have entire-directory checking:

        $ hadoop --conf dir conftest
        

        should check the whole directory, right? Perhaps what really needs to happen is:

        $ hadoop conftest -conffile file
        

        ... in other words, a rename of the option. This means we could commit this as-is and fix -conf to accept a directory at a later date.

        Tsuyoshi Ozawa added a comment -

        It sounds good to me.

        Kengo Seki added a comment -

        Allen Wittenauer Tsuyoshi Ozawa, thank you for the comments!

        ... in other words, a rename of the option.

        I overlooked that. A simple and nice solution!
        I’m attaching a new patch that renames the option, enables specifying directories, and adds help messages.
        For now, it parses the options by itself, but once -conf is fixed in the future, this can be replaced with GenericOptionsParser entirely.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12704934/HADOOP-7947.004.patch
        against trunk revision 5608520.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5955//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5955//console

        This message is automatically generated.


          People

          • Assignee:
            Kengo Seki
            Reporter:
            Harsh J
          • Votes:
            0
            Watchers:
            8

            Dates

            • Created:
              Updated:
