Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.0
    • Component/s: conf
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      It would be easier to mange the configuration of hadoop by allowing include files in configuration file (file: hadoop-site.xml)

      1. configInclude2.patch
        3 kB
        dhruba borthakur
      2. includes_in_conf.diff.txt
        0.7 kB
        Rama, Ramasamy

        Issue Links

          Activity

          Hide
          Rama, Ramasamy added a comment -

          Allow include files in conf. For example in hadoop-site.xml


          <configuration xmlns:xi="http://www.w3.org/2001/XInclude">
          ...

          <xi:include href="test.xml"/>

          ...
          </configuration>

          Show
          Rama, Ramasamy added a comment - Allow include files in conf. For example in hadoop-site.xml — <configuration xmlns:xi="http://www.w3.org/2001/XInclude"> ... <xi:include href="test.xml"/> ... </configuration>
          Hide
          Rama, Ramasamy added a comment -

          By setting XcludeAware to true, we can have xi:include tags in xml file.
          XincludeAware requires setNamespaceAware to be true.

          Show
          Rama, Ramasamy added a comment - By setting XcludeAware to true, we can have xi:include tags in xml file. XincludeAware requires setNamespaceAware to be true.
          Hide
          dhruba borthakur added a comment -

          +1. I think this is a compatible change because it should not affect parsing of existing configuration files.

          Show
          dhruba borthakur added a comment - +1. I think this is a compatible change because it should not affect parsing of existing configuration files.
          Hide
          Doug Cutting added a comment -

          +1 This looks fine to me.

          Show
          Doug Cutting added a comment - +1 This looks fine to me.
          Hide
          steve_l added a comment -

          +1, though there are docs and tests to consider. The other issue is security; it now becomes possible to xinclude from somewhere else, if you aren't careful.

          Show
          steve_l added a comment - +1, though there are docs and tests to consider. The other issue is security; it now becomes possible to xinclude from somewhere else, if you aren't careful.
          Hide
          dhruba borthakur added a comment -

          Rama: if you have a unit test for this one, that will be great.

          However, i am willing to commit this without a unit test because the code-change is very very minor. Also, in some sense it is an "Improvement" only. so the risk of impacting existing installations is minimal.

          Show
          dhruba borthakur added a comment - Rama: if you have a unit test for this one, that will be great. However, i am willing to commit this without a unit test because the code-change is very very minor. Also, in some sense it is an "Improvement" only. so the risk of impacting existing installations is minimal.
          Hide
          dhruba borthakur added a comment -

          Rama: let's discuss how we can come up with a simple unit test.

          Show
          dhruba borthakur added a comment - Rama: let's discuss how we can come up with a simple unit test.
          Hide
          steve_l added a comment -

          Once the patch becomes a feature, you need the test to make sure it doesn't go away later; for regression testing.

          Show
          steve_l added a comment - Once the patch becomes a feature, you need the test to make sure it doesn't go away later; for regression testing.
          Hide
          dhruba borthakur added a comment -

          This patch allows including a xml file via the <xi:include> xml tag. I updated Rama's version of the patch to include a unit test as well.

          Show
          dhruba borthakur added a comment - This patch allows including a xml file via the <xi:include> xml tag. I updated Rama's version of the patch to include a unit test as well.
          Hide
          dhruba borthakur added a comment -

          Can some kind soul please review this patch? Thanks.

          Show
          dhruba borthakur added a comment - Can some kind soul please review this patch? Thanks.
          Hide
          Doug Cutting added a comment -

          The current version of this patch changes the syntax of configurations so that they may be nested. Couldn't this extension of the syntax be avoided by using "range-inside", e.g., something like:

          <include xmlns="http://www.w3.org/2001/XInclude"
           href="foo.xml"
           xpointer="xpointer(range-inside(//configuration))"/>
          
          Show
          Doug Cutting added a comment - The current version of this patch changes the syntax of configurations so that they may be nested. Couldn't this extension of the syntax be avoided by using "range-inside", e.g., something like: <include xmlns= "http: //www.w3.org/2001/XInclude" href= "foo.xml" xpointer= "xpointer(range-inside( //configuration))" />
          Hide
          Rama, Ramasamy added a comment -

          That was the original idea, but xerces currently doesn’t support xpointer() scheme (http://xerces.apache.org/xerces2-j/faq-xinclude.html#faq-8) and got the following error:

          hadoop5224.snc1.facebook.com: [Warning] hadoop-site.xml:9:147: SchemeUnsupported: The XPointer scheme 'xpointer' is not supported.
          ..
          2009-01-23 12:28:49,678 FATAL org.apache.hadoop.conf.Configuration: error parsing conf file: org.xml.sax.SAXParseException: An 'include' failed, and no 'fallback' element was found.
          2009-01-23 12:28:49,679 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.lang.RuntimeException: org.xml.sax.SAXParseException: An 'include' failed, and no 'fallback' element was found.
          at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:912)

          Show
          Rama, Ramasamy added a comment - That was the original idea, but xerces currently doesn’t support xpointer() scheme ( http://xerces.apache.org/xerces2-j/faq-xinclude.html#faq-8 ) and got the following error: hadoop5224.snc1.facebook.com: [Warning] hadoop-site.xml:9:147: SchemeUnsupported: The XPointer scheme 'xpointer' is not supported. .. 2009-01-23 12:28:49,678 FATAL org.apache.hadoop.conf.Configuration: error parsing conf file: org.xml.sax.SAXParseException: An 'include' failed, and no 'fallback' element was found. 2009-01-23 12:28:49,679 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.lang.RuntimeException: org.xml.sax.SAXParseException: An 'include' failed, and no 'fallback' element was found. at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:912)
          Hide
          Doug Cutting added a comment -

          > xerces currently doesn't support xpointer() scheme

          Sigh. So I wonder if, instead of supporting nested "configuration" tags we might instead support nested "properties" tags, that each contain a set of properties. Or do you think it would be useful to have included files also capable of acting as top-level configuration files?

          Show
          Doug Cutting added a comment - > xerces currently doesn't support xpointer() scheme Sigh. So I wonder if, instead of supporting nested "configuration" tags we might instead support nested "properties" tags, that each contain a set of properties. Or do you think it would be useful to have included files also capable of acting as top-level configuration files?
          Hide
          dhruba borthakur added a comment -

          I think it is more powerful for an include file A to be able to include another include file B for a cluster, while at the same time B to be the top level config file for another cluster. I would like to keep it this way, unless you feel strongly otherwise.

          Show
          dhruba borthakur added a comment - I think it is more powerful for an include file A to be able to include another include file B for a cluster, while at the same time B to be the top level config file for another cluster. I would like to keep it this way, unless you feel strongly otherwise.
          Hide
          steve_l added a comment -

          As someone who works in CM, I clearly think that being able to include stuff is a handy way to deal with scale. But if you treat every .xml file as toplevel, dont forget to include test cases that

          • have an import of yourself
          • import file A.xml that imports B.xml that imports A.xml
          • include inconsistent (name,value) declarations
          • import file A.xml that imports B.xml that then imports a nonexistent file (try and have the error message helpfile here)
          • tries to do property expansion to set the path of the include

          You dont have to support these, property expansion in particular. But get those tests together to make sure that whatever behaviour you have matches expectations.

          Show
          steve_l added a comment - As someone who works in CM, I clearly think that being able to include stuff is a handy way to deal with scale. But if you treat every .xml file as toplevel, dont forget to include test cases that have an import of yourself import file A.xml that imports B.xml that imports A.xml include inconsistent (name,value) declarations import file A.xml that imports B.xml that then imports a nonexistent file (try and have the error message helpfile here) tries to do property expansion to set the path of the include You dont have to support these, property expansion in particular. But get those tests together to make sure that whatever behaviour you have matches expectations.
          Hide
          dhruba borthakur added a comment -

          @Steve: Good points. Thaks. I verified that all of the issues you mentioned is actually handled by the Xerces library. It can detect recursive includes. The hadoop Configuration.get() method generates an RuntimeException when this occurs.

          I plan to commit this patch to 0.20 and trunk.

          Show
          dhruba borthakur added a comment - @Steve: Good points. Thaks. I verified that all of the issues you mentioned is actually handled by the Xerces library. It can detect recursive includes. The hadoop Configuration.get() method generates an RuntimeException when this occurs. I plan to commit this patch to 0.20 and trunk.
          Hide
          dhruba borthakur added a comment -

          [exec] +1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 3 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Show
          dhruba borthakur added a comment - [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
          Hide
          dhruba borthakur added a comment -

          I just committed this. Thanks Rama!

          Show
          dhruba borthakur added a comment - I just committed this. Thanks Rama!
          Hide
          steve_l added a comment -

          re-opening as it appears to raise problems with different XML parsers/JVMs. I'm not sure that 0.20 will want this.

          Show
          steve_l added a comment - re-opening as it appears to raise problems with different XML parsers/JVMs. I'm not sure that 0.20 will want this.
          Hide
          dhruba borthakur added a comment -

          Hi Steve, Hadoop uses the xerces library to parse the XML file. Xerces handles include files well. Can you pl explain what problem you are seeing.

          Show
          dhruba borthakur added a comment - Hi Steve, Hadoop uses the xerces library to parse the XML file. Xerces handles include files well. Can you pl explain what problem you are seeing.
          Hide
          steve_l added a comment -

          I think myself and someone on the user list has encountered problems which may be caused by the built in JVM's version of Xerces; see HADOOP-5254 for details and stack traces

          Show
          steve_l added a comment - I think myself and someone on the user list has encountered problems which may be caused by the built in JVM's version of Xerces; see HADOOP-5254 for details and stack traces
          Hide
          Hudson added a comment -
          Show
          Hudson added a comment - Integrated in Hadoop-trunk #756 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/756/ )
          Hide
          dhruba borthakur added a comment -

          The problems that you saw were fixed in HADOOP-5254. I am closing this JIRA. Please re-open (or create a new one) if you see any remaining issues with this one.

          Show
          dhruba borthakur added a comment - The problems that you saw were fixed in HADOOP-5254 . I am closing this JIRA. Please re-open (or create a new one) if you see any remaining issues with this one.

            People

            • Assignee:
              Rama, Ramasamy
              Reporter:
              Rama, Ramasamy
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development