Issue Details (XML | Word | Printable)

Key: HADOOP-4944
Type: Improvement Improvement
Status: Closed Closed
Resolution: Fixed
Priority: Minor Minor
Assignee: Rama, Ramasamy
Reporter: Rama, Ramasamy
Votes: 0
Watchers: 6
Operations

If you were logged in you would be able to see more operations.
Hadoop Common

Allow Xinclude in hadoop config file

Created: 24/Dec/08 11:28 PM   Updated: 23/Apr/09 07:18 PM
Return to search
Component/s: conf
Affects Version/s: None
Fix Version/s: 0.20.0

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works configInclude2.patch 2009-01-15 06:52 PM dhruba borthakur 3 kB
Text File Licensed for inclusion in ASF works includes_in_conf.diff.txt 2008-12-24 11:58 PM Rama, Ramasamy 0.7 kB
Issue Links:
Reference
 

Hadoop Flags: Reviewed
Resolution Date: 16/Apr/09 05:05 AM


 Description  « Hide
It would be easier to mange the configuration of hadoop by allowing include files in configuration file (file: hadoop-site.xml)

 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Rama, Ramasamy added a comment - 24/Dec/08 11:40 PM
Allow include files in conf. For example in hadoop-site.xml


<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
...

<xi:include href="test.xml"/>

...
</configuration>


Rama, Ramasamy added a comment - 24/Dec/08 11:58 PM
By setting XcludeAware to true, we can have xi:include tags in xml file.
XincludeAware requires setNamespaceAware to be true.

dhruba borthakur added a comment - 25/Dec/08 12:20 AM
+1. I think this is a compatible change because it should not affect parsing of existing configuration files.

Doug Cutting added a comment - 26/Dec/08 05:42 PM
+1 This looks fine to me.

Steve Loughran added a comment - 05/Jan/09 01:45 PM
+1, though there are docs and tests to consider. The other issue is security; it now becomes possible to xinclude from somewhere else, if you aren't careful.

dhruba borthakur added a comment - 08/Jan/09 12:36 AM
Rama: if you have a unit test for this one, that will be great.

However, i am willing to commit this without a unit test because the code-change is very very minor. Also, in some sense it is an "Improvement" only. so the risk of impacting existing installations is minimal.


dhruba borthakur added a comment - 09/Jan/09 05:20 AM
Rama: let's discuss how we can come up with a simple unit test.

Steve Loughran added a comment - 09/Jan/09 04:48 PM
Once the patch becomes a feature, you need the test to make sure it doesn't go away later; for regression testing.

dhruba borthakur added a comment - 15/Jan/09 06:52 PM
This patch allows including a xml file via the <xi:include> xml tag. I updated Rama's version of the patch to include a unit test as well.

dhruba borthakur added a comment - 23/Jan/09 12:38 AM
Can some kind soul please review this patch? Thanks.

Doug Cutting added a comment - 23/Jan/09 07:31 PM
The current version of this patch changes the syntax of configurations so that they may be nested. Couldn't this extension of the syntax be avoided by using "range-inside", e.g., something like:
<include xmlns="http://www.w3.org/2001/XInclude"
 href="foo.xml"
 xpointer="xpointer(range-inside(//configuration))"/>

Rama, Ramasamy added a comment - 23/Jan/09 08:51 PM
That was the original idea, but xerces currently doesn’t support xpointer() scheme (http://xerces.apache.org/xerces2-j/faq-xinclude.html#faq-8) and got the following error:

hadoop5224.snc1.facebook.com: [Warning] hadoop-site.xml:9:147: SchemeUnsupported: The XPointer scheme 'xpointer' is not supported.
..
2009-01-23 12:28:49,678 FATAL org.apache.hadoop.conf.Configuration: error parsing conf file: org.xml.sax.SAXParseException: An 'include' failed, and no 'fallback' element was found.
2009-01-23 12:28:49,679 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.lang.RuntimeException: org.xml.sax.SAXParseException: An 'include' failed, and no 'fallback' element was found.
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:912)


Doug Cutting added a comment - 23/Jan/09 09:16 PM
> xerces currently doesn't support xpointer() scheme

Sigh. So I wonder if, instead of supporting nested "configuration" tags we might instead support nested "properties" tags, that each contain a set of properties. Or do you think it would be useful to have included files also capable of acting as top-level configuration files?


dhruba borthakur added a comment - 24/Jan/09 10:40 PM
I think it is more powerful for an include file A to be able to include another include file B for a cluster, while at the same time B to be the top level config file for another cluster. I would like to keep it this way, unless you feel strongly otherwise.

Steve Loughran added a comment - 26/Jan/09 04:57 PM
As someone who works in CM, I clearly think that being able to include stuff is a handy way to deal with scale. But if you treat every .xml file as toplevel, dont forget to include test cases that
  • have an import of yourself
  • import file A.xml that imports B.xml that imports A.xml
  • include inconsistent (name,value) declarations
  • import file A.xml that imports B.xml that then imports a nonexistent file (try and have the error message helpfile here)
  • tries to do property expansion to set the path of the include

You dont have to support these, property expansion in particular. But get those tests together to make sure that whatever behaviour you have matches expectations.


dhruba borthakur added a comment - 28/Jan/09 06:58 AM
@Steve: Good points. Thaks. I verified that all of the issues you mentioned is actually handled by the Xerces library. It can detect recursive includes. The hadoop Configuration.get() method generates an RuntimeException when this occurs.

I plan to commit this patch to 0.20 and trunk.


dhruba borthakur added a comment - 30/Jan/09 08:47 PM
[exec] +1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 3 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

dhruba borthakur added a comment - 30/Jan/09 09:56 PM
I just committed this. Thanks Rama!

Steve Loughran added a comment - 13/Feb/09 04:45 PM
re-opening as it appears to raise problems with different XML parsers/JVMs. I'm not sure that 0.20 will want this.

dhruba borthakur added a comment - 13/Feb/09 06:21 PM
Hi Steve, Hadoop uses the xerces library to parse the XML file. Xerces handles include files well. Can you pl explain what problem you are seeing.

Steve Loughran added a comment - 13/Feb/09 09:42 PM
I think myself and someone on the user list has encountered problems which may be caused by the built in JVM's version of Xerces; see HADOOP-5254 for details and stack traces

Hudson added a comment - 16/Feb/09 05:00 PM

dhruba borthakur added a comment - 16/Apr/09 05:05 AM
The problems that you saw were fixed in HADOOP-5254. I am closing this JIRA. Please re-open (or create a new one) if you see any remaining issues with this one.