Issue Details (XML | Word | Printable)

Key: HADOOP-4757
Type: Improvement Improvement
Status: Open Open
Priority: Minor Minor
Assignee: Alex Loddengaard
Reporter: Alex Loddengaard
Votes: 0
Watchers: 8
Operations

If you were logged in you would be able to see more operations.
Hadoop Common

Replace log4j.properties with log4j.xml

Created: 03/Dec/08 01:05 AM   Updated: 26/Feb/09 09:04 AM
Return to search
Component/s: None
Affects Version/s: None
Fix Version/s: None

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works log4j_xml_config.patch 2008-12-03 01:10 AM Alex Loddengaard 5 kB
Text File Licensed for inclusion in ASF works log4j_xml_config_v2.patch 2008-12-03 07:22 PM Alex Loddengaard 5 kB


 Description  « Hide
Certain log4j features are not configurable via the log4j.properties file, not to mention the log4j.properties file is messy. For this reason, we should transition to a XML file to configure log4j. The .properties vs. .xml argument has only come up in HADOOP-211, where the XML configuration was vetoed for Tomcat integration issues.

 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Alex Loddengaard added a comment - 03/Dec/08 01:10 AM
Attaching a patch that removes conf/log4j.properties and adds conf/log4j.xml, which is configuration-equivalent to the original log4j.properties file. I have used this XML file on a single-node cluster; the integration seems to be seamless.

Alex Loddengaard made changes - 03/Dec/08 01:10 AM
Field Original Value New Value
Attachment log4j_xml_config.patch [ 12395151 ]
Alex Loddengaard added a comment - 03/Dec/08 01:10 AM
Patch available.

Alex Loddengaard made changes - 03/Dec/08 01:10 AM
Status Open [ 1 ] Patch Available [ 10002 ]
Doug Cutting added a comment - 03/Dec/08 05:19 PM
If both a .properties and a .xml file exist, which has precedence? If the .properties file has precedence, then this is a back-compatible change. Folks that are using a modified log4j.properties file would continue to use their modified version. But if the .xml file has precedence then this would not be back-compatible, right?

Also, should we make this a .template file? (Anything in conf/ ending in .template is copied to its name minus the .template by the build and the .template version is not included in releases. This is because we expect folks to edit these files and do not want to commit their changes to subversion or include them in patches, so only the .template version exists in subversion.)


Alex Loddengaard added a comment - 03/Dec/08 07:22 PM
Unfortunately the XML file takes precedence over the .properties file, making this patch not backwards compatible. However, I am attaching a new patch that adds a log4j.xml.template file instead of a log4j.xml file.

Given that this patch is not backwards compatible, its inclusion may not ever happen. However, at least this issue can be referenced when users need to use the XML file instead of the .properties file.

Thanks for the comments, Doug!


Alex Loddengaard made changes - 03/Dec/08 07:22 PM
Attachment log4j_xml_config_v2.patch [ 12395204 ]
Tsz Wo (Nicholas), SZE added a comment - 03/Dec/08 07:35 PM
> Given that this patch is not backwards compatible, its inclusion may not ever happen.

Given the recent proposal of Hadoop 1.0 and issues like HADOOP-4631, HADOOP-4044, etc., being backward incompatible may be okay. It is more important to have a good design. I guess this one may happens after 0.20.


Steve Loughran added a comment - 04/Dec/08 10:43 AM
I would personally argue against having any log4j config file in a JAR, as trying to track down the origin of broken logging settings in a big application is painful. Indeed, it is the origin of the <whichresource> task in Ant

<whichresource resource="/log4j.properties" property="log4j.url" >

A key issue here is that when hadoop-core is used as a client library, that client may pick up the hadoop log policy not its own.

Even in a cluster, Logging settings are very installation specific; embedding them in a JAR means you need to edit the JAR to replace them. If all Log4J settings were omitted and yet could be picked up from a conf directory, all would be well.

Maybe the solution here is to permit site-conf jars that contain only the configuration files; you could switch to a different configuration JAR without rebuilding/redistributing hadoop-core itself.


Alex Loddengaard added a comment - 04/Dec/08 05:37 PM
You bring up a great point, Steve. My main concern is that I don't think we want to make installing and configuring Hadoop any harder than it is now. Your solution of having a conf JAR wouldn't make installing and configuring any more difficult, and it would also allow "advanced" users to modify, include, or not include this JAR.

Hadoop QA added a comment - 08/Dec/08 08:54 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12395204/log4j_xml_config_v2.patch
against trunk revision 724229.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 Eclipse classpath. The patch retains Eclipse classpath integrity.

-1 core tests. The patch failed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3685/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3685/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3685/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3685/console

This message is automatically generated.


Jakob Homan added a comment - 09/Dec/08 12:14 AM
Alex-
I've looked at the patch and have a few questions.
  • For the DRFA and TLA appenders, shouldn't it be ConversionPattern rather than ConversationPattern?
  • There are several appenders included in the default log4j.properites that are not enabled by default (c.f. RFA), which are not included in the log4j.xml. To minimize issues with users who may be enabling and using them, should these be converted as well?
  • Does the xml format support value substitution to override defaults? For example, how would "./hadoop.log" in the file param on the DRFA be substituted? Currently it's handled by substitution, but I'm not sure how this would work with the xml file. Is it possible to define these default values in the xml?
  • There are a few params from the original file that don't look like they made it into the new one, such as the tasklog noKeepSplits, purgeLogSplits and logsRetainsHours. Should these be included as well?
  • Does the build.xml file need to be modified to remove the reference to the log4j.properties and a new one to log4j.xml be inserted?

Jakob Homan made changes - 09/Dec/08 12:15 AM
Status Patch Available [ 10002 ] Open [ 1 ]
Alex Loddengaard added a comment - 09/Dec/08 06:15 PM
These are great questions, Jakob. Let me answer what I can now. The more difficult questions will require some research on my part:

For the DRFA and TLA appenders, shouldn't it be ConversionPattern rather than ConversationPattern?

Yes, this was a typo on my behalf. Good catch!

There are several appenders included in the default log4j.properites that are not enabled by default (c.f. RFA), which are not included in the log4j.xml. To minimize issues with users who may be enabling and using them, should these be converted as well?

These appenders are commented out in the .properties file, which made me think they were deprecated. If people are using them, then I totally agree that they should be included in the XML file. Another good catch.

Does the build.xml file need to be modified to remove the reference to the log4j.properties and a new one to log4j.xml be inserted?

From the testing I did, no. As long as the log4j.properties file is not in the CLASSPATH (removed from $HADOOP_HOME/conf) and the log4j.xml file is, then log4j.xml will be used.

Again, I'll do some research and fiddling and respond to your other questions. Thanks, Jakob!


Owen O'Malley added a comment - 09/Dec/08 06:32 PM
The commented out ones aren't deprecated, they are just other possibilities.

Alex Loddengaard added a comment - 26/Feb/09 09:04 AM
After some digging, and with some help from Aaron Kimball, I've realized that Hadoop's current implementation depends on a few global log4j variables to be set in order for Java code to dynamically overwrite the variable values. Namely, org.apache.hadoop.mapred.TaskRunner sets the "hadoop.tasklog.taskid" variable, which allows org.apache.hadoop.mapred.TaskLogAppender to output logs with the task ID in the log filename.

According to the DTD, Log4j XML files do not allow for top-level "param" elements, making it seem as though global variables are impossible in the XML file. With that said, Hadoop's current task log implementation is incompatible with Log4j XML files.

Though having XML files would be nice for purposes of unlocking Log4j features (AsyncAppenders, for example), I don't have enough Log4j fu to propose a new implementation of the current TaskLogAppender/TaskRunner/Log4j setup. Thoughts?