Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.2.0
    • Fix Version/s: 0.3.0
    • Component/s: None
    • Labels: None

      Description

      Here's a proposal for some improvements to the way Hadoop does logging. It advocates
      three broad changes to the way logging is currently done, these being:

      • The use of a uniform logging format by all Hadoop subsystems
      • The use of Apache commons logging as a facade above an underlying logging framework
      • The use of Log4J as the underlying logging framework instead of java.util.logging

      This is largely polishing work, but it seems like it would make log analysis and debugging
      easier in the short term. In the long term, it would future-proof logging, allowing the
      logging framework to change while requiring minimal code change. The proposed changes
      are motivated by the following requirements, which we think Hadoop's logging should meet:

      • Hadoop's logs should be amenable to analysis by tools like grep, sed, awk, etc.
      • Log entries should be clearly annotated with a timestamp and a logging level
      • Log entries should be traceable to the subsystem from which they originated
      • The logging implementation should allow log entries to be annotated with source code
        location information like classname, methodname, file and line number, without requiring
        code changes
      • It should be possible to change the logging implementation used without having to change
        thousands of lines of code
      • The mapping of loggers to destinations (files, directories, servers etc.) should be
        specified and modifiable via configuration

      Uniform logging format:

      All Hadoop logs should have the following structure.

      <Header>\n
      <LogEntry>\n [<Exception>\n]
      .
      .
      .

      where the header line specifies the format of each log entry. The header line has the format:
      '# <Fieldname> <Fieldname>...\n'.

      The default header is '# Timestamp Level LoggerName Message', so each log entry
      contains the following fields:

      • Timestamp is a date and time in the format MM/DD/YYYY:HH:MM:SS
      • Level is the logging level (FATAL, WARN, DEBUG, TRACE, etc.)
      • LoggerName is the short name of the logging subsystem from which the message originated e.g.
        fs.FSNamesystem, dfs.Datanode etc.
      • Message is the log message produced
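
      For illustration, a log in this default format might look like (hypothetical entries):

      # Timestamp Level LoggerName Message
      05/11/2006:23:47:03 INFO dfs.DataNode Starting DataNode on port 50010
      05/11/2006:23:47:05 WARN fs.FSNamesystem Replication target not reached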

      Why Apache commons logging and Log4J?

      Apache commons logging is a facade meant to be used as a wrapper around an underlying logging
      implementation. Bridges from Apache commons logging to popular logging implementations
      (Java logging, Log4J, Avalon etc.) are implemented and available as part of the commons logging
      distribution. Implementing a bridge to an unsupported implementation is fairly straightforward
      and involves the implementation of subclasses of the commons logging LogFactory and Logger
      classes. Making all logging calls through Apache commons logging lets us move to a
      different logging implementation, in the best case by simply changing configuration;
      even in the worst case, the code churn is minimal.
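
      As a minimal sketch of the proposed facade usage (class, method, and logger names here
      are illustrative, not part of the proposal), application code would reference only the
      commons logging API:

          import org.apache.commons.logging.Log;
          import org.apache.commons.logging.LogFactory;

          public class HeartbeatExample {
            // Short logger name, following the proposed LoggerName convention
            private static final Log LOG = LogFactory.getLog("dfs.DataNode");

            public void heartbeat(String node) {
              // Guard avoids building the message string when DEBUG is disabled
              if (LOG.isDebugEnabled()) {
                LOG.debug("heartbeat received from " + node);
              }
              LOG.info("processed heartbeat from " + node);
            }
          }

      The underlying implementation (Log4J, java.util.logging, etc.) is then selected by
      commons logging at runtime, so swapping it requires no change to this code.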

      Log4J offers a few benefits over java.util.logging that make it a more desirable choice for the
      logging back end.

      • Configuration Flexibility: The mapping of loggers to destinations (files, sockets etc.)
        can be completely specified in configuration. It is possible to do this with Java logging as
        well; however, its configuration is a lot more restrictive. For instance, with Java logging all
        log files must have names derived from the same pattern. For the namenode, log files could
        be named with the pattern "%h/namenode%u.log" which would put log files in the user.home
        directory with names like namenode0.log etc. With Log4J it would be possible to configure
        the namenode to emit log files with different names, say heartbeats.log, namespace.log,
        clients.log etc. Configuration variables in Log4J can also have the values of system
        properties embedded in them.
      • Takes wrappers into account: Log4J takes into account the possibility that an application
        may be invoking it via a wrapper, such as Apache commons logging. This is important because
        logging event objects must be able to infer the context of the logging call such as classname,
        methodname etc. Inferring context is a relatively expensive operation that involves creating
        an exception and examining the stack trace to find the frame just before the first frame
        of the logging framework. It is therefore done lazily only when this information actually
        needs to be logged. Log4J can be instructed to look for the frame corresponding to the wrapper
        class; Java logging cannot. In the case of Java logging this means that a) the bridge from
        Apache commons logging is responsible for inferring the calling context and setting it in the
        logging event and b) this inference has to be done on every logging call regardless of whether
        or not it is needed.
      • More handy features: Log4J has some handy features that Java logging doesn't. A couple
        of examples:
        a) Date based rolling of log files
        b) Format control through configuration. Log4J has a PatternLayout class that can be
        configured to generate logs with a user specified pattern. The logging format described
        above can be expressed as "%d{MM/dd/yyyy:HH:mm:SS} %c{2} %p %m". The format specifiers
        indicate that each log line should have the date and time, followed by the logger name,
        the logging level or priority, and the application generated message.
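
        To make b) concrete, here is an illustrative sketch (not a prescribed implementation)
        that applies the same pattern programmatically with Log4J 1.x; in practice the
        pattern would live in configuration:

            import org.apache.log4j.ConsoleAppender;
            import org.apache.log4j.Logger;
            import org.apache.log4j.PatternLayout;

            public class DefaultLayoutDemo {
              public static void main(String[] args) {
                // Lowercase ss is seconds (SS would be milliseconds); %n ends the line
                PatternLayout layout =
                    new PatternLayout("%d{MM/dd/yyyy:HH:mm:ss} %c{2} %p %m%n");
                Logger root = Logger.getRootLogger();
                root.addAppender(new ConsoleAppender(layout));
                root.info("layout demo"); // e.g. 05/27/2006:10:15:42 root INFO layout demo
              }
            }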

      Attachments

      1. patch.txt
        15 kB
        Sanjay Dahiya
      2. acl-log4j-webapps.patch
        1.0 kB
        Arun C Murthy
      3. acl-log4j-II.patch.tgz
        21 kB
        Arun C Murthy
      4. acl-log4j.patch
        114 kB
        Arun C Murthy
      5. commons_logging_patch
        167 kB
        Barry Kaplan

        Activity

        eric baldeschwieler added a comment -

        I suggest we use iso8601 time format.

        http://www.cl.cam.ac.uk/~mgk25/iso-time.html

        This would suggest yyyy-MM-ddTHH:mm:SS, such as 2006-05-11T23:47:03

        The T is a literal and no one ever likes it. Change it for all I care, but standards are ok. This also suggests UTC, which I think is a good default, but also allows for local time, with a distinct notation 2006-05-11T23:47:03-08. We could support that as a config option if folks care.

        This format is also directly sortable, which is nice and avoids localization issues (MM-dd or dd-MM).
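
        As an illustrative aside, this format maps onto SimpleDateFormat as follows (note
        lowercase ss for seconds, since SS means milliseconds, and the quoted literal T):

            import java.text.SimpleDateFormat;
            import java.util.Date;
            import java.util.TimeZone;

            public class Iso8601Demo {
              public static void main(String[] args) {
                SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss");
                fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // UTC as the suggested default
                System.out.println(fmt.format(new Date()));   // e.g. 2006-05-11T23:47:03
              }
            }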

        Doug Cutting added a comment -

        I'm +1 for switching to commons-logging and to log4j by default. But I think we shouldn't mandate a format. It should be possible to embed Hadoop in other systems with other logging standards, and get it to comply with those standards. So most of what you suggest about log formats I think should be couched in the terms "by default", right?

        Owen O'Malley added a comment -

        I'm +1 for using a time format that is sortable. I've been using the sed, awk, and grep tools for merging log files to see trends across time.

        commons-logging and log4j sound good.

        Sameer Paranjpye added a comment -

        Yes, the suggestions about formats are meant to be defaults. This is one more reason for using Log4J, it gives you a fair amount of freedom with specifying formats in configuration.

        Doug Cutting added a comment -

        Even log4j should be a default. Hadoop code should only reference the commons-logging api, right? BTW, Sun's logging API also gives you complete freedom in formatting, although you have to write some Java classes, not just configure it with formatting strings as you can with log4j.

        Sameer Paranjpye added a comment -

        Wasn't really thinking in terms of making the logging implementation configurable, but there's no reason that can't be done. Hadoop code won't be invoking any part of log4j or whatever else directly.

        Sun's logging does give you complete freedom in formatting, I was pointing out that it's not as flexible as log4j where a lot can be achieved in configuration.

        We can have the logging implementation used be specified in configuration. Do you see a lot of people making use of that feature though? My instinct is that they won't...

        eric baldeschwieler added a comment -

        What real advantage do we get from all of this flexibility?

        One of our goals for attacking the logging system is to allow us to easily process the logs with the system. To do that will require building readers that can deal with the log format and organization. I'd hate to lose that in the interest of complete generality.

        Just curious what use case we are after with complete reformability and abstract logging.

        Owen O'Malley added a comment -

        A lot of exceptions are currently being logged at the info level and most of them should probably be at the warn level. In particular, once we log the level it will be easy to find the exceptions by grepping for WARN.

        Doug Cutting added a comment -

        The semantics I use for levels are something like:

        SEVERE: if this is a production system, someone should be paged, red lights should flash, etc. Something is definitely wrong and the system is not operating correctly. Intervention is required. This should be used sparingly.

        WARN: in a production system, warnings should be propagated & summarized on a central console. If lots are generated then something may be wrong.

        INFO, FINE, FINER, etc. are used for debugging. INFO is the level normally logged in production, FINE, FINER, etc. are typically only used when developing.

        Is that consistent with the way others use these?

        Milind Bhandarkar added a comment -

        At least one place where the level should be info and not warn is when we create a file on DFS. It tries to do mkdirs whether the directory exists or not; if it already exists, it warns that there was an error creating the directory. I think the warning should come when the directory could not be created for other reasons (such as permissions); otherwise it should be info or even fine.
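
        A sketch of the suggested distinction (class, logger, and method names are
        illustrative only):

            import java.io.File;
            import org.apache.commons.logging.Log;
            import org.apache.commons.logging.LogFactory;

            public class MkdirsLogging {
              private static final Log LOG = LogFactory.getLog("dfs.DataNode");

              static void ensureDir(File dir) {
                if (dir.isDirectory()) {
                  LOG.debug("directory already exists: " + dir); // harmless, not a warning
                } else if (!dir.mkdirs()) {
                  LOG.warn("could not create directory: " + dir); // a real problem
                }
              }
            }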

        Barry Kaplan added a comment -

        At the very least switching to commons logging will make it easier when configuring hadoop within other applications. Currently this is one of the few libraries I use that I can't configure to use my standard log4j settings.

        eric baldeschwieler added a comment -

        True, user errors that cause no harm don't deserve warnings in a central log. The trick is to propagate the issue back to the user...

        Barry Kaplan added a comment -

        I needed to get this working with log4j, so I quickly ripped out java.util.logging and replaced it with apache commons. The part I am unsure of is how much configuration of logging you wish to do from hadoop itself. My opinion is to let people configure log4j the way they want it; hadoop should just use whatever is available, and log to the console if nothing is there (essentially what apache-commons does).

        Sanjay Dahiya added a comment -

        On default Log4J configuration format -

        Some points -
        1. Logging caller information (originating class/method, line number, file name) is
        known to be expensive. It may also not be supported by all JVMs. Do we want these in
        the default config?
        2. For the namenode I added an option %X{client}; this enables logging client
        identification along with the message, but needs the code to supply that information
        using MDC.put("client", clientName) (see the sketch after the configuration below).
        3. We could do a separate logger per client, but that's a bad idea as there will be
        too many loggers.
        4. We can have a logger hierarchy based on packages/classes and/or some logical
        hierarchy like namenode.block.allocation, namenode.block.removal etc. Any comments on
        what type of log hierarchy you would like to see for the modules you worked on?
        5. We can use MDC for categorizing logs on some criterion other than clients - like
        blocks, file system. Anything you would like to see here?

        A Log4J configuration to start with - the default is a RollingFile (based on size; it
        can be made time based also). Please let me know your requirements, if any, and I will
        keep updating this file; once this is done, migrate the codebase to log4j.
        -------------------------------------------------
        # remove DEBUG if not needed.
        log4j.rootLogger=DEBUG, RFA
        log4j.threshold=ALL

        # Rolling File appender configuration
        log4j.appender.RFA=org.apache.log4j.RollingFileAppender
        log4j.appender.RFA.File=${HADOOP_HOME}/hadoop.log

        # change file size
        log4j.appender.RFA.MaxFileSize=1MB
        log4j.appender.RFA.MaxBackupIndex=10

        log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
        log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n

        # logically separated logger configuration for the namenode; it needs its own
        # appender (NNRFA) because an appender can only carry one layout
        log4j.logger.namenode=DEBUG, NNRFA
        log4j.appender.NNRFA=org.apache.log4j.RollingFileAppender
        log4j.appender.NNRFA.File=${HADOOP_HOME}/namenode.log
        log4j.appender.NNRFA.layout=org.apache.log4j.PatternLayout
        log4j.appender.NNRFA.layout.ConversionPattern=%d{ISO8601} %-5p %X{client} %c{2} (%F:%M(%L)) - %m%n
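
        To illustrate point 2 above (class and client names are hypothetical), the code side
        of %X{client} with Log4J's MDC might look like:

            import org.apache.log4j.Logger;
            import org.apache.log4j.MDC;

            public class ClientTaggedLogging {
              private static final Logger LOG = Logger.getLogger("namenode");

              public void handleRequest(String clientName) {
                MDC.put("client", clientName); // picked up by %X{client} in the layout
                try {
                  LOG.info("open /some/path");
                } finally {
                  MDC.remove("client"); // don't leak context to later work on this thread
                }
              }
            }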

        Arun C Murthy added a comment -

        Here's attached the patch for switching to commons logging with log4j as the logging framework.

        Notes:

        a) Hopefully this can be incorporated into hadoop svn asap, since any further commits
        will force us to regenerate the patch.

        b) The conf/log4j.properties file uses DailyRollingFileAppender, which rolls over at
        midnight. Another choice is RollingFileAppender, which rolls over every 1MB and
        maintains 30 backups; it is currently commented out.

        c) Some parts of the code, e.g. those configuring the log file, log levels etc., had
        to be commented out since they aren't supported by Commons Logging. These should now
        be configured via the .properties file.

        d) The framework as in the patch creates 4 separate log files:
        > $HADOOP_HOME/logs/namenode.log
        > $HADOOP_HOME/logs/datanode.log
        > $HADOOP_HOME/logs/jobtracker.log
        > $HADOOP_HOME/logs/tasktracker.log

        This is done by passing -Dhadoop.log.file=<logfilename>.log via the bin/hadoop startup script and referenced in log4j.properties.
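
        A minimal sketch of that mechanism (hadoop.log.dir below is assumed for illustration;
        only hadoop.log.file appears in the note above):

            public class LogFileNameDemo {
              public static void main(String[] args) {
                // Simulates `bin/hadoop ... -Dhadoop.log.file=namenode.log`
                System.setProperty("hadoop.log.file", "namenode.log");
                // Log4J expands system properties in configuration values, so a line like
                //   log4j.appender.DRFA.File=${hadoop.log.dir}/${hadoop.log.file}
                // in log4j.properties would resolve to .../logs/namenode.log
                System.out.println(System.getProperty("hadoop.log.file"));
              }
            }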

        thanks,
        Arun

        PS: This will have to be committed before the libhdfs patch, and after that I will go ahead and create a 2 line patch for TestDFSCIO.java (switch java.util.logging to commons logging).

        Doug Cutting added a comment -

        Unfortunately this patch no longer applies cleanly. Can you please update your source tree and re-generate this patch? I try to process patches in the order they are submitted, but frequently there are conflicts in the queue. Thanks!

        Doug Cutting added a comment -

        Also, we should not yet remove LogFormatter, but only deprecate it, as user code (e.g., Nutch) may use this class. A good test for back-compatibility of changes to Hadoop's public APIs is to check that Nutch still compiles and runs correctly with an updated Hadoop jar.

        If we deprecate LogFormatter in 0.3 then we can remove it in Hadoop 0.4, but we should always give folks at least one release to remove dependencies on deprecated features.

        Thanks again!

        Doug Cutting added a comment -

        One more thing: please don't just comment out obsolete code; delete it. Thanks.

        http://wiki.apache.org/lucene-hadoop/HowToContribute

        Arun C Murthy added a comment -

        Doug,

        Please find the patch for commons-logging/log4j against the latest snapshot.

        I have incorporated all your comments i.e. kept LogFormatter.java, removed dead-wood etc.

        Please try to apply this patch on a priority basis, since it touches quite a bit of the code-base and potentially any commit will necessitate another patch!

        thanks,
        Arun

        PS: We have removed hadoop/conf from the build-time classpath in build.xml since we don't want hadoop's log4j.properties to be picked up by the build-tools. We feel hadoop/conf isn't necessary for building at all. Please correct us if we are wrong.

        Arun C Murthy added a comment -

        Minor point: we will need commons-logging-1.0.4.jar and log4j-1...jar in lib/

        thanks,
        Arun

        Doug Cutting added a comment -

        I can apply this by reverting to revision 410692, then using 'svn up' to merge in subsequent changes. But I'm still having trouble building and running unit tests (at least on Windows, which I'm using today, since I'm on the road). The change to build.xml causes hadoop-default.xml not to be found. When I fix that, Jasper fails to compile the jsp pages, since it uses log4j. 'ant clean test' reports:

        Buildfile: build.xml
        [taskdef] log4j:ERROR setFile(null,true) call failed.
        [taskdef] java.io.FileNotFoundException: \ (The system cannot find the path specified)
        [taskdef] at java.io.FileOutputStream.openAppend(Native Method)
        [taskdef] at java.io.FileOutputStream.<init>(FileOutputStream.java:177)
        [taskdef] at java.io.FileOutputStream.<init>(FileOutputStream.java:102)
        [taskdef] at org.apache.log4j.FileAppender.setFile(FileAppender.java:289)

        So I'm (sadly) not quite able to commit this yet.

        Doug Cutting added a comment -

        Okay, I've worked out the configuration problems. Now it appears that some things that were previously logged at level=FINE are now logged at INFO, when they should be DEBUG. I'm working on fixing that...

        Doug Cutting added a comment -

        With some trepidation, I just committed this.

        There were a number of problems with this patch. It changed all level=fine log messages into level=info, rather than level=debug. The build needed to be repaired as well, since logging is performed there, but the standard configuration is not appropriate during build, so I added a build/test log4j configuration. Finally, the changes to bin/hadoop did not name the log files correctly: the correct log file name should normally be set in bin/hadoop-daemon.sh and used in bin/hadoop. I fixed all of these.

        In the future, we should not try to make such large changes in the last days before a release. Such changes should be made early in the release cycle. I suspect we will still encounter more problems with this as I now try to make a release and test things with Nutch for back-compatibility, on Windows, etc.

        Arun C Murthy added a comment -

        Apologies for all the trouble... we assumed fine->info, finer->debug, finest->trace mappings; we should have run this through you first.

        Next time, please throw out the patch if we screw up and let us scramble to fix the mess we created... we appreciate your patience. Thanks!

        Arun C Murthy added a comment -

        We seem to have missed getMapOutput.jsp in the earlier patch... here's the fix. Thanks!

        Sanjay Dahiya added a comment -

        New extensions to hadoop logging -

        • Rolling based on both time and size
        • Compression of rolled-over files
        • Moving rolled-over files to DFS

        Log4J 1.3 has a better way of defining rollover policies and actions, and I have a
        working implementation of the above, but it depends on Log4J 1.3. Also, the properties
        file will change to an XML format, as the .properties format doesn't support all the
        configurations yet. Are we willing to move to 1.3 and XML log4j properties?

        Barry Kaplan added a comment -

        According to the tomcat 5.5 docs, XML configuration files don't allow you to use the
        naming convention for logs within tomcat. As someone who uses tomcat and hadoop, I
        would prefer not using the xml configuration. From the docs:

        "You can (and should) be more picky about which packages to include in the logging.
        Tomcat 5.5 defines loggers by Engine and Host names. For example, for a default
        Catalina localhost log, add this to the end of the log4j.properties above. Note that
        there are known issues with using this naming convention (with square brackets) in
        log4j XML based configuration files, so we recommend you use a properties file as
        described until a future version of log4j allows this convention."

        Sanjay Dahiya added a comment -

        I'm looking at supporting logging features like caps on time/size, gzip compression,
        and archiving log files into DFS. Log4j 1.3 with XML configuration makes it easy to
        implement all of these, since RollingPolicies and Triggers are separated from
        appenders; the properties file format doesn't allow specifying RollingPolicies
        externally for existing Appenders.
        Are you embedding Tomcat within Hadoop or using Hadoop from a webapp? Is it possible
        to make tomcat use its own properties file, or to configure Log4J for the webapp
        separately in the webapp's class loader?

        Barry Kaplan added a comment -

        I am using Hadoop within tomcat. My guess is there is a way to make hadoop use its own log properties separate from tomcat's, but it would be rather annoying to have a separate log4j.properties on a library-by-library basis.

        Sanjay Dahiya added a comment -

        From what I understand, you are using the hadoop client in tomcat. We need a
        lightweight client for embedding in other apps for these use cases; that client can
        use the app's logging configuration.
        This logging is primarily targeted at the namenode, datanodes, and trackers, which
        generate logs on an ongoing basis. Those log configurations need to be separated from
        the Hadoop client in any case.

        eric baldeschwieler added a comment -

        Is the client code logging? If so, we should file another bug to make this independent of all of the server logging for sure. As Barry explains his rig, I think he is actually running the hadoop servers inside tomcat as well. This is a more complicated issue. It may well make sense to support this; I don't know enough about the environment to understand the pros and cons. An obvious pro is that he has fewer processes to shepherd. A con is that it isn't something anyone else has worked through or made work.

        eric baldeschwieler added a comment -

        Barry, I've filed a new enhancement bug (ability to run datanodes in tomcat) to serve as an umbrella for this issue. I think you've filed another bug on Jetty related to this as well. It would be good to tie all the tomcat issues together. I'd be curious to know what others on the list (who know more about java/tomcat) think about this proposal, but I think we should move the discussion off this bug.

        http://issues.apache.org/jira/browse/HADOOP-353 - Run datanode (or other hadoop servers) inside tomcat

        Sanjay Dahiya added a comment -

        A patch for rolling hadoop logs from all nodes to a well defined directory structure
        in DFS. Log files can optionally be compressed with gzip or zip. Log files are rolled
        over after a default 10MB or at midnight. The files are rolled over in DFS at a
        configurable path in a directory structure, e.g.
        <archive path>/year/month/day/logfile_index.log.gz

        This patch depends on Log4J 1.3, so we may not want to commit it to the main trunk
        yet. It can be used if one doesn't mind using log4j's XML configuration file format.
        Log4J's DailyRollingFileAppender will be deprecated in future versions, so we will
        inevitably have to move to this.

        Please remember to upgrade the log4j.jar file to the 1.3 version and remove the old
        version, or this patch will not compile.


          People

          • Assignee:
            Sameer Paranjpye
          • Reporter:
            Sameer Paranjpye
          • Votes:
            0
          • Watchers:
            3
