Hadoop Common
HADOOP-53

MapReduce log files should be storable in dfs.

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.16.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      It should be possible to cause a job's log output to be stored in dfs. The jobtracker's log output and (optionally) all tasktracker log output related to a job should be storable in a job-specified dfs directory.

      Attachments

      1. mapredDFSLog_v1.patch (30 kB, Enis Soztutar)
      2. mapredDFSLog_v2.patch (30 kB, Enis Soztutar)
      3. mapredDFSLog_v3.patch (32 kB, Enis Soztutar)

        Issue Links

          duplicates: HADOOP-2206

          Activity

          eric baldeschwieler added a comment -

          We should determine how to capture and organize user logs in general before doing work to save them in HDFS.

          Enis Soztutar added a comment -

          Attaching a patch to store the tasks' logs to the FileSystem in use. This is useful, for example, to store the logs permanently or to access them in a centralized way.

          The patch adds a new log4j appender, FsLogAppender, which appends logs to files in the FileSystem; it is registered under the name FSLA. The old appender (TLA) remains available. The user can select which appender and which log level to use via JobConf, e.g. job.setTaskLogRootLogger("INFO,TLA,FSLA");
          The user can also specify the location to save the logs: job.setTaskLogDir(Path);

          At DEBUG level, logs from org.apache.hadoop will still pollute the logs of the user's program, but we will defer that to a separate issue.

          The appender could also be used to store the logs of the framework itself (for debugging, etc.), but that too is a separate issue.
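
          For illustration, here is a minimal usage sketch assuming the JobConf accessors proposed in this patch (names and the example path are illustrative and may change before commit):

          // Hypothetical usage of the accessors described above.
          JobConf job = new JobConf(MyJob.class);
          // Route task logs to both the classic TaskLogAppender (TLA) and
          // the FileSystem-backed appender (FSLA), at INFO level.
          job.setTaskLogRootLogger("INFO,TLA,FSLA");
          // Store this job's logs under a job-specified DFS directory.
          job.setTaskLogDir(new Path("/logs/myjob"));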

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12367954/mapredDFSLog_v1.patch
          against trunk revision r586264.

          @author +1. The patch does not contain any @author tags.

          javadoc -1. The javadoc tool appears to have generated messages.

          javac +1. The applied patch does not generate any new compiler warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests -1. The patch failed contrib unit tests.

          Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/970/testReport/
          Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/970/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/970/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/970/console

          This message is automatically generated.

          Enis Soztutar added a comment -

          The tasks using FsLogAppender enter a zombie state and won't finalize. I will supply another patch.

          Enis Soztutar added a comment -

          Patch updated to trunk; FsLogAppender has been refactored to o.a.h.util.

          The tasks using FsLogAppender enter a zombie state and won't finalize.

          I have failed to reproduce this in several tests running different jobs; I suspect the zombie processes were leftovers from earlier versions of the patch.

          Can someone review the patch, please?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12368384/mapredDFSLog_v2.patch
          against trunk revision r588341.

          @author +1. The patch does not contain any @author tags.

          patch -1. The patch command could not apply the patch.

          Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/988/console

          This message is automatically generated.

          Owen O'Malley added a comment -

          I think it would be far better to have the FsLogAppender simply ignore any messages that arrive while the appender isn't "ready". In particular, we should avoid special methods that must be invoked for initialization and closing. The FsLogAppender should work with non-default file systems, by doing:

          FileSystem fs = path.getFileSystem(conf);
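           // (getFileSystem() resolves the filesystem from the path itself,
           // so non-default filesystems need no special-casing here.)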
          DataOutputStream out = fs.create(path);
          
          Enis Soztutar added a comment -

          The appender ignores the messages until it is properly initialized, but the problem is that the logger itself generates logging statements during initialization (for example, IPC debug logs). FsLogAppender will work on the filesystem that is active in the configuration given to its init() method.

          In particular, we should avoid special methods that must be invoked for initialization and closing.

          The current design does need extra initialization and finalization because we are using log4j's configurable way of wiring appenders. It is good that we can configure logging to use either the fs or local files, but then we need to let log4j construct the appender for us, so we must somehow pass the conf object to the appender, right? We could instead use something like

          JobConf#enableDFSLogging();
          JobConf#setLogLevel();
          JobConf#setLogDir();

          and then construct the FsLogAppender ourselves and add it to the root logger. However, what if we extend the logging system so that it can also store the logs of the {job|task} trackers and the {name|data} nodes? Then we would need custom code to set the appender rather than using conf/log4j.properties.

          Long story short, I think the current architecture is slightly ugly, but I'm OK with it.
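
          To make the trade-off concrete, here is a minimal sketch (not the patch itself; class and method names are illustrative) of an appender that simply drops events until an explicit init() call hands it an output stream:

          import java.io.IOException;
          import java.io.OutputStream;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;
          import org.apache.log4j.AppenderSkeleton;
          import org.apache.log4j.spi.LoggingEvent;

          public class FsAppenderSketch extends AppenderSkeleton {
            private OutputStream out;          // null until init() is called

            /** Explicit init; events logged before this call are dropped. */
            public synchronized void init(FileSystem fs, Path path)
                throws IOException {
              out = fs.create(path);
            }

            protected synchronized void append(LoggingEvent event) {
              if (out == null || layout == null) {
                return;                        // not "ready" yet: ignore
              }
              try {
                out.write(layout.format(event).getBytes());
              } catch (IOException e) {
                errorHandler.error("append to FS failed", e, 0);
              }
            }

            public synchronized void close() {
              try {
                if (out != null) out.close();
              } catch (IOException ignored) {
              }
              closed = true;
            }

            public boolean requiresLayout() {
              return true;
            }
          }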

          Enis Soztutar added a comment -

          Patch updated to match svn trunk.

          Nate Carlson added a comment -

          Any chance we could get a patch against 0.15 for this?

          Enis Soztutar added a comment -

          Any chance we could get a patch against 0.15 for this?

          You can manually apply the patch to 0.15; it should not be hard.

          Enis Soztutar added a comment -

          Closing this issue, since we will have a more general log aggregation framework: HADOOP-2206.


            People

            • Assignee: Enis Soztutar
            • Reporter: Doug Cutting
            • Votes: 0
            • Watchers: 0
