Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.4
    • Component/s: None
    • Labels:
      None
    • Target Version/s:

      Description

      Currently, it's fairly difficult to configure logging beyond a generic log level. It'll be useful to have some control over which components need to be logged at a level / should be avoided. The IPC layer, for example, generates a lot of (multi-line) noise - which isn't useful when looking for Tez logs only.

      1. TEZ-1775.5.branch-0.5.txt
        43 kB
        Siddharth Seth
      2. TEZ-1775.5.txt
        44 kB
        Siddharth Seth
      3. TEZ-1775.4.txt
        44 kB
        Siddharth Seth
      4. TEZ-1775.3.txt
        44 kB
        Siddharth Seth
      5. TEZ-1775.2.txt
        41 kB
        Siddharth Seth
      6. TEZ-1775.1.wip.txt
        11 kB
        Siddharth Seth

        Activity

        Hide
        gopalv Gopal V added a comment -

        Very useful, particularly for processor logging (log only one Operator class in DEBUG).

        Show
        gopalv Gopal V added a comment - Very useful, particularly for processor logging (log only one Operator class in DEBUG).
        Hide
        hitesh Hitesh Shah added a comment -

        Are we talking about supporting a new log4j properties file?

        Show
        hitesh Hitesh Shah added a comment - Are we talking about supporting a new log4j properties file?
        Hide
        sseth Siddharth Seth added a comment - - edited

        Either a new log4j properties file, or potentially passing in additional command line parameters without users having to setup everything.
        Currently, trying to set a single log property - requires setting up the app log dir, the log4j properties file as well as root logger settings. We should at least be able to augment what is the default.
        e.g. Continue using tez-container-log4j.properties, along with the yarn app dir setup and tez.root.logger. Add in an additional property - "tez.log.additional.properties" - which can be used to configure other aspects of the logger. This may already be possible via tez-am-cmd-opts / tez-task-cmd-opts. Need to check.

        Open to other suggestions on how this can be simplified. The main pain point at the moment is level DEBUG means tons of extra log messages which just create noise.

        Show
        sseth Siddharth Seth added a comment - - edited Either a new log4j properties file, or potentially passing in additional command line parameters without users having to setup everything. Currently, trying to set a single log property - requires setting up the app log dir, the log4j properties file as well as root logger settings. We should at least be able to augment what is the default. e.g. Continue using tez-container-log4j.properties, along with the yarn app dir setup and tez.root.logger. Add in an additional property - "tez.log.additional.properties" - which can be used to configure other aspects of the logger. This may already be possible via tez-am-cmd-opts / tez-task-cmd-opts. Need to check. Open to other suggestions on how this can be simplified. The main pain point at the moment is level DEBUG means tons of extra log messages which just create noise.
        Hide
        sseth Siddharth Seth added a comment -

        Looks like cmd opts doesn't help much. Another possibility is to generate a log4j configuration properties file dynamically - and ship it along with the task, where a user can specify log levels for different categories.

        Show
        sseth Siddharth Seth added a comment - Looks like cmd opts doesn't help much. Another possibility is to generate a log4j configuration properties file dynamically - and ship it along with the task, where a user can specify log levels for different categories.
        Hide
        rajesh.balamohan Rajesh Balamohan added a comment - - edited

        Siddharth Seth, task specific cmd options might be of help here.

        test.properties
        ===============
        hadoop.root.logger=DEBUG,CLA
        
        # Define the root logger to the system property "tez.root.logger".
        log4j.rootLogger=${tez.root.logger}, EventCounter
        log4j.threshold=ALL
        log4j.appender.CLA=org.apache.tez.common.TezContainerLogAppender
        log4j.appender.CLA.containerLogDir=${yarn.app.container.log.dir}
        log4j.appender.CLA.layout=org.apache.log4j.PatternLayout
        log4j.appender.CLA.layout.ConversionPattern=%d{ISO8601} %p [%t] %c{2}: %m%n
        log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter
        
        # User specific DEBUG levels
        log4j.logger.org.apache.tez.runtime.task.TezChild=DEBUG
        
        

        In hive, add "add FILE test.properties" in the begining of the query. This would place "test.properties" in distributed cache.

        Now, run hive (with task specific tez options) command as follows:

        hive --database mydb --hiveconf tez.task-specific.launch.cmd-opts.list="Map 2[1]" --hiveconf tez.task-specific.launch.cmd-opts="-Dlog4j.configuration=test.properties" -f /home/rajesh/test.sql
        

        Here, user is trying to enable DEBUG log (i.e with test.properties) for task #1 of "Map 2" phase. If we need to enable for entire "Map 2", we can just specify "Map 2[]". If we need to enable for multiple phases, we can add "Map 2[],Map 8[],Reducer 1[]".

        This should enable DEBUG logs very specific to "org.apache.tez.runtime.task.TezChild" and this can be verified by grepping the logs for this application

        "yarn logs -applicationId <app_id>"
        

        I tried the above approach on the test cluster and it works fine.

        I hope other components like Pig will have a mechanism to upload files to distributed cache. As long as "test.properties" is available in disibtributed cache, the above approach would work.

        Caveat:
        ======
        1. For DAGAppMaster, it loads "-Dlog4j.configuration=tez-container-log4j.properties" after "tez.am.launch.cmd-opts".
        So specifying --hiveconf tez.am.launch.cmd-opts="-Dlog4j.configuration=test.properties" to debug DAGAppMaster would not work. We might need to change the order in TezClientUtils.createApplicationSubmissionContext() to load TEZ_AM_LAUNCH_CMD_OPTS at the end. If this is done, same approach should ideally work for DAGAppMaster as well (I haven't tried this for DAGAppMaster).

        Show
        rajesh.balamohan Rajesh Balamohan added a comment - - edited Siddharth Seth , task specific cmd options might be of help here. test.properties =============== hadoop.root.logger=DEBUG,CLA # Define the root logger to the system property "tez.root.logger" . log4j.rootLogger=${tez.root.logger}, EventCounter log4j.threshold=ALL log4j.appender.CLA=org.apache.tez.common.TezContainerLogAppender log4j.appender.CLA.containerLogDir=${yarn.app.container.log.dir} log4j.appender.CLA.layout=org.apache.log4j.PatternLayout log4j.appender.CLA.layout.ConversionPattern=%d{ISO8601} %p [%t] %c{2}: %m%n log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter # User specific DEBUG levels log4j.logger.org.apache.tez.runtime.task.TezChild=DEBUG In hive, add "add FILE test.properties" in the begining of the query. This would place "test.properties" in distributed cache. Now, run hive (with task specific tez options) command as follows: hive --database mydb --hiveconf tez.task-specific.launch.cmd-opts.list= "Map 2[1]" --hiveconf tez.task-specific.launch.cmd-opts= "-Dlog4j.configuration=test.properties" -f /home/rajesh/test.sql Here, user is trying to enable DEBUG log (i.e with test.properties) for task #1 of "Map 2" phase. If we need to enable for entire "Map 2", we can just specify "Map 2[]". If we need to enable for multiple phases, we can add "Map 2[],Map 8[],Reducer 1[]". This should enable DEBUG logs very specific to "org.apache.tez.runtime.task.TezChild" and this can be verified by grepping the logs for this application "yarn logs -applicationId <app_id>" I tried the above approach on the test cluster and it works fine. I hope other components like Pig will have a mechanism to upload files to distributed cache. As long as "test.properties" is available in disibtributed cache, the above approach would work. Caveat: ====== 1. For DAGAppMaster, it loads "-Dlog4j.configuration=tez-container-log4j.properties" after "tez.am.launch.cmd-opts". So specifying --hiveconf tez.am.launch.cmd-opts="-Dlog4j.configuration=test.properties" to debug DAGAppMaster would not work. We might need to change the order in TezClientUtils.createApplicationSubmissionContext() to load TEZ_AM_LAUNCH_CMD_OPTS at the end. If this is done, same approach should ideally work for DAGAppMaster as well (I haven't tried this for DAGAppMaster).
        Hide
        sseth Siddharth Seth added a comment -

        Rajesh Balamohan - I'd like to have an approach where users only specify the classes/categories for which they want to change the log level. Not having to specify a full properties file (with the log path, log file name etc) would be ideal. We could generate the equivalent of test.properties within Tez itself. For now, I'm looking at alternates of either having multi-line property replacement or having a fixed set of properties (maybe 5), which users can replace.
        Tying that in with task specific properties would be even better.

        Show
        sseth Siddharth Seth added a comment - Rajesh Balamohan - I'd like to have an approach where users only specify the classes/categories for which they want to change the log level. Not having to specify a full properties file (with the log path, log file name etc) would be ideal. We could generate the equivalent of test.properties within Tez itself. For now, I'm looking at alternates of either having multi-line property replacement or having a fixed set of properties (maybe 5), which users can replace. Tying that in with task specific properties would be even better.
        Hide
        sseth Siddharth Seth added a comment - - edited

        WIP patch.
        This allows the following usage pattern.

        tez.am|task.log.level can now be specified as

        DEBUG;org.apache.hadoop.ipc=INFO;org.apache.hadoop.security=INFO
        

        General log level is DEBUG. hadoop.ipc, hadoop.security set to INFO level.

        I'm planning to add a new parameter TEZ_TASK_SPECIFIC_LOG_LEVEL - which will allow setting this up at a task level.

        This could be done without affecting re-use - by completely re-initializing runtime config, but that can be a follow up jira.

        Hitesh Shah, Rajesh Balamohan - mind taking a look. Any thoughts on the approach, or making this easier to use ?

        Show
        sseth Siddharth Seth added a comment - - edited WIP patch. This allows the following usage pattern. tez.am|task.log.level can now be specified as DEBUG;org.apache.hadoop.ipc=INFO;org.apache.hadoop.security=INFO General log level is DEBUG. hadoop.ipc, hadoop.security set to INFO level. I'm planning to add a new parameter TEZ_TASK_SPECIFIC_LOG_LEVEL - which will allow setting this up at a task level. This could be done without affecting re-use - by completely re-initializing runtime config, but that can be a follow up jira. Hitesh Shah , Rajesh Balamohan - mind taking a look. Any thoughts on the approach, or making this easier to use ?
        Hide
        rajesh.balamohan Rajesh Balamohan added a comment -

        looks good overall. It would be good to print the changes in the logs/console, so that the user is aware that the settings will take into effect.

        Would there be a situation where we need to enable this only for specific tasks/vertices? (e.g enable debug log for FetcherOrderedGrouped only for Map_5 phase). In such case, should "-Dtez.container.log.params" be passed along with "tez.task-specific.launch.cmd-opts" ?

        Show
        rajesh.balamohan Rajesh Balamohan added a comment - looks good overall. It would be good to print the changes in the logs/console, so that the user is aware that the settings will take into effect. Would there be a situation where we need to enable this only for specific tasks/vertices? (e.g enable debug log for FetcherOrderedGrouped only for Map_5 phase). In such case, should "-Dtez.container.log.params" be passed along with "tez.task-specific.launch.cmd-opts" ?
        Hide
        sseth Siddharth Seth added a comment - - edited

        Updated patch which allows for task specific log levels.

        It would be good to print the changes in the logs/console

        Do you mean printing the log level that will be used on the client before each job starts ?

        Ready for review.

        The following are the options that can be used.

          <property>
            <name>tez.am.log.level</name>
            <value>INFO;org.apache.tez.dag.app=INFO</value>
            <!--value>INFO</value-->
          </property>
        
          <property>
            <name>tez.task.log.level</name>
            <!--value>DEBUG;org.apache.hadoop.ipc=INFO;org.apache.hadoop.security=INFO</value-->
            <value>INFO</value>
          </property>
        
         <property>
            <name>tez.task-specific.log.level</name>
            <value>DEBUG;org.apache.hadoop.ipc=INFO;org.apache.hadoop.security=INFO</value>
          </property>
        
        Show
        sseth Siddharth Seth added a comment - - edited Updated patch which allows for task specific log levels. It would be good to print the changes in the logs/console Do you mean printing the log level that will be used on the client before each job starts ? Ready for review. The following are the options that can be used. <property> <name>tez.am.log.level</name> <value>INFO;org.apache.tez.dag.app=INFO</value> <!--value>INFO</value--> </property> <property> <name>tez.task.log.level</name> <!--value>DEBUG;org.apache.hadoop.ipc=INFO;org.apache.hadoop.security=INFO</value--> <value>INFO</value> </property> <property> <name>tez.task-specific.log.level</name> <value>DEBUG;org.apache.hadoop.ipc=INFO;org.apache.hadoop.security=INFO</value> </property>
        Hide
        sseth Siddharth Seth added a comment -

        Rajesh Balamohan, Hitesh Shah - please review.

        Show
        sseth Siddharth Seth added a comment - Rajesh Balamohan , Hitesh Shah - please review.
        Hide
        rajesh.balamohan Rajesh Balamohan added a comment -

        Minor comments

        • Should TezClientUtils.parseLogParams() return null when logConfig is empty/null?
        • If so, can the if-else condition in TaskSpecificLaunchCmdOption(Configuration conf) can be made simpler to the following?
        • getSpecificTasks() can be refactored to return null
                  if (Strings.isNullOrEmpty(tsLaunchCmdOpts) || specificTaskList.isEmpty() || !isValid(specificTaskList)) {
                          return null;
                  }
          
                   public TaskSpecificLaunchCmdOption(Configuration conf) {
                          tsLaunchCmdOpts = conf.getTrimmed(TezConfiguration.TEZ_TASK_SPECIFIC_LAUNCH_CMD_OPTS);
                          this.tasksMap = getSpecificTasks(conf);
                          this.tsLogParams = TezClientUtils.parseLogParams(conf.getTrimmed(TezConfiguration.TEZ_TASK_SPECIFIC_LOG_LEVEL));
                  }
                  
        Show
        rajesh.balamohan Rajesh Balamohan added a comment - Minor comments Should TezClientUtils.parseLogParams() return null when logConfig is empty/null? If so, can the if-else condition in TaskSpecificLaunchCmdOption(Configuration conf) can be made simpler to the following? getSpecificTasks() can be refactored to return null if (Strings.isNullOrEmpty(tsLaunchCmdOpts) || specificTaskList.isEmpty() || !isValid(specificTaskList)) { return null ; } public TaskSpecificLaunchCmdOption(Configuration conf) { tsLaunchCmdOpts = conf.getTrimmed(TezConfiguration.TEZ_TASK_SPECIFIC_LAUNCH_CMD_OPTS); this .tasksMap = getSpecificTasks(conf); this .tsLogParams = TezClientUtils.parseLogParams(conf.getTrimmed(TezConfiguration.TEZ_TASK_SPECIFIC_LOG_LEVEL)); }
        Hide
        sseth Siddharth Seth added a comment -

        Should TezClientUtils.parseLogParams() return null when logConfig is empty/null?

        Done

        getSpecificTasks() can be refactored to return null

        Done

        If so, can the if-else condition in TaskSpecificLaunchCmdOption(Configuration conf) can be made simpler to the following?

        Simplified this, but still contains some conditionals to only parse the strings when necessary.

        Show
        sseth Siddharth Seth added a comment - Should TezClientUtils.parseLogParams() return null when logConfig is empty/null? Done getSpecificTasks() can be refactored to return null Done If so, can the if-else condition in TaskSpecificLaunchCmdOption(Configuration conf) can be made simpler to the following? Simplified this, but still contains some conditionals to only parse the strings when necessary.
        Hide
        sseth Siddharth Seth added a comment -

        Updated patch with some unit test fixes, and a bugfix.

        Show
        sseth Siddharth Seth added a comment - Updated patch with some unit test fixes, and a bugfix.
        Hide
        rajesh.balamohan Rajesh Balamohan added a comment -

        lgtm. +1

        Minor comments:
        1. Would it be good to add the following in testTaskSpecificLogOptionsWithCommandOptions?

            String optionStr = options.getTaskSpecificOption("", "v", 0);
            assertTrue(optionStr.equals("-Xmx128m"));
        

        2. Patch might need to be rebased since TEZ-1696 got checked in.

        Show
        rajesh.balamohan Rajesh Balamohan added a comment - lgtm. +1 Minor comments: 1. Would it be good to add the following in testTaskSpecificLogOptionsWithCommandOptions? String optionStr = options.getTaskSpecificOption( "", " v", 0); assertTrue(optionStr.equals( "-Xmx128m" )); 2. Patch might need to be rebased since TEZ-1696 got checked in.
        Hide
        sseth Siddharth Seth added a comment -

        Updated patch to address the last comment + rebased.

        Thanks for the review. Committing.

        Show
        sseth Siddharth Seth added a comment - Updated patch to address the last comment + rebased. Thanks for the review. Committing.
        Hide
        sseth Siddharth Seth added a comment -

        Committed to master.

        Show
        sseth Siddharth Seth added a comment - Committed to master.
        Hide
        sseth Siddharth Seth added a comment -

        Rebased patch for branch-0.5

        Show
        sseth Siddharth Seth added a comment - Rebased patch for branch-0.5
        Hide
        sseth Siddharth Seth added a comment -

        Committed to branch-0.5

        Show
        sseth Siddharth Seth added a comment - Committed to branch-0.5
        Hide
        hitesh Hitesh Shah added a comment -

        Closing issue as 0.5.4, 0.6.1 and 0.7.0 have been released.

        Show
        hitesh Hitesh Shah added a comment - Closing issue as 0.5.4, 0.6.1 and 0.7.0 have been released.

          People

          • Assignee:
            sseth Siddharth Seth
            Reporter:
            sseth Siddharth Seth
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development