Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.6.0
    • Fix Version/s: 0.8.0
    • Component/s: container
    • Labels:
      None

      Description

      task.opts is a bit tricky to use right now.

      The most common thing that I see people want to do are:

      • Change the Xmx setting for their Samza containers.
      • Turn on a remote debugger.
      • Change garbage collector settings.

      Most of the time they do this in dev-mode (using LocalJobFactory).

      Changing task.opts is tricky for a number of reasons.

      1. Changing task.opts at all eliminates all of the defaults. This means, if you set -Xmx, you lose all the defaults set in run-class.sh when no JVM_OPTS is set (log settings, Java system properties, etc).
      2. LocalJobFactory uses ThreadJob by default, which doesn't even pay attention to task.opts (since it starts the container as a thread, not a new process).
      3. If you manage to figure out how to switch LocalJobFactory to use ProcessJob (undocumented), you end up with the SamzaContainer running as a separate process with no way to kill it (the LocalJobRunner starts the ProcessJob, then exits).

      1. samza_109_1_2.patch
        28 kB
        Chinmay Soman
      2. samza_109_1_1.patch
        28 kB
        Chinmay Soman
      3. samza_109_1.patch
        17 kB
        Chinmay Soman

        Issue Links

          Activity

          Hide
          criccomini Chris Riccomini added a comment -

          Ok. Will remove the first point.

          Cool, thanks.

          BTW, what is the format of using the task.opts? I do not see any documents mentioning it. Want to add an example into the table.

          It would be good to have an example, if one isn't shown in the config table. Anything you put in task.opts gets forwarded directly to the commandline as part of the JVM invocation. So, the format is usually a bunch of JVM switches. Things like:

          task.opts=-Xmx2048 -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC
          

          Note that you can't use bash $FOO environment variables inside. They're escaped for security.

          Show
          criccomini Chris Riccomini added a comment - Ok. Will remove the first point. Cool, thanks. BTW, what is the format of using the task.opts? I do not see any documents mentioning it. Want to add an example into the table. It would be good to have an example, if one isn't shown in the config table. Anything you put in task.opts gets forwarded directly to the commandline as part of the JVM invocation. So, the format is usually a bunch of JVM switches. Things like: task.opts=-Xmx2048 -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC Note that you can't use bash $FOO environment variables inside. They're escaped for security.
          Hide
          closeuris Yan Fang added a comment -

          Ok. Will remove the first point.

          BTW, what is the format of using the task.opts? I do not see any documents mentioning it. Want to add an example into the table.

          Show
          closeuris Yan Fang added a comment - Ok. Will remove the first point. BTW, what is the format of using the task.opts? I do not see any documents mentioning it. Want to add an example into the table.
          Hide
          criccomini Chris Riccomini added a comment -

          Oops, you're right.

          This one is fixed:

          *If you set this property, the log configuration is disrupted. Please see SAMZA-109 for a workaround.
          

          The ThreadJobFactory remains unfixed, however we do WARN on it in the logs now.

          Show
          criccomini Chris Riccomini added a comment - Oops, you're right. This one is fixed: *If you set this property, the log configuration is disrupted. Please see SAMZA-109 for a workaround. The ThreadJobFactory remains unfixed, however we do WARN on it in the logs now.
          Hide
          closeuris Yan Fang added a comment -

          I saw

          Any JVM options to include in the command line when executing Samza containers. For example, this can be used to set the JVM heap size, to tune the garbage collector, or to enable remote debugging. Note there are some issues with the current implementation of task.opts:

          *If you set this property, the log configuration is disrupted. Please see SAMZA-109 for a workaround.
          *This cannot be used when running with ThreadJobFactory

          in the task.opts in the configuration table, I can not recall if we fix these two problems in this ticket?

          Show
          closeuris Yan Fang added a comment - I saw Any JVM options to include in the command line when executing Samza containers. For example, this can be used to set the JVM heap size, to tune the garbage collector, or to enable remote debugging. Note there are some issues with the current implementation of task.opts: *If you set this property, the log configuration is disrupted. Please see SAMZA-109 for a workaround. *This cannot be used when running with ThreadJobFactory in the task.opts in the configuration table, I can not recall if we fix these two problems in this ticket?
          Hide
          criccomini Chris Riccomini added a comment -

          +1 Merged and committed to 0.8.

          Show
          criccomini Chris Riccomini added a comment - +1 Merged and committed to 0.8.
          Hide
          cpsoman Chinmay Soman added a comment -

          Inside, ProcessJobFactory,

          The commandBuilder was missing this line:
          .setTaskNameToChangeLogPartitionMapping(taskNameToChangeLogPartitionMapping.map(kv => kv._1 -> Integer.valueOf(kv._2)).asJava)

          This was causing a NPE

          Show
          cpsoman Chinmay Soman added a comment - Inside, ProcessJobFactory, The commandBuilder was missing this line: .setTaskNameToChangeLogPartitionMapping(taskNameToChangeLogPartitionMapping.map(kv => kv._1 -> Integer.valueOf(kv._2)).asJava) This was causing a NPE
          Hide
          criccomini Chris Riccomini added a comment -

          What was the bug you fixed? Just curious. The diff in RB makes it hard to figure out.

          Show
          criccomini Chris Riccomini added a comment - What was the bug you fixed? Just curious. The diff in RB makes it hard to figure out.
          Hide
          cpsoman Chinmay Soman added a comment -

          New patch with a bug fix (introduced by SAMZA-123).

          Show
          cpsoman Chinmay Soman added a comment - New patch with a bug fix (introduced by SAMZA-123 ).
          Hide
          criccomini Chris Riccomini added a comment -

          Ah, good, yea, it's SAMZA-123 related.

          Show
          criccomini Chris Riccomini added a comment - Ah, good, yea, it's SAMZA-123 related.
          Hide
          cpsoman Chinmay Soman added a comment -

          And no - its not build related.

          Show
          cpsoman Chinmay Soman added a comment - And no - its not build related.
          Hide
          cpsoman Chinmay Soman added a comment -

          It seems to be in ShellCommandBuilder.scala:91

          var taskNameToChangeLogPartitionMappingString = ShellCommandBuilder.serializeTaskNameToChangeLogPartitionMapping(taskNameToChangeLogPartitionMapping.map(kv => kv._1 -> kv._2.toInt).toMap)

          Interestingly, this happened only when running via ProcessJobFactory. It runs fine when I use YarnJobFactory. Seems weird - since both of them use ShellCommandBuilder.

          Show
          cpsoman Chinmay Soman added a comment - It seems to be in ShellCommandBuilder.scala:91 var taskNameToChangeLogPartitionMappingString = ShellCommandBuilder.serializeTaskNameToChangeLogPartitionMapping(taskNameToChangeLogPartitionMapping.map(kv => kv._1 -> kv._2.toInt).toMap) Interestingly, this happened only when running via ProcessJobFactory. It runs fine when I use YarnJobFactory. Seems weird - since both of them use ShellCommandBuilder.
          Hide
          criccomini Chris Riccomini added a comment -

          Hmm. What's the NPE? Is it build-related? I've been messing with build stuff lately.

          Show
          criccomini Chris Riccomini added a comment - Hmm. What's the NPE? Is it build-related? I've been messing with build stuff lately.
          Hide
          cpsoman Chinmay Soman added a comment -

          I'm getting a NPE after doing rebase (and re-running my manual tests)

          It seems I'll have to test this out a bit more.

          Show
          cpsoman Chinmay Soman added a comment - I'm getting a NPE after doing rebase (and re-running my manual tests) It seems I'll have to test this out a bit more.
          Hide
          cpsoman Chinmay Soman added a comment -

          Patch with the addressed review comments.

          Show
          cpsoman Chinmay Soman added a comment - Patch with the addressed review comments.
          Hide
          criccomini Chris Riccomini added a comment -

          Yea, I wanted to raise SAMZA-259 on the mailing list. I pulled it into the 0.8.0 release.

          My thought is this: let's continue updating 0.7.0's docs for now. I want to update our scripts in SAMZA-259 to do per-branch javadoc releases. That is, we'll have the script generage javadocs for 0.7.0 by checking out the 0.7.0 branch. We'll generate "current" by checking out the master branch, etc. So, I think it's safe to just update 0.7.0, and tackle SAMZA-259 to separate things out (probably by removing the version number from the docs tree, and updating the scripts). That's how rust handles it, and I like that style, unless someone has another suggestion:

          http://static.rust-lang.org/doc/master/tutorial.html
          http://static.rust-lang.org/doc/0.10/tutorial.html

          Show
          criccomini Chris Riccomini added a comment - Yea, I wanted to raise SAMZA-259 on the mailing list. I pulled it into the 0.8.0 release. My thought is this: let's continue updating 0.7.0's docs for now. I want to update our scripts in SAMZA-259 to do per-branch javadoc releases. That is, we'll have the script generage javadocs for 0.7.0 by checking out the 0.7.0 branch. We'll generate "current" by checking out the master branch, etc. So, I think it's safe to just update 0.7.0, and tackle SAMZA-259 to separate things out (probably by removing the version number from the docs tree, and updating the scripts). That's how rust handles it, and I like that style, unless someone has another suggestion: http://static.rust-lang.org/doc/master/tutorial.html http://static.rust-lang.org/doc/0.10/tutorial.html
          Hide
          closeuris Yan Fang added a comment -

          hmm, yes, that's a good question. It's related to SAMZA-259. Maybe it's the time to finish that ticket at the same time. Also, I realized in SAMZA-123, Jakob already updated the docs in 0.7.0, which actually does not have the changes. Not sure if we want to fully copy into 0.8.0. What do you think, Chris Riccomini ?

          Show
          closeuris Yan Fang added a comment - hmm, yes, that's a good question. It's related to SAMZA-259 . Maybe it's the time to finish that ticket at the same time. Also, I realized in SAMZA-123 , Jakob already updated the docs in 0.7.0, which actually does not have the changes. Not sure if we want to fully copy into 0.8.0. What do you think, Chris Riccomini ?
          Hide
          cpsoman Chinmay Soman added a comment -

          Thanks Yan ! Almost forgot about that.

          But I do have a question - the documentation seems to explain the 0.7.0 behaviour. Hence, I wasn't sure if this is the right place for these new features to be documented ?

          Or should be make a full copy into 0.8.0 and edit it there?

          Show
          cpsoman Chinmay Soman added a comment - Thanks Yan ! Almost forgot about that. But I do have a question - the documentation seems to explain the 0.7.0 behaviour. Hence, I wasn't sure if this is the right place for these new features to be documented ? Or should be make a full copy into 0.8.0 and edit it there?
          Hide
          closeuris Yan Fang added a comment -

          Besides Chris' comments in RB, just a reminder: could you also update relevant docs - there are two AFAIK - (1) learn/documentation/0.7.0/jobs/job-runner.html (2) /learn/documentation/0.7.0/jobs/configuration-table.html . Thank you.

          Show
          closeuris Yan Fang added a comment - Besides Chris' comments in RB, just a reminder: could you also update relevant docs - there are two AFAIK - (1) learn/documentation/0.7.0/jobs/job-runner.html (2) /learn/documentation/0.7.0/jobs/configuration-table.html . Thank you.
          Show
          cpsoman Chinmay Soman added a comment - https://reviews.apache.org/r/24011/
          Hide
          cpsoman Chinmay Soman added a comment -

          Patch for (part 1) available

          Show
          cpsoman Chinmay Soman added a comment - Patch for (part 1) available
          Hide
          cpsoman Chinmay Soman added a comment -

          Patch for (part 1)

          Show
          cpsoman Chinmay Soman added a comment - Patch for (part 1)
          Hide
          criccomini Chris Riccomini added a comment -

          I think it would also be worth trying to preserve the GC logs when you set task.opts

          I went back and forth on this. The trick is figuring out how to handle this in cases where the dev actually wants to customize the GC log settings in task.opts.

          Show
          criccomini Chris Riccomini added a comment - I think it would also be worth trying to preserve the GC logs when you set task.opts I went back and forth on this. The trick is figuring out how to handle this in cases where the dev actually wants to customize the GC log settings in task.opts.
          Hide
          martinkl Martin Kleppmann added a comment -

          +1 on the plan. I think it would also be worth trying to preserve the GC logs when you set task.opts.

          Show
          martinkl Martin Kleppmann added a comment - +1 on the plan. I think it would also be worth trying to preserve the GC logs when you set task.opts.
          Hide
          criccomini Chris Riccomini added a comment -

          I would like to re-animate this ticket.

          I took a brief look last week at this. I feel that this work should actually be broken into two JIRAs:

          1. Step (1, 2, 3, 5) in the original list (eliminate LocalJobFactory, set log4j, the system properties, -server, etc)
          2. Step (4) in the original list (have jobs log their container ID, and make kill-job.sh use it)

          I also think we should scrap my original idea bout doing string replacement in environment variables (6, 7, 8). But it might be worth opening another JIRA just to have the discussion.

          The thing that I'm most interested is actually just steps 1, 2, 3, and 5, so I'd propose moving step 4 into another JIRA, as well.

          Show
          criccomini Chris Riccomini added a comment - I would like to re-animate this ticket. I took a brief look last week at this. I feel that this work should actually be broken into two JIRAs: Step (1, 2, 3, 5) in the original list (eliminate LocalJobFactory, set log4j, the system properties, -server, etc) Step (4) in the original list (have jobs log their container ID, and make kill-job.sh use it) I also think we should scrap my original idea bout doing string replacement in environment variables (6, 7, 8). But it might be worth opening another JIRA just to have the discussion. The thing that I'm most interested is actually just steps 1, 2, 3, and 5, so I'd propose moving step 4 into another JIRA, as well.
          Hide
          criccomini Chris Riccomini added a comment - - edited

          A work around until this is fixed is to set something like:

          task.opts=-Xmx6g -XX:+PrintGCDateStamps -Xloggc:logs/gc.log -Dsamza.log.dir=logs -Dsamza.container.name=geo-waterloo -agentlib:jdwp=transport=dt_socket,address=localhost:9009,server=y,suspend
          

          This example sets a custom memory size, and enables local debugging, while not messing with the logs.

          If running in local-mode (LocalJobFactory), you'll also need to set ShellCommandBuilder in order to switch from ThreadJob to ProcessJob.

          task.command.class=org.apache.samza.job.ShellCommandBuilder
          
          Show
          criccomini Chris Riccomini added a comment - - edited A work around until this is fixed is to set something like: task.opts=-Xmx6g -XX:+PrintGCDateStamps -Xloggc:logs/gc.log -Dsamza.log.dir=logs -Dsamza.container.name=geo-waterloo -agentlib:jdwp=transport=dt_socket,address=localhost:9009,server=y,suspend This example sets a custom memory size, and enables local debugging, while not messing with the logs. If running in local-mode (LocalJobFactory), you'll also need to set ShellCommandBuilder in order to switch from ThreadJob to ProcessJob. task.command.class=org.apache.samza.job.ShellCommandBuilder
          Hide
          criccomini Chris Riccomini added a comment -

          We should also consider defaulting with -server and -d64.

          Show
          criccomini Chris Riccomini added a comment - We should also consider defaulting with -server and -d64.
          Hide
          criccomini Chris Riccomini added a comment -

          Proposed changes:

          1. Eliminate LocalJobFactory.
          2. Create ProcessJobFactory.
          3. Create ThreadJobFactory.
          4. Make ProcessJobFactory behave like YarnJobFactory.
          5. Forcibly assign the log4j.xml file (if it exists), and the samza.log.dir/samza.container.name Java system properties.
          6. Add CommandBuilder.setLogDir.
          7. Make ShellCommandBuilder.buildEnvironment do a string replace for task.opts (ENV_JAVA_OPTS) to replace $SAMZA_LOG_DIR with log dir, and $SAMZA_CONTAINER_NAME with container name.
          8. Update SamzaAppMasterTaskManager, and ProcessJobFactory to use the new CommandBuilder.setLogDir method.

          (3) is useful for debugging via IDE, so we should keep it.

          (4) means that ProcessJob and YarnJob should both save a job ID file somewhere (configurable). The ProcessJob's job ID file will contain the PID of the SamzaContainer. The YarnJob's job ID file will contain the application attempt ID for the Samza job in the YARN cluster. We should update kill-job.sh to take a job ID, but to default to a standard location (e.g. bin/../samza-job-name.pid). The YarnJob will use the job ID to kill the job in the YARN cluster. The ProcessJob will use the PID to kill the job locally. We'll need to update the StreamJob interface to support this.

          (5) would make run-class.sh replace:

          if [ -z "$JAVA_OPTS" ]; then
            JAVA_OPTS="-Xmx768M -XX:+PrintGCDateStamps -Xloggc:$SAMZA_LOG_DIR/gc.log -Dsamza.log.dir=$SAMZA_LOG_DIR -Dsamza.container.name=$SAMZA_CONTAINER_NAME"
            if [ -f $base_dir/lib/log4j.xml ]; then
              JAVA_OPTS="$JAVA_OPTS -Dlog4j.configuration=file:$base_dir/lib/log4j.xml"
            fi
          fi
          

          with:

          if [ -z "$JAVA_OPTS" ]; then
            JAVA_OPTS="-Xmx768M -XX:+PrintGCDateStamps -Xloggc:$SAMZA_LOG_DIR/gc.log"
          fi
          
          if [ -f $base_dir/lib/log4j.xml ]; then
            JAVA_OPTS="$JAVA_OPTS -Dlog4j.configuration=file:$base_dir/lib/log4j.xml"
          fi
          
          JAVA_OPTS="$JAVA_OPTS -Dsamza.log.dir=$SAMZA_LOG_DIR -Dsamza.container.name=$SAMZA_CONTAINER_NAME"
          

          These changes address the problems in the following way:

          1. Changing task.opts will no longer disable logging. Log4j.xml will still be applied, and the log dir and container name will both be available as system properties for log4j.xml if you're using the daily rolling appender. If you wish to use refer to the container name or log path, you can just use the environment variables, and they will get string-replaced as expected. Changing task.opts WOULD disable gc.log files, but gc.log could be re-enabled using the environment variable due to change (7) above.
          2. There is now a clear distinction between ProcessJob and ThreadJob, since they each have their own factory now. No one should really ever use ThreadJob, except for unit tests, and IDE debugging.
          3. You now have a way to kill jobs in dev-mode using the updated kill-job.sh script.

          Show
          criccomini Chris Riccomini added a comment - Proposed changes: 1. Eliminate LocalJobFactory. 2. Create ProcessJobFactory. 3. Create ThreadJobFactory. 4. Make ProcessJobFactory behave like YarnJobFactory. 5. Forcibly assign the log4j.xml file (if it exists), and the samza.log.dir/samza.container.name Java system properties. 6. Add CommandBuilder.setLogDir. 7. Make ShellCommandBuilder.buildEnvironment do a string replace for task.opts (ENV_JAVA_OPTS) to replace $SAMZA_LOG_DIR with log dir, and $SAMZA_CONTAINER_NAME with container name. 8. Update SamzaAppMasterTaskManager, and ProcessJobFactory to use the new CommandBuilder.setLogDir method. (3) is useful for debugging via IDE, so we should keep it. (4) means that ProcessJob and YarnJob should both save a job ID file somewhere (configurable). The ProcessJob's job ID file will contain the PID of the SamzaContainer. The YarnJob's job ID file will contain the application attempt ID for the Samza job in the YARN cluster. We should update kill-job.sh to take a job ID, but to default to a standard location (e.g. bin/../samza-job-name.pid). The YarnJob will use the job ID to kill the job in the YARN cluster. The ProcessJob will use the PID to kill the job locally. We'll need to update the StreamJob interface to support this. (5) would make run-class.sh replace: if [ -z "$JAVA_OPTS" ]; then JAVA_OPTS= "-Xmx768M -XX:+PrintGCDateStamps -Xloggc:$SAMZA_LOG_DIR/gc.log -Dsamza.log.dir=$SAMZA_LOG_DIR -Dsamza.container.name=$SAMZA_CONTAINER_NAME" if [ -f $base_dir/lib/log4j.xml ]; then JAVA_OPTS= "$JAVA_OPTS -Dlog4j.configuration=file:$base_dir/lib/log4j.xml" fi fi with: if [ -z "$JAVA_OPTS" ]; then JAVA_OPTS= "-Xmx768M -XX:+PrintGCDateStamps -Xloggc:$SAMZA_LOG_DIR/gc.log" fi if [ -f $base_dir/lib/log4j.xml ]; then JAVA_OPTS= "$JAVA_OPTS -Dlog4j.configuration=file:$base_dir/lib/log4j.xml" fi JAVA_OPTS= "$JAVA_OPTS -Dsamza.log.dir=$SAMZA_LOG_DIR -Dsamza.container.name=$SAMZA_CONTAINER_NAME" These changes address the problems in the following way: 1. Changing task.opts will no longer disable logging. Log4j.xml will still be applied, and the log dir and container name will both be available as system properties for log4j.xml if you're using the daily rolling appender. If you wish to use refer to the container name or log path, you can just use the environment variables, and they will get string-replaced as expected. Changing task.opts WOULD disable gc.log files, but gc.log could be re-enabled using the environment variable due to change (7) above. 2. There is now a clear distinction between ProcessJob and ThreadJob, since they each have their own factory now. No one should really ever use ThreadJob, except for unit tests, and IDE debugging. 3. You now have a way to kill jobs in dev-mode using the updated kill-job.sh script.
          Hide
          criccomini Chris Riccomini added a comment -

          To further complicate (1), the default task.opts uses some environment variables (SAMZA_LOG_DIR and SAMZA_CONTAINER_NAME) which aren't accessible from within config. This means, even if you set task.opts to something like:

          task.opts=-Xmx768M -XX:+PrintGCDateStamps -Xloggc:$SAMZA_LOG_DIR/gc.log -Dsamza.log.dir=$SAMZA_LOG_DIR -Dsamza.container.name=$SAMZA_CONTAINER_NAME
          

          This won't work, because the environment variables are escaped as part of Utils.envVarEscape.

          Show
          criccomini Chris Riccomini added a comment - To further complicate (1), the default task.opts uses some environment variables (SAMZA_LOG_DIR and SAMZA_CONTAINER_NAME) which aren't accessible from within config. This means, even if you set task.opts to something like: task.opts=-Xmx768M -XX:+PrintGCDateStamps -Xloggc:$SAMZA_LOG_DIR/gc.log -Dsamza.log.dir=$SAMZA_LOG_DIR -Dsamza.container.name=$SAMZA_CONTAINER_NAME This won't work, because the environment variables are escaped as part of Utils.envVarEscape.

            People

            • Assignee:
              cpsoman Chinmay Soman
              Reporter:
              criccomini Chris Riccomini
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development