Uploaded image for project: 'Oozie'
  1. Oozie
  2. OOZIE-2391

spark-opts value in workflow.xml is not parsed properly

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: trunk
    • Fix Version/s: 4.3.0
    • Component/s: None
    • Labels:

      Description

      When providing spark-opts in workflow.xml for oozie, the "--conf" property values is not parsed properly.

      Example spark opts

      <spark-opts>--conf spark.executor.extraJavaOptions=-Dconfig.url=hdfs:///my.conf -Dmy.env=${env}</spark-opts>
      

      In this case the value for property "spark.executor.extraJavaOptions" is "-Dconfig.url=hdfs:///my.conf -Dmy.env=$

      {env}", but in the oozie code here https://github.com/apache/oozie/blob/38e33811bc3f9d3b1a1a024dcab6af760734b696/sharelib/spark/src/main/java/org/apache/oozie/action/hadoop/SparkMain.java#L110 the string is just split by space, which make the "-Dmy.env=${env}

      " a separate argument to spark and spark fails job as that argument is not valid.

      This doesn't work with enclosing the whole value in quotes too.

      Error from stderr:

      Error: Unrecognized option '-Dmy.env=development"'.
      Run with --help for usage help or --verbose for debug output
      Intercepting System.exit(1)
      Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], exit code [1]

      1. OOZIE-2391-003.patch
        9 kB
        Peter Cseh
      2. OOZIE-2391-002.patch
        8 kB
        Peter Cseh
      3. OOZIE-2391-001.patch
        1 kB
        Nathan Skone
      4. OOZIE-2391_addendum.patch
        0.9 kB
        Robert Kanter

        Issue Links

          Activity

          Hide
          rkanter Robert Kanter added a comment -

          I believe you can just use multiple --conf:
          <spark-opts>--conf spark.executor.extraJavaOptions=-Dconfig.url=hdfs:///my.conf --conf -Dmy.env=$

          {env}

          </spark-opts>

          Show
          rkanter Robert Kanter added a comment - I believe you can just use multiple --conf: <spark-opts>--conf spark.executor.extraJavaOptions=-Dconfig.url=hdfs:///my.conf --conf -Dmy.env=$ {env} </spark-opts>
          Hide
          nathan.skone Nathan Skone added a comment -

          Robert,

          You can indeed specify multiple --conf entries, but only the last one seems to be effective. For example, I used this spark-opts:
          <spark-opts>--conf spark.driver.extraJavaOptions=-Denv=qa --conf spark.driver.extraJavaOptions=-XX:MaxPermSize=256M</spark-opts>

          In this case, both arguments were passed to Spark as follows:
          --conf
          spark.driver.extraJavaOptions=-Denv=qa
          --conf
          spark.driver.extraJavaOptions=-XX:MaxPermSize=256M

          However, Spark only used the second spark.driver.extraJavaOptions, ignoring the first.

          Thanks,
          Nathan

          Show
          nathan.skone Nathan Skone added a comment - Robert, You can indeed specify multiple --conf entries, but only the last one seems to be effective. For example, I used this spark-opts: <spark-opts>--conf spark.driver.extraJavaOptions=-Denv=qa --conf spark.driver.extraJavaOptions=-XX:MaxPermSize=256M</spark-opts> In this case, both arguments were passed to Spark as follows: --conf spark.driver.extraJavaOptions=-Denv=qa --conf spark.driver.extraJavaOptions=-XX:MaxPermSize=256M However, Spark only used the second spark.driver.extraJavaOptions, ignoring the first. Thanks, Nathan
          Hide
          nathan.skone Nathan Skone added a comment -

          diff --git sharelib/spark/src/main/java/org/apache/oozie/action/hadoop/SparkMain.java sharelib/spark/src/main/java/org/apache/oozie/action/hadoop/SparkMain.java
          index 5624951..a9abb7f 100644
          — sharelib/spark/src/main/java/org/apache/oozie/action/hadoop/SparkMain.java
          +++ sharelib/spark/src/main/java/org/apache/oozie/action/hadoop/SparkMain.java
          @@ -19,6 +19,7 @@
          package org.apache.oozie.action.hadoop;

          import org.apache.commons.lang.StringUtils;
          +import org.apache.commons.lang.text.StrTokenizer;
          import org.apache.hadoop.conf.Configuration;
          import org.apache.spark.deploy.SparkSubmit;

          @@ -107,7 +108,8 @@ public class SparkMain extends LauncherMain {
          boolean addedJars = false;
          String sparkOpts = actionConf.get(SparkActionExecutor.SPARK_OPTS);
          if (StringUtils.isNotEmpty(sparkOpts)) {

          • String[] sparkOptions = sparkOpts.split(DELIM);
            + StrTokenizer tokenizer = new StrTokenizer(sparkOpts, DELIM);
            + String[] sparkOptions = tokenizer.getTokenArray();
            for (int i = 0; i < sparkOptions.length; i++) {
            String opt = sparkOptions[i];
            if (sparkJars != null) {
          Show
          nathan.skone Nathan Skone added a comment - diff --git sharelib/spark/src/main/java/org/apache/oozie/action/hadoop/SparkMain.java sharelib/spark/src/main/java/org/apache/oozie/action/hadoop/SparkMain.java index 5624951..a9abb7f 100644 — sharelib/spark/src/main/java/org/apache/oozie/action/hadoop/SparkMain.java +++ sharelib/spark/src/main/java/org/apache/oozie/action/hadoop/SparkMain.java @@ -19,6 +19,7 @@ package org.apache.oozie.action.hadoop; import org.apache.commons.lang.StringUtils; +import org.apache.commons.lang.text.StrTokenizer; import org.apache.hadoop.conf.Configuration; import org.apache.spark.deploy.SparkSubmit; @@ -107,7 +108,8 @@ public class SparkMain extends LauncherMain { boolean addedJars = false; String sparkOpts = actionConf.get(SparkActionExecutor.SPARK_OPTS); if (StringUtils.isNotEmpty(sparkOpts)) { String[] sparkOptions = sparkOpts.split(DELIM); + StrTokenizer tokenizer = new StrTokenizer(sparkOpts, DELIM); + String[] sparkOptions = tokenizer.getTokenArray(); for (int i = 0; i < sparkOptions.length; i++) { String opt = sparkOptions [i] ; if (sparkJars != null) {
          Hide
          rkanter Robert Kanter added a comment -

          Oh, good point.

          Show
          rkanter Robert Kanter added a comment - Oh, good point.
          Hide
          hadoopqa Hadoop QA added a comment -

          Testing JIRA OOZIE-2391

          Cleaning local git workspace

          ----------------------------

          +1 PATCH_APPLIES
          +1 CLEAN
          -1 RAW_PATCH_ANALYSIS
          . +1 the patch does not introduce any @author tags
          . +1 the patch does not introduce any tabs
          . +1 the patch does not introduce any trailing spaces
          . +1 the patch does not introduce any line longer than 132
          . -1 the patch does not add/modify any testcase
          +1 RAT
          . +1 the patch does not seem to introduce new RAT warnings
          +1 JAVADOC
          . +1 the patch does not seem to introduce new Javadoc warnings
          +1 COMPILE
          . +1 HEAD compiles
          . +1 patch compiles
          . +1 the patch does not seem to introduce new javac warnings
          +1 BACKWARDS_COMPATIBILITY
          . +1 the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
          . +1 the patch does not modify JPA files
          -1 TESTS
          . Tests run: 1761
          . Tests failed: 7
          . Tests errors: 1

          . The patch failed the following testcases:

          . testSamplers(org.apache.oozie.util.TestMetricsInstrumentation)
          . testIDGeneration(org.apache.oozie.service.TestZKUUIDService)
          . testMultipleIDGeneration(org.apache.oozie.service.TestZKUUIDService)
          . testActionKillCommandActionNumbers(org.apache.oozie.command.coord.TestCoordActionsKillXCommand)
          . testMemoryUsageAndSpeed(org.apache.oozie.service.TestPartitionDependencyManagerService)
          . testBundlePauseExtendMaterializesCoordinator(org.apache.oozie.command.bundle.TestBundleChangeXCommand)
          . testForNoDuplicates(org.apache.oozie.event.TestEventGeneration)

          +1 DISTRO
          . +1 distro tarball builds with the patch

          ----------------------------
          -1 Overall result, please check the reported -1(s)

          The full output of the test-patch run is available at

          . https://builds.apache.org/job/oozie-trunk-precommit-build/2757/

          Show
          hadoopqa Hadoop QA added a comment - Testing JIRA OOZIE-2391 Cleaning local git workspace ---------------------------- +1 PATCH_APPLIES +1 CLEAN -1 RAW_PATCH_ANALYSIS . +1 the patch does not introduce any @author tags . +1 the patch does not introduce any tabs . +1 the patch does not introduce any trailing spaces . +1 the patch does not introduce any line longer than 132 . -1 the patch does not add/modify any testcase +1 RAT . +1 the patch does not seem to introduce new RAT warnings +1 JAVADOC . +1 the patch does not seem to introduce new Javadoc warnings +1 COMPILE . +1 HEAD compiles . +1 patch compiles . +1 the patch does not seem to introduce new javac warnings +1 BACKWARDS_COMPATIBILITY . +1 the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations . +1 the patch does not modify JPA files -1 TESTS . Tests run: 1761 . Tests failed: 7 . Tests errors: 1 . The patch failed the following testcases: . testSamplers(org.apache.oozie.util.TestMetricsInstrumentation) . testIDGeneration(org.apache.oozie.service.TestZKUUIDService) . testMultipleIDGeneration(org.apache.oozie.service.TestZKUUIDService) . testActionKillCommandActionNumbers(org.apache.oozie.command.coord.TestCoordActionsKillXCommand) . testMemoryUsageAndSpeed(org.apache.oozie.service.TestPartitionDependencyManagerService) . testBundlePauseExtendMaterializesCoordinator(org.apache.oozie.command.bundle.TestBundleChangeXCommand) . testForNoDuplicates(org.apache.oozie.event.TestEventGeneration) +1 DISTRO . +1 distro tarball builds with the patch ---------------------------- -1 Overall result, please check the reported -1(s) The full output of the test-patch run is available at . https://builds.apache.org/job/oozie-trunk-precommit-build/2757/
          Hide
          rkanter Robert Kanter added a comment -

          The approach seems reasonable. I assume the user would have to quote any values with spaces, right?
          Can you add some unit tests for the parsing?

          Show
          rkanter Robert Kanter added a comment - The approach seems reasonable. I assume the user would have to quote any values with spaces, right? Can you add some unit tests for the parsing?
          Hide
          nathan.skone Nathan Skone added a comment -

          Robert,

          Yes, I do believe the user would need to quote values with spaces. I will work on adding some unit tests.

          Show
          nathan.skone Nathan Skone added a comment - Robert, Yes, I do believe the user would need to quote values with spaces. I will work on adding some unit tests.
          Hide
          rkanter Robert Kanter added a comment -

          I think it would also be a good idea to mention this in the Spark Action docs, especially if it requires quotes.

          Show
          rkanter Robert Kanter added a comment - I think it would also be a good idea to mention this in the Spark Action docs, especially if it requires quotes.
          Hide
          gezapeti Peter Cseh added a comment -

          I've modified the test and the first patch haven't solved the issue in my opinion.
          The -002 patch fixes the issue and adds a test case too.

          Show
          gezapeti Peter Cseh added a comment - I've modified the test and the first patch haven't solved the issue in my opinion. The -002 patch fixes the issue and adds a test case too.
          Hide
          hadoopqa Hadoop QA added a comment -

          Testing JIRA OOZIE-2391

          Cleaning local git workspace

          ----------------------------

          +1 PATCH_APPLIES
          +1 CLEAN
          +1 RAW_PATCH_ANALYSIS
          . +1 the patch does not introduce any @author tags
          . +1 the patch does not introduce any tabs
          . +1 the patch does not introduce any trailing spaces
          . +1 the patch does not introduce any line longer than 132
          . +1 the patch does adds/modifies 2 testcase(s)
          +1 RAT
          . +1 the patch does not seem to introduce new RAT warnings
          +1 JAVADOC
          . +1 the patch does not seem to introduce new Javadoc warnings
          +1 COMPILE
          . +1 HEAD compiles
          . +1 patch compiles
          . +1 the patch does not seem to introduce new javac warnings
          +1 BACKWARDS_COMPATIBILITY
          . +1 the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
          . +1 the patch does not modify JPA files
          +1 TESTS
          . Tests run: 1775
          +1 DISTRO
          . +1 distro tarball builds with the patch

          ----------------------------
          +1 Overall result, good!, no -1s

          The full output of the test-patch run is available at

          . https://builds.apache.org/job/oozie-trunk-precommit-build/2818/

          Show
          hadoopqa Hadoop QA added a comment - Testing JIRA OOZIE-2391 Cleaning local git workspace ---------------------------- +1 PATCH_APPLIES +1 CLEAN +1 RAW_PATCH_ANALYSIS . +1 the patch does not introduce any @author tags . +1 the patch does not introduce any tabs . +1 the patch does not introduce any trailing spaces . +1 the patch does not introduce any line longer than 132 . +1 the patch does adds/modifies 2 testcase(s) +1 RAT . +1 the patch does not seem to introduce new RAT warnings +1 JAVADOC . +1 the patch does not seem to introduce new Javadoc warnings +1 COMPILE . +1 HEAD compiles . +1 patch compiles . +1 the patch does not seem to introduce new javac warnings +1 BACKWARDS_COMPATIBILITY . +1 the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations . +1 the patch does not modify JPA files +1 TESTS . Tests run: 1775 +1 DISTRO . +1 distro tarball builds with the patch ---------------------------- +1 Overall result, good!, no -1s The full output of the test-patch run is available at . https://builds.apache.org/job/oozie-trunk-precommit-build/2818/
          Hide
          fdenes Ferenc Denes added a comment -

          +1 (nonbinding). Good catch on the jars parsing error as well.

          Show
          fdenes Ferenc Denes added a comment - +1 (nonbinding). Good catch on the jars parsing error as well.
          Hide
          rkanter Robert Kanter added a comment -

          Overall looks good. Here's a few minor things:

          • The Javadoc for SparkOptionsSplitter has a typo: oprtions should be options
          • Instead of making SparkOptionsSplitter it's own class, I think we should just make SparkOptionsSplitter#split a static method in SparkMain.
          • Can you update the Spark Action's doc page to mention how to specify spark-opts with quotes etc? And an example. The file to edit is named DG_SparkActionExtension.twiki.
          Show
          rkanter Robert Kanter added a comment - Overall looks good. Here's a few minor things: The Javadoc for SparkOptionsSplitter has a typo: oprtions should be options Instead of making SparkOptionsSplitter it's own class, I think we should just make SparkOptionsSplitter#split a static method in SparkMain . Can you update the Spark Action's doc page to mention how to specify spark-opts with quotes etc? And an example. The file to edit is named DG_SparkActionExtension.twiki .
          Hide
          gezapeti Peter Cseh added a comment -

          Thanks for the review!
          Fixed patch attached.

          Show
          gezapeti Peter Cseh added a comment - Thanks for the review! Fixed patch attached.
          Hide
          hadoopqa Hadoop QA added a comment -

          Testing JIRA OOZIE-2391

          Cleaning local git workspace

          ----------------------------

          +1 PATCH_APPLIES
          +1 CLEAN
          +1 RAW_PATCH_ANALYSIS
          . +1 the patch does not introduce any @author tags
          . +1 the patch does not introduce any tabs
          . +1 the patch does not introduce any trailing spaces
          . +1 the patch does not introduce any line longer than 132
          . +1 the patch does adds/modifies 2 testcase(s)
          +1 RAT
          . +1 the patch does not seem to introduce new RAT warnings
          +1 JAVADOC
          . +1 the patch does not seem to introduce new Javadoc warnings
          +1 COMPILE
          . +1 HEAD compiles
          . +1 patch compiles
          . +1 the patch does not seem to introduce new javac warnings
          +1 BACKWARDS_COMPATIBILITY
          . +1 the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
          . +1 the patch does not modify JPA files
          -1 TESTS
          . Tests run: 1776
          . Tests failed: 5
          . Tests errors: 2

          . The patch failed the following testcases:

          . testMain(org.apache.oozie.action.hadoop.TestHiveMain)
          . testPigScript(org.apache.oozie.action.hadoop.TestPigMain)
          . testPig_withNullExternalID(org.apache.oozie.action.hadoop.TestPigMain)
          . testEmbeddedPigWithinPython(org.apache.oozie.action.hadoop.TestPigMain)
          . testPigScript(org.apache.oozie.action.hadoop.TestPigMainWithOldAPI)

          +1 DISTRO
          . +1 distro tarball builds with the patch

          ----------------------------
          -1 Overall result, please check the reported -1(s)

          The full output of the test-patch run is available at

          . https://builds.apache.org/job/oozie-trunk-precommit-build/2831/

          Show
          hadoopqa Hadoop QA added a comment - Testing JIRA OOZIE-2391 Cleaning local git workspace ---------------------------- +1 PATCH_APPLIES +1 CLEAN +1 RAW_PATCH_ANALYSIS . +1 the patch does not introduce any @author tags . +1 the patch does not introduce any tabs . +1 the patch does not introduce any trailing spaces . +1 the patch does not introduce any line longer than 132 . +1 the patch does adds/modifies 2 testcase(s) +1 RAT . +1 the patch does not seem to introduce new RAT warnings +1 JAVADOC . +1 the patch does not seem to introduce new Javadoc warnings +1 COMPILE . +1 HEAD compiles . +1 patch compiles . +1 the patch does not seem to introduce new javac warnings +1 BACKWARDS_COMPATIBILITY . +1 the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations . +1 the patch does not modify JPA files -1 TESTS . Tests run: 1776 . Tests failed: 5 . Tests errors: 2 . The patch failed the following testcases: . testMain(org.apache.oozie.action.hadoop.TestHiveMain) . testPigScript(org.apache.oozie.action.hadoop.TestPigMain) . testPig_withNullExternalID(org.apache.oozie.action.hadoop.TestPigMain) . testEmbeddedPigWithinPython(org.apache.oozie.action.hadoop.TestPigMain) . testPigScript(org.apache.oozie.action.hadoop.TestPigMainWithOldAPI) +1 DISTRO . +1 distro tarball builds with the patch ---------------------------- -1 Overall result, please check the reported -1(s) The full output of the test-patch run is available at . https://builds.apache.org/job/oozie-trunk-precommit-build/2831/
          Hide
          rkanter Robert Kanter added a comment -

          +1

          Show
          rkanter Robert Kanter added a comment - +1
          Hide
          rkanter Robert Kanter added a comment -

          Thanks Peter. Committed to master!

          Show
          rkanter Robert Kanter added a comment - Thanks Peter. Committed to master!
          Hide
          rkanter Robert Kanter added a comment -

          Satish Subhashrao Saley found in OOZIE-2514 that the header in one of the new test files is wrong and shows up as a checkstyle violation. It's a super trivial fix, so I'll just take care of it as an addendum patch here.

          Show
          rkanter Robert Kanter added a comment - Satish Subhashrao Saley found in OOZIE-2514 that the header in one of the new test files is wrong and shows up as a checkstyle violation. It's a super trivial fix, so I'll just take care of it as an addendum patch here.
          Hide
          rkanter Robert Kanter added a comment -

          Fixed

          Show
          rkanter Robert Kanter added a comment - Fixed
          Hide
          rkanter Robert Kanter added a comment -

          Closing issue; Oozie 4.3.0 is released.

          Show
          rkanter Robert Kanter added a comment - Closing issue; Oozie 4.3.0 is released.

            People

            • Assignee:
              gezapeti Peter Cseh
              Reporter:
              pbathala Praveen Bathala
            • Votes:
              1 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development