Oozie / OOZIE-3218

Oozie Sqoop action with command splits the select clause into multiple parts due to delimiter being space

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 3.3.2, 4.1.0, 4.2.0, 4.3.0, 5.0.0
    • Fix Version/s: None
    • Component/s: action, workflow
    • Labels:
      None
    • Environment:

      Hortonworks Hadoop HDP-2.6.4.x release 

      oozie admin -version: Oozie server build version: 4.2.0.2.6.4.0-91

      Description

      When an Oozie Sqoop action uses a <command> element that contains --query, the query is split into multiple arguments. Sqoop then reports "Unrecognized argument:" errors and the action fails.

      <sqoop xmlns="uri:oozie:sqoop-action:0.4">
          <job-tracker>${resourceManager}</job-tracker>
          <name-node>${nameNode}</name-node>
          <command>import --verbose --connect jdbc:mysql://test.openstacklocal/db --query "select * from abc where $CONDITIONS" --username test --password test --driver com.mysql.jdbc.Driver -m 1</command>
          <ok to="end"/>
      </sqoop>
      

       
      Oozie Launcher logs:

       Sqoop command arguments :
       import
       --verbose
       --connect
       jdbc:mysql://test.openstacklocal/db
       --query
       "select
       *
       from
       abc
       where
       $CONDITIONS"
       --username
       hive
       --password
       ********
       --driver
       com.mysql.jdbc.Driver
       -m
       1
       Fetching child yarn jobs
       tag id : oozie-a1bbe03a0983b9e822d12ae7bb269ee3
       2791 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at hdp263-3.openstacklocal/172.26.105.248:8050
       Child yarn jobs are found - 
       =================================================================
      
      >>> Invoking Sqoop command line now >>>
      
      3172 [main] WARN org.apache.sqoop.tool.SqoopTool - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
       3172 [main] WARN org.apache.sqoop.tool.SqoopTool - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
       3218 [main] INFO org.apache.sqoop.Sqoop - Running Sqoop version: 1.4.6.2.6.4.0-91
       3218 [main] INFO org.apache.sqoop.Sqoop - Running Sqoop version: 1.4.6.2.6.4.0-91
       3287 [main] DEBUG org.apache.sqoop.tool.BaseSqoopTool - Enabled debug logging.
       3287 [main] DEBUG org.apache.sqoop.tool.BaseSqoopTool - Enabled debug logging.
       3288 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Error parsing arguments for import:
       3288 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Error parsing arguments for import:
       3288 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: *
       3288 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: *
       3288 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: from
       3288 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: from
       3288 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: abc
       3288 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: abc
       3288 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: where
       3288 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: where
       3288 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: $CONDITIONS"
       3288 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: $CONDITIONS"
       3288 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: --username
       3288 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: --username
       3288 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: abc
       3288 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: abc
       3288 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: --password
       3288 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: --password
       3288 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: abc
       3288 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: abc
       3288 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: --driver
       3288 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: --driver
       3288 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: com.mysql.jdbc.Driver
       3288 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: com.mysql.jdbc.Driver
       3288 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: -m
       3288 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: -m
       3288 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: 1
       3288 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: 1
       3289 [main] DEBUG org.apache.sqoop.Sqoop - 
       Try --help for usage instructions.
      

      The code that causes the issue is in SqoopActionExecutor.java:

              // Build a list of arguments from either a tokenized <command> string or a list of <arg>
              if (actionXml.getChild("command", ns) != null) {
                  String command = actionXml.getChild("command", ns).getTextTrim();
                  StringTokenizer st = new StringTokenizer(command, " ");
                  while (st.hasMoreTokens()) {
                      argList.add(st.nextToken());
                  }
              }
              else {
                  @SuppressWarnings("unchecked")
                  List<Element> eArgs = (List<Element>) actionXml.getChildren("arg", ns);
                  for (Element elem : eArgs) {
                      argList.add(elem.getTextTrim());
                  }
              }
              // If the command is given accidentally as "sqoop import --option"
              // instead of "import --option" we can make a user's life easier
              // by removing away the unnecessary "sqoop" token.
              // However, we do not do this if the command looks like
              // "sqoop --option", as that's entirely invalid.
              if (argList.size() > 1 &&
                      argList.get(0).equalsIgnoreCase(SQOOP) &&
                      !argList.get(1).startsWith("-")) {
                  XLog.getLog(getClass()).info(
                          "Found a redundant 'sqoop' prefixing the command. Removing it.");
                  argList.remove(0);
              }
      
              setSqoopCommand(actionConf, argList.toArray(new String[argList.size()]));
              return actionConf;
      

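      Because the delimiter is a single space, the tokenizer splits the query select * from abc where $CONDITIONS into select, *, from, abc, where and $CONDITIONS, and adds each token to the argument array separately. Sqoop takes only the first token as the value of --query and rejects the rest as unrecognized arguments.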
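
      The splitting can be reproduced outside Oozie with a few lines of Java. This is a minimal sketch, not Oozie code: the class name CommandSplitDemo and the method splitOnSpaces are hypothetical, and the method only mirrors the StringTokenizer loop from the <command> branch quoted above. The sample command is shortened from the one in this report.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class CommandSplitDemo {

    // Hypothetical helper: same logic as the <command> branch quoted above,
    // i.e. split on spaces with no awareness of quotes.
    static List<String> splitOnSpaces(String command) {
        List<String> argList = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(command, " ");
        while (st.hasMoreTokens()) {
            argList.add(st.nextToken());
        }
        return argList;
    }

    public static void main(String[] args) {
        String command =
                "import --query \"select * from abc where $CONDITIONS\" -m 1";
        // Prints one token per line; the quoted query ends up in six pieces,
        // the first starting with a quote and the last ending with one.
        for (String arg : splitOnSpaces(command)) {
            System.out.println(arg);
        }
    }
}
```

      The quote characters stay attached to the first and last fragments of the query, which matches the "select and $CONDITIONS" entries in the launcher log.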

      I have made a code change locally to address this issue, and it seems to work fine in my testing, so I am submitting the change as a patch.
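
      For illustration only, one way to keep a quoted query together is to split on whitespace except inside double quotes. The sketch below is an assumption about what such a change could look like, not the content of the attached patches; the class QuoteAwareSplitter and its split method are hypothetical names. It handles only double quotes and does not support escaped quotes or single quotes.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareSplitter {

    // Hypothetical helper: splits on whitespace, but treats text between
    // double quotes as part of a single argument. The quotes are dropped.
    static List<String> split(String command) {
        List<String> args = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        boolean hasToken = false;
        for (int i = 0; i < command.length(); i++) {
            char c = command.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
                hasToken = true; // an empty "" still counts as an argument
            } else if (Character.isWhitespace(c) && !inQuotes) {
                if (hasToken) {
                    args.add(current.toString());
                    current.setLength(0);
                    hasToken = false;
                }
            } else {
                current.append(c);
                hasToken = true;
            }
        }
        if (hasToken) {
            args.add(current.toString()); // flush the last argument
        }
        return args;
    }

    public static void main(String[] args) {
        String command =
                "import --query \"select * from abc where $CONDITIONS\" -m 1";
        // Prints one argument per line; the query stays in one piece.
        for (String arg : split(command)) {
            System.out.println(arg);
        }
    }
}
```

      With this approach an unterminated quote swallows the rest of the command into one argument, so a real change would probably need to decide how to report that case.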

        Attachments

        1. OOZIE-3218.patch
          2 kB
          Mahesh Balakrishnan
        2. OOZIE-3218-2.patch
          4 kB
          Mahesh Balakrishnan
        3. OOZIE-3218-3.patch
          4 kB
          Andras Piros

              People

              • Assignee:
                mbalakrishnan Mahesh Balakrishnan
                Reporter:
                mbalakrishnan Mahesh Balakrishnan