Uploaded image for project: 'Sqoop (Retired)'
  1. Sqoop (Retired)
  2. SQOOP-2748

Remove parentheses from WHERE clause

    XMLWordPrintableJSON

Details

    • Wish
    • Status: Open
    • Minor
    • Resolution: Unresolved
    • 1.4.6
    • None
    • codegen
    • Redhat 6
      Cloudera 5.5.0

    Description

      When performing a sqoop import with option "-m 1" the following where clause is added by sqoop:
      WHERE ( 1=1 ) AND ( 1=1 )

      This where clause is redundant and creates an incompatibility with the IBM IMS Universal JDBC driver.
      The IMS JDBC Driver does not allow the use of parentheses in the where clause. The below quote was taken from 2nd dot point here: https://www-01.ibm.com/support/knowledgecenter/SSEPH2_13.1.0/com.ibm.ims13.doc.apg/ims_odbjdbcusagewhere.htm.
      "Do not use parentheses. Qualification statements are evaluated from left to right. The order of evaluation for operators is the IMS evaluation order for segment search arguments."

      I have modified the below commands and output to anonymize the details.

      Sample sqoop command:

      sqoop import --connect 'jdbc:ims://myserver:9999/MYDATABASE:currentSchema=MYSCHEMA;' -username ???????? -password ???????? --null-string '\\N' --null-non-string '\\N' --hive-import --create-hive-table --table MYIMSTABLE --hive-table MYHIVETABLE --hive-drop-import-delims --driver com.ibm.ims.jdbc.IMSDriver --delete-target-dir -m 1
      

      Extract of log produced:

      ...
      15/12/16 17:20:05 INFO mapreduce.Job:  map 0% reduce 0%
      15/12/16 17:20:16 INFO mapreduce.Job: Task Id : attempt_1449625205240_0003_m_000000_0, Status : FAILED
      Error: java.io.IOException: SQLException in nextKeyValue
              at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:277)
              at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
              at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
              at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
              at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
              at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
              at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
              at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
              at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:415)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
              at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
      Caused by: java.sql.SQLException: Error parsing the following SQL statement: SELECT FIELD1 FROM MYIMSTABLE AS MYIMSTABLE WHERE ( 1=1 ) AND ( 1=1 ). Encountered " <ONEEQUALONE> "1=1 "" at line 1, column 1167.
      Was expecting:
          <ID> ...
      
              at com.ibm.ims.jdbc.SQLForDLIParser.parse(SQLForDLIParser.java:651)
              at com.ibm.ims.jdbc.PreparedStatementImpl.parseSQLQuery(PreparedStatementImpl.java:133)
              at com.ibm.ims.jdbc.PreparedStatementImpl.<init>(PreparedStatementImpl.java:87)
              at com.ibm.ims.jdbc.ConnectionImpl.prepareStatement(ConnectionImpl.java:978)
              at org.apache.sqoop.mapreduce.db.DBRecordReader.executeQuery(DBRecordReader.java:101)
              at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:235)
              ... 12 more
      

      In summary, for sqoop to be compatible with the IBM IMS JDBC driver, I am asking for either the where clause "( 1=1 ) AND ( 1=1 )" to be removed or for the parentheses to be removed.

      Workaround
      The only workaround I have been able to come up with is to switch to using the --query option instead of --table and hack the $CONDITIONS variable into the select part of the query - to get rid of the parentheses from the where clause.

      For example:

      sqoop import --connect 'jdbc:ims://myserver:9999/MYDATABASE:currentSchema=MYSCHEMA;' -username ???????? -password ???????? --null-string '\\N' --null-non-string '\\N' --hive-import --create-hive-table --query "select FIELD1,'\$CONDITIONS' as CONDITIONS from MYIMSTABLE" --hive-table MYHIVETABLE --hive-drop-import-delims --driver com.ibm.ims.jdbc.IMSDriver --delete-target-dir -m 1 --target-dir test
      

      Obviously this is not desirable as the commands to retrieve the metadata (WHERE 1 = 0) will retrieve the data with it. The sql will look like this:
      select FIELD1,' (1 = 0) ' as CONDITIONS from MYIMSTABLE
      however it does import the data successfully (albeit inefficiently).

      Attachments

        Activity

          People

            Unassigned Unassigned
            wchisholm Warren Chisholm-Apache
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated: