Uploaded image for project: 'Sqoop (Retired)'
  1. Sqoop (Retired)
  2. SQOOP-2777

Sqoop 1.4.6 import failure on special SQL Server column names

Add voteWatch issue
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 1.4.6
    • None
    • None
    • Hortonworks Data Platform v2.3.0.0 (build 2557)
      RHEL v6.7
      Microsoft JDBC driver v4.1

    Description

      I'm using Sqoop to import various tables from an MS SQL Server instance. The majority of imports are fine but a couple of tables fail to import due to what apparently is the specific naming of some table columns. Such columns are actually named "group", "merge", "order", "from", etc.

      It looks like when executing the final Java statement, SQL Server thinks those are actual SQL keywords instead of column names, and hence tries to execute commands like "group by" or "order by". Or at least so it seams...

      Eliminating those specific columns from the Sqoop import job ends up in a successful table import. But of course, I need those columns as well and changing their name in SQL Server is not an option (production DB with thousands of users and many other live applications connected to it constantly).

      Sqoop command sample:

      sqoop import 
      --connect 'jdbc:sqlserver://SERVER\INSTANCE;database=DB;username=USR;password=PWD' 
      --driver com.microsoft.sqlserver.jdbc.SQLServerDriver 
      --table TABLE 
      --columns IdGGD,IdGGM,IdImage,LinkName,Invert,Merge,Mirror,Order,GGMXTId,GGMXTName,GGMXTP,MarkerP,MarkerW,MarkerH,MarkerX1,MarkerY1,Instruction 
      --direct 
      --outdir /some/local/directory 
      --as-textfile 
      --target-dir /some/HDFS/directory 
      --null-string NULL 
      --null-non-string NULL 
      --append
      

      The obtained error looks like this:

      2016-01-05 18:55:42,469 INFO  - [main:] ~  map 0% reduce 0% (Job:1367)
      2016-01-05 18:55:47,531 INFO  - [main:] ~ Task Id : attempt_1451990134058_0040_m_000000_0, Status : FAILED (Job:1406)
      Error: java.io.IOException: SQLException in nextKeyValue
              at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:277)
              at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
              at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
              at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
              at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
              at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
              at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
              at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
              at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:415)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
              at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
      Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near the keyword 'Merge'.
              at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:217)
              at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1635)
              at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:426)
              at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute(SQLServerPreparedStatement.java:372)
              at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:6276)
              at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:1793)
              at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:184)
              at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:159)
              at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeQuery(SQLServerPreparedStatement.java:284)
              at org.apache.sqoop.mapreduce.db.DBRecordReader.executeQuery(DBRecordReader.java:111)
              at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:235)
              ... 12 more
      

      One workaround that I found is to use the Sqoop --query option instead of the --table one. But even in this case I can only import all the columns of the specific table, without being able to select specific columns.

      Sqoop comand sample:

      sqoop import 
      --connect 'jdbc:sqlserver://SERVER\INSTANCE;database=DB;username=USR;password=PWD' 
      --driver com.microsoft.sqlserver.jdbc.SQLServerDriver 
      --query 'SELECT * FROM TABLE WHERE $CONDITIONS' 
      --num-mappers 1 
      --direct 
      --outdir /some/local/directory 
      --as-textfile 
      --target-dir /some/HDFS/directory 
      --null-string NULL 
      --null-non-string NULL 
      --append
      

      Attachments

        Activity

          People

            Unassigned Unassigned
            VladMARIN Vlad MARIN

            Dates

              Created:
              Updated:

              Issue deployment