Description
When using a JDBC data source, calling the "isin" function with an empty array generates invalid SQL: the pushed-down filter ends up containing an empty IN list, which the JDBC driver rejects with a syntax exception.
If the array is non-empty, the filter works as expected.
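The failure mode is consistent with the IN-list translation simply interpolating the value list without special-casing an empty array. A minimal illustrative sketch (compileIn here is a hypothetical stand-in, not Spark's actual JDBC pushdown code):

// Hypothetical stand-in for the JDBC filter-to-SQL translation,
// shown only to illustrate how an empty array yields invalid SQL.
def compileIn(attr: String, values: Array[String]): String =
  s"$attr IN (${values.map(v => s"'$v'").mkString(", ")})"

compileIn("cl_ult", Array("a", "b"))  // cl_ult IN ('a', 'b')  -- valid SQL
compileIn("cl_ult", Array.empty)      // cl_ult IN ()          -- invalid SQL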
In the reproduction below, assume that SOURCE_CONNECTION, SQL_DRIVER and TABLE are all defined correctly.
scala> val filter = Array[String]()
filter: Array[String] = Array()

scala> val sortDF = spark.read.format("jdbc").options(Map("url" -> SOURCE_CONNECTION, "driver" -> SQL_DRIVER, "dbtable" -> TABLE)).load().filter($"cl_ult".isin(filter:_*))
sortDF: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [ibi_bulk_id: bigint, ibi_row_id: int ... 174 more fields]

scala> sortDF.show()
16/11/14 15:35:46 ERROR Executor: Exception in task 0.0 in stage 6.0 (TID 205)
com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near ')'.
	at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:216)
	at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1515)
	at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:404)
	at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute(SQLServerPreparedStatement.java:350)
	at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:5696)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:1715)
	at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:180)
	at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:155)
	at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeQuery(SQLServerPreparedStatement.java:285)
	at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.<init>(JDBCRDD.scala:408)
	at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:379)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
	at org.apache.spark.scheduler.Task.run(Task.scala:86)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
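A possible workaround until this is fixed is to special-case the empty array before building the filter: "isin" with no values can never match a row, so a constant-false predicate is equivalent. A sketch, reusing the names from the reproduction above:

import org.apache.spark.sql.functions.lit
import spark.implicits._

val df = spark.read.format("jdbc")
  .options(Map("url" -> SOURCE_CONNECTION, "driver" -> SQL_DRIVER, "dbtable" -> TABLE))
  .load()

// An empty IN list can never match, so substitute a constant-false predicate
// instead of letting Spark push down "cl_ult IN ()" to the database.
val sortDF = if (filter.isEmpty) df.filter(lit(false))
             else df.filter($"cl_ult".isin(filter: _*))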