Spark / SPARK-42193

DataFrame API filter criteria throws ParseException when reading a JDBC column name with special characters


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Cannot Reproduce
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      On Spark 3.3.0, when reading from a JDBC table (SQLite was used to reproduce, with sqlite-jdbc:3.34.0.jar) whose table and column names contain special characters, filtering the resulting DataFrame with the DataFrame API fails with a ParseException.

      Script:

      from pyspark.sql import SparkSession
      
      spark = SparkSession \
          .builder \
          .appName("Databricks Support") \
          .config("spark.jars.packages", "org.xerial:sqlite-jdbc:3.34.0") \
          .getOrCreate()
      
      # Column name "/abc/column" deliberately contains special characters.
      columns = ["id", "/abc/column", "value"]
      data = [(1, 'A', 100), (2, 'B', 200), (3, 'B', 300)]
      
      rdd = spark.sparkContext.parallelize(data)
      df = spark.createDataFrame(rdd).toDF(*columns)
      
      # Table name "/abc/table" also contains special characters; <local-path> is a placeholder.
      options = {"url": "jdbc:sqlite:/<local-path>/spark-3.3.1-bin-hadoop3/jars/test.db", "dbtable": '"/abc/table"', "driver": "org.sqlite.JDBC"}
      
      df.coalesce(1).write.format("jdbc").options(**options).mode("append").save()
      
      df_1 = spark.read.format("jdbc") \
          .option("url", "jdbc:sqlite:/<local-path>/spark-3.3.1-bin-hadoop3/jars/test.db") \
          .option("dbtable", '"/abc/table"') \
          .option("driver", "org.sqlite.JDBC") \
          .load()
      
      # Backtick quoting is the standard way to escape special characters in Spark SQL identifiers.
      df_2 = df_1.filter("`/abc/column` = 'B'")
      
      df_2.show() 

      Error:

      ```
      Traceback (most recent call last):
       File "<stdin>", line 1, in <module>
       File "/opt/homebrew/Cellar/apache-spark/3.3.1/libexec/python/pyspark/sql/dataframe.py", line 606, in show
        print(self._jdf.showString(n, 20, vertical))
       File "/opt/homebrew/Cellar/apache-spark/3.3.1/libexec/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
       File "/opt/homebrew/Cellar/apache-spark/3.3.1/libexec/python/pyspark/sql/utils.py", line 196, in deco
        raise converted from None
      pyspark.sql.utils.ParseException: 
      Syntax error at or near '/': extra input '/'(line 1, pos 0)
      
      == SQL ==
      /abc/column
      ^^^
      ```
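      The backtick quoting itself appears valid: the identical filter string succeeds on Spark 3.2.1 (output below), and the error shows the unquoted name /abc/column being parsed again from position 0. This suggests the column name is re-parsed somewhere in the 3.3.0 JDBC read path, plausibly during predicate pushdown. If so, disabling pushdown with the documented JDBC option pushDownPredicate might sidestep the re-parse; the following is an untested sketch, not a confirmed fix:

      df_1 = spark.read.format("jdbc") \
          .option("url", "jdbc:sqlite:/<local-path>/spark-3.3.1-bin-hadoop3/jars/test.db") \
          .option("dbtable", '"/abc/table"') \
          .option("driver", "org.sqlite.JDBC") \
          .option("pushDownPredicate", "false") \
          .load()
      
      # With pushdown disabled, the filter is evaluated by Spark itself rather than
      # compiled into the SQLite query, so the identifier never reaches the JDBC SQL text.
      df_1.filter("`/abc/column` = 'B'").show()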

      However, on Spark 3.2.1 the same dataframe.filter call succeeds:

      >>> df_2.show()
      +---+-----------+-----+
      | id|/abc/column|value|
      +---+-----------+-----+
      |  2|          B|  200|
      |  3|          B|  300|
      +---+-----------+-----+ 

      Repro steps:

      1. Download Spark 3.2.1 locally.
      2. Download sqlite-jdbc-3.34.0.jar and copy it into the jars folder of the local Spark installation.
      3. Run the above script, providing the jar path.
      4. This creates the /abc/table table with the /abc/column column, and the filter returns results.
      5. Download Spark 3.3.0 locally.
      6. Repeat steps 2 and 3 (a quick version check is sketched after this list).
      7. The filter fails with the ParseException shown above.
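
      When switching between the two downloads, a one-line check (using the standard spark.version property) makes it explicit which version a given shell is exercising:

      # Prints the running Spark version, e.g. '3.2.1' or '3.3.0'.
      print(spark.version)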

      Could you please let us know how we can filter on a column whose name contains special characters, or how to escape it, on Spark 3.3.0?
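
      One candidate workaround, assuming the failure is limited to filter strings that get re-parsed, is to build the predicate with the Column API instead of a SQL string. This is a sketch and may not help if the pushdown path re-parses the pushed column name as well:

      # Index the DataFrame by the exact column name; constructing the predicate
      # this way involves no SQL string parsing.
      df_2 = df_1.filter(df_1["/abc/column"] == "B")
      df_2.show()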

    People

      Assignee: Unassigned
      Reporter: Shanmugavel Kuttiyandi Chandrakasu (kc.shanmugavel)
      Votes: 0
      Watchers: 3
