Spark / SPARK-42193

DataFrame API filter criteria throws ParseException when reading a JDBC column name with special characters


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Cannot Reproduce
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      On Spark 3.3.0, when reading from a JDBC table (SQLite was used to reproduce, with sqlite-jdbc:3.34.0.jar) whose table and column names contain special characters, filtering the resulting DataFrame with the DataFrame API fails with a ParseException.

      Script:

      from pyspark.sql import SparkSession
      
      spark = SparkSession \
          .builder \
          .appName("Databricks Support") \
          .config("spark.jars.packages", "org.xerial:sqlite-jdbc:3.34.0") \
          .getOrCreate()
      
      # Column name "/abc/column" deliberately contains special characters.
      columns = ["id", "/abc/column", "value"]
      data = [(1, 'A', 100), (2, 'B', 200), (3, 'B', 300)]
      
      rdd = spark.sparkContext.parallelize(data)
      df = spark.createDataFrame(rdd).toDF(*columns)
      
      # Table name "/abc/table" also contains special characters; <local-path> is a placeholder.
      options = {"url": "jdbc:sqlite:/<local-path>/spark-3.3.1-bin-hadoop3/jars/test.db", "dbtable": '"/abc/table"', "driver": "org.sqlite.JDBC"}
      
      df.coalesce(1).write.format("jdbc").options(**options).mode("append").save()
      
      df_1 = spark.read.format("jdbc") \
          .option("url", "jdbc:sqlite:/<local-path>/spark-3.3.1-bin-hadoop3/jars/test.db") \
          .option("dbtable", '"/abc/table"') \
          .option("driver", "org.sqlite.JDBC") \
          .load()
      
      # Backtick quoting is the standard way to escape special characters in Spark SQL identifiers.
      df_2 = df_1.filter("`/abc/column` = 'B'")
      
      df_2.show() 

      Error:

      ```
      Traceback (most recent call last):
       File "<stdin>", line 1, in <module>
       File "/opt/homebrew/Cellar/apache-spark/3.3.1/libexec/python/pyspark/sql/dataframe.py", line 606, in show
        print(self._jdf.showString(n, 20, vertical))
       File "/opt/homebrew/Cellar/apache-spark/3.3.1/libexec/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
       File "/opt/homebrew/Cellar/apache-spark/3.3.1/libexec/python/pyspark/sql/utils.py", line 196, in deco
        raise converted from None
      pyspark.sql.utils.ParseException: 
      Syntax error at or near '/': extra input '/'(line 1, pos 0)
      
      == SQL ==
      /abc/column
      ^^^
      ```
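      The backtick quoting itself appears valid: the identical filter string succeeds on Spark 3.2.1 (output below), and the error shows the unquoted name /abc/column being parsed again from position 0. This suggests the column name is re-parsed somewhere in the 3.3.0 JDBC read path, plausibly during predicate pushdown. If so, disabling pushdown with the documented JDBC option pushDownPredicate might sidestep the re-parse; the following is an untested sketch, not a confirmed fix:

      df_1 = spark.read.format("jdbc") \
          .option("url", "jdbc:sqlite:/<local-path>/spark-3.3.1-bin-hadoop3/jars/test.db") \
          .option("dbtable", '"/abc/table"') \
          .option("driver", "org.sqlite.JDBC") \
          .option("pushDownPredicate", "false") \
          .load()
      
      # With pushdown disabled, the filter is evaluated by Spark itself rather than
      # compiled into the SQLite query, so the identifier never reaches the JDBC SQL text.
      df_1.filter("`/abc/column` = 'B'").show()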

      However, on Spark 3.2.1 the same dataframe.filter call succeeds:

      >>> df_2.show()
      +---+-----------+-----+
      | id|/abc/column|value|
      +---+-----------+-----+
      |  2|          B|  200|
      |  3|          B|  300|
      +---+-----------+-----+ 

      Repro steps:

      1. Download Spark 3.2.1 locally.
      2. Download sqlite-jdbc-3.34.0.jar and copy it into the jars folder of the local Spark installation.
      3. Run the above script, providing the jar path.
      4. This creates the /abc/table table with the /abc/column column, and the filter returns results.
      5. Download Spark 3.3.0 locally.
      6. Repeat steps 2 and 3 (a quick version check is sketched after this list).
      7. The filter fails with the ParseException shown above.
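
      When switching between the two downloads, a one-line check (using the standard spark.version property) makes it explicit which version a given shell is exercising:

      # Prints the running Spark version, e.g. '3.2.1' or '3.3.0'.
      print(spark.version)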

      Could you please let us know how we can filter on a column whose name contains special characters, or how to escape it, on Spark 3.3.0?
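
      One candidate workaround, assuming the failure is limited to filter strings that get re-parsed, is to build the predicate with the Column API instead of a SQL string. This is a sketch and may not help if the pushdown path re-parses the pushed column name as well:

      # Index the DataFrame by the exact column name; constructing the predicate
      # this way involves no SQL string parsing.
      df_2 = df_1.filter(df_1["/abc/column"] == "B")
      df_2.show()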

    People

      Assignee: Unassigned
      Reporter: Shanmugavel Kuttiyandi Chandrakasu (kc.shanmugavel)
      Votes: 0
      Watchers: 3
