  1. Spark
  2. SPARK-46190

ANSI Double quoted identifiers do not work in Python threads

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.4.0, 3.4.1, 3.5.0
    • Fix Version/s: None
    • Component/s: PySpark

    Description

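      Enabling `spark.sql.ansi.doubleQuotedIdentifiers` and then using double-quoted identifiers does not work correctly when the query runs in a Python thread.

      The following example applies the filter "\"status\" = 'Unchanged'" and gets empty results when it runs in a thread. I believe this is because "status" is interpreted as a string literal inside the thread, but as an attribute (column reference) outside of it. If it is parsed as a literal, the predicate compares two different constant strings and is always false, which would explain the empty result.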

      from concurrent import futures
      from pyspark import sql
      
      spark = (
        sql.SparkSession.builder.master("local[*]")
        .config("spark.sql.ansi.enabled", "true")
        .config("spark.sql.ansi.doubleQuotedIdentifiers", "true")
        .getOrCreate()
      )
      
      def demonstrate_issue(spark):
        # Path to JSON file with contents:
        # [{"status": "Unchanged"}, {"status": "Changed"}]
        df = spark.read.json("data/example.json")
        df.filter("\"status\" = 'Unchanged'").show()
      
      # Shows 1 record, expected
      demonstrate_issue(spark)
      
      with futures.ThreadPoolExecutor(1) as executor:
        # Shows 0 records, unexpected
        executor.submit(demonstrate_issue, spark)
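
A minimal sketch for turning the visual comparison above into numbers, assuming the same `spark` session and the same `data/example.json` file as in the report. The helper names `count_unchanged` and `compare_main_vs_thread` are hypothetical and not part of PySpark. Calling `.result()` on the future also re-raises any exception from the worker thread, which a bare `executor.submit(...)` would hide.

```python
from concurrent import futures


def count_unchanged(spark, path="data/example.json"):
    """Count rows matching the double-quoted-identifier filter from the report."""
    df = spark.read.json(path)
    return df.filter("\"status\" = 'Unchanged'").count()


def compare_main_vs_thread(fn, *args):
    """Run fn(*args) on the calling thread, then on one worker thread.

    Returns a (main_result, thread_result) tuple so the two can be compared.
    """
    main_result = fn(*args)
    with futures.ThreadPoolExecutor(max_workers=1) as executor:
        # .result() blocks until the worker finishes and re-raises its exception, if any.
        thread_result = executor.submit(fn, *args).result()
    return main_result, thread_result


# Usage, assuming `spark` is the session built in the report:
#   main_count, thread_count = compare_main_vs_thread(count_unchanged, spark)
# Per the report, the two counts differ on the affected versions.
```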
       

       

      Additional testing notes:

      • When the expression is parsed from the thread with `sql.functions.expr` (which runs in Java via Py4J), "status" is interpreted as a literal value, not as an attribute.
      • Using double quotes with `spark.sql` does work in the thread.
      • Using a dataframe created in memory does work in the thread.
      • Tested in versions 3.4.0, 3.4.1, and 3.5.0 on Windows and Mac.
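
Based on the notes above, two possible ways to sidestep the problem inside a thread can be sketched as follows. The first routes the same predicate through `spark.sql`, which the notes report as working. The second builds the predicate as a Column expression so that no SQL string is parsed at all; that variant is an assumption and was not tested in this report. The function names and the view name `example_view` are hypothetical.

```python
def filter_unchanged_via_sql(spark, df, view_name="example_view"):
    """Apply the filter through spark.sql using a temporary view."""
    # view_name is a hypothetical name chosen for this sketch.
    df.createOrReplaceTempView(view_name)
    query = f"""SELECT * FROM {view_name} WHERE "status" = 'Unchanged'"""
    return spark.sql(query)


def filter_unchanged_via_column(df):
    """Apply the filter as a Column expression, with no SQL string to parse."""
    return df.filter(df["status"] == "Unchanged")
```

Either function can be passed to `executor.submit` in place of the string-based `df.filter(...)` call from the example.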

       

      The original PR that added this option is here: https://github.com/apache/spark/pull/38022

       

          People

            Assignee: Unassigned
            Reporter: Max Payson (mpayson)