Spark / SPARK-47211

Fix ignored PySpark Connect string collation

Details

    Description

      When PySpark is used through Spark Connect, string collation is silently dropped from the resulting DataFrame schema:

      Client connected to the Spark Connect server at localhost
      SparkSession available as 'spark'.
      >>> spark.sql("select 'abc' collate 'UNICODE'")
      DataFrame[collate(abc): string]
      >>> from pyspark.sql.types import StructType, StringType, StructField
      >>> spark.createDataFrame([], StructType([StructField('id', StringType(2))]))
      DataFrame[id: string]
      

      In both cases the DataFrame should report the column type as "string COLLATE 'UNICODE'" rather than plain "string".
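      A regression check for this behavior could compare the type printed for each column against the expected collated form. The following is only a sketch: it works on the printed DataFrame representation shown above, so it runs without a Spark session. The helpers `column_types` and `has_collation` are hypothetical names introduced for illustration and are not part of PySpark.

```python
import re


def column_types(df_repr: str) -> dict:
    """Parse a repr such as "DataFrame[id: string]" into {"id": "string"}.

    Assumption: fields are separated by ", " and each field is "name: type".
    Column names that themselves contain ", " are not handled by this sketch.
    """
    match = re.fullmatch(r"DataFrame\[(.*)\]", df_repr.strip())
    if match is None:
        raise ValueError(f"not a DataFrame repr: {df_repr!r}")
    body = match.group(1)
    if not body:
        return {}
    types = {}
    for field in body.split(", "):
        # Split on the last ": " so names like "collate(abc)" stay intact.
        name, _, dtype = field.rpartition(": ")
        types[name] = dtype
    return types


def has_collation(dtype: str, collation: str) -> bool:
    """Return True if the printed type carries the expected collation clause."""
    return dtype == f"string COLLATE '{collation}'"


# Output reported in this issue: the collation is missing.
observed = column_types("DataFrame[id: string]")
print(has_collation(observed["id"], "UNICODE"))

# Output the issue asks for.
expected = column_types("DataFrame[id: string COLLATE 'UNICODE']")
print(has_collation(expected["id"], "UNICODE"))
```

      In a real test the string passed to `column_types` would come from `repr(df)` on a DataFrame created through a Spark Connect session.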

            People

              nikolamand-db Nikola Mandic