  1. Spark
  2. SPARK-41281 Feature parity: SparkSession API in Spark Connect
  3. SPARK-42679

createDataFrame doesn't work with non-nullable schema.

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.1
    • Component/s: Connect
    • Labels: None

    Description

      On Spark Connect, spark.createDataFrame fails when the schema declares a non-nullable field, even though the data contains no nulls:

      from pyspark.sql.types import IntegerType, StructField, StructType
      schema_false = StructType([StructField("id", IntegerType(), False)])
      spark.createDataFrame([[1]], schema=schema_false)
      
      Traceback (most recent call last):
      ...
      pyspark.errors.exceptions.connect.AnalysisException: [NULLABLE_COLUMN_OR_FIELD] Column or field `id` is nullable while it's required to be non-nullable.

      The same call succeeds when the field is declared nullable:

      schema_true = StructType([StructField("id", IntegerType(), True)])
      spark.createDataFrame([[1]], schema=schema_true)
      
      DataFrame[id: int]

       

            People

              Assignee: Unassigned
              Reporter: Haejoon Lee (itholic)