Spark / SPARK-20457

Spark CSV is not able to Override Schema while reading data


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.1.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      I have a CSV file, test.csv:

      col
      1
      2
      3
      4
      

      When I read it with Spark and enable schema inference, the schema is inferred correctly:

      val df = spark.read.option("header", "true").option("inferSchema", "true").csv("test.csv")
          
      df.printSchema
      root
      |-- col: integer (nullable = true)
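      The inference step above can be reproduced outside spark-shell with a small Scala application. This is a minimal sketch, assuming Spark 2.x or later is on the classpath and that local mode is acceptable; the object name `InferSchemaRepro` and the helper `writeSampleCsv` are hypothetical names introduced here, and the sample file is written to a temporary path rather than `test.csv`.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.IntegerType

// Hypothetical reproduction helper: writes the sample CSV from this issue
// to a temp file and reads it back with header + schema inference enabled.
object InferSchemaRepro {
  // Writes the four-row sample (header "col", values 1..4) and returns its path.
  def writeSampleCsv(): String = {
    val path = Files.createTempFile("test", ".csv")
    Files.write(path, "col\n1\n2\n3\n4\n".getBytes(StandardCharsets.UTF_8))
    path.toString
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("csv-infer-schema")
      .getOrCreate()
    try {
      val df = spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv(writeSampleCsv())
      df.printSchema()
      // Inference should choose IntegerType for the "col" column.
      assert(df.schema("col").dataType == IntegerType)
    } finally {
      spark.stop()
    }
  }
}
```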
      

      But when I supply my own `schema` for the CSV file and set `inferSchema` to false, SparkSession applies the custom schema only partially:

      import org.apache.spark.sql.types.{StringType, StructField, StructType}
      val df = spark.read.option("header", "true").option("inferSchema", "false").schema(StructType(List(StructField("custom", StringType, false)))).csv("test.csv")
      
      df.printSchema
      root
      |-- custom: string (nullable = true)
      

      Only the column name (`custom`) and the data type (`StringType`) are picked up. The `nullable` flag is ignored: the column is still reported as `nullable = true`, even though the schema declares it as `nullable = false`.

      I am not able to understand this behavior.
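      The reported behavior can be checked programmatically, together with a workaround that is sometimes used: rebuilding the DataFrame with `spark.createDataFrame(df.rdd, schema)`, which attaches the declared schema, including `nullable = false`, as given. This is a sketch under stated assumptions, not the resolution of this issue: the object and method names (`CustomSchemaNullability`, `readWithSchema`, `enforceSchema`) are hypothetical, and Spark does not verify that the data really contains no nulls, so a null in a column declared non-nullable may later surface as an error or an incorrect result.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical helpers illustrating the issue and a possible workaround.
object CustomSchemaNullability {
  // The schema from the report: a single non-nullable string column.
  val customSchema: StructType =
    StructType(List(StructField("custom", StringType, nullable = false)))

  // Reads the CSV with the user-supplied schema, as in the report.
  // Per the report, the resulting schema shows nullable = true.
  def readWithSchema(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("header", "true")
      .option("inferSchema", "false")
      .schema(customSchema)
      .csv(path)

  // Workaround sketch (assumption, not an official fix): rebuild the
  // DataFrame from its rows so the declared schema is attached unchanged.
  // No null check is performed on the data.
  def enforceSchema(spark: SparkSession, df: DataFrame): DataFrame =
    spark.createDataFrame(df.rdd, customSchema)
}
```

      For example, `CustomSchemaNullability.enforceSchema(spark, CustomSchemaNullability.readWithSchema(spark, "test.csv")).printSchema()` should report `custom: string (nullable = false)`. Rebuilding from `df.rdd` adds a conversion step, so it is better suited to small inputs or cases where the declared nullability is required downstream.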


People

    • Assignee: Unassigned
    • Reporter: Himanshu Gupta (himstein)
    • Votes: 0
    • Watchers: 1
