Spark / SPARK-25545

CSV loading with DROPMALFORMED mode doesn't correctly drop rows that do not conform to non-nullable schema fields


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 2.3.0, 2.3.1, 2.3.2
    • Fix Version/s: None
    • Component/s: SQL

    Description

      I'm loading a CSV file into a DataFrame using Spark. I have defined a schema and marked its fields as non-nullable.

      When the mode is set to DROPMALFORMED, I expect any row in the CSV with a missing (null) value in one of those columns to be dropped entirely. At the moment, the CSV loader correctly drops rows whose values do not conform to the field type, but the nullable property is seemingly ignored.

      Example CSV input:

      1,2,3
      1,,3
      ,2,3
      1,2,abc
      

      Example Spark job:

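      import org.apache.spark.sql.SparkSession
      import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

      val spark = SparkSession
        .builder()
        .appName("csv-test")
        .master("local")
        .getOrCreate()

      spark.read
        .format("csv")
        // All three columns are declared non-nullable.
        .schema(StructType(
          StructField("col1", IntegerType, nullable = false) ::
            StructField("col2", IntegerType, nullable = false) ::
            StructField("col3", IntegerType, nullable = false) :: Nil))
        .option("header", false)
        // Drop rows that cannot be parsed according to the schema.
        .option("mode", "DROPMALFORMED")
        .load("path/to/file.csv")
        .coalesce(1) // write a single output file
        .write
        .format("csv")
        .option("header", false)
        .save("path/to/output")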
      

      The actual output is:

      1,2,3
      1,,3
      ,2,3

      Note that the row containing a non-integer value has been dropped, as expected, but rows containing null values persist, despite the nullable property being set to false in the schema definition.

      My expected output is:

      1,2,3
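The following is a minimal pure-Scala model of the reported and the expected DROPMALFORMED semantics. It uses the all-IntegerType schema and the sample input from this issue. It is a sketch, not Spark's implementation. `DropMalformedModel`, `Field`, `dropMalformed`, `toCsvLine` and the `enforceNullability` flag are hypothetical names introduced for illustration. The model assumes simple comma-separated input with no quoting, and it treats an empty cell as null.

```scala
// Hypothetical model of DROPMALFORMED for an all-Int schema.
// This is NOT Spark's implementation; it only mirrors the example in this issue.
import scala.util.Try

object DropMalformedModel {
  // A column of the all-Int schema: its name and whether null is allowed.
  final case class Field(name: String, nullable: Boolean)

  // Parse one cell: "" -> Some(None) (null), an integer -> Some(Some(n)),
  // anything else -> None (the cell does not conform to the field type).
  private def parseCell(raw: String): Option[Option[Int]] =
    if (raw.isEmpty) Some(None)
    else Try(raw.toInt).toOption.map(Some(_))

  // Keep only the rows that survive. A row is dropped if it has the wrong
  // number of cells or a cell that is not an Int. When enforceNullability
  // is true, it is also dropped if a non-nullable field is null.
  def dropMalformed(
      lines: Seq[String],
      schema: Seq[Field],
      enforceNullability: Boolean): Seq[Seq[Option[Int]]] =
    lines.flatMap { line =>
      // limit -1 keeps trailing empty cells, e.g. "1,2," -> ["1", "2", ""]
      val parsed = line.split(",", -1).toSeq.map(parseCell)
      val wellTyped = parsed.length == schema.length && parsed.forall(_.isDefined)
      val values = parsed.flatten
      val nullViolation =
        schema.zip(values).exists { case (f, v) => !f.nullable && v.isEmpty }
      if (!wellTyped || (enforceNullability && nullViolation)) None
      else Some(values)
    }

  // Render a row back to CSV, writing null as an empty cell.
  def toCsvLine(row: Seq[Option[Int]]): String =
    row.map(_.fold("")(_.toString)).mkString(",")
}

import DropMalformedModel._

val schema = Seq(
  Field("col1", nullable = false),
  Field("col2", nullable = false),
  Field("col3", nullable = false))
val input = Seq("1,2,3", "1,,3", ",2,3", "1,2,abc")

// Reported behavior: only the type mismatch ("abc") is dropped.
val reported = dropMalformed(input, schema, enforceNullability = false).map(toCsvLine)
// reported == Seq("1,2,3", "1,,3", ",2,3")

// Expected behavior: a null in a non-nullable column also drops the row.
val expected = dropMalformed(input, schema, enforceNullability = true).map(toCsvLine)
// expected == Seq("1,2,3")
```

Each cell is modeled as `Option[Option[Int]]`. The outer `None` means "malformed" and the inner `None` means "null". This split is the distinction this issue turns on. With `enforceNullability = false`, the model reproduces the actual output reported above. With `true`, it reproduces the expected output.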


            People

              Assignee: Unassigned
              Reporter: Steven Bakhtiari (stevebakh)
              Votes: 0
              Watchers: 1
