Spark / SPARK-25199

InferSchema "all Strings" if one of many CSVs is empty

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Cannot Reproduce
    • Affects Version/s: 2.2.1
    • Fix Version/s: None
    • Component/s: Input/Output
    • Environment: I discovered this on AWS Glue, which uses Spark 2.2.1

    Description

      Spark can load multiple CSV files in one read:

      df = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("/*.csv")

      However, if one of these files has a header but no data rows, Spark infers every column as "String".

      Spark should skip a file during schema inference if it contains no non-header rows.
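
A possible workaround, sketched under assumptions rather than taken from Spark itself: pre-filter the input so that only files with at least one data row reach the reader. The helper name `csv_files_with_data` is hypothetical, and the sketch assumes the files sit on a local filesystem that `glob` can list (S3 or HDFS paths, as on AWS Glue, would need a different listing mechanism).

```python
import csv
import glob


def csv_files_with_data(pattern):
    """Return sorted paths matching `pattern` that contain at least one
    non-blank row after the header. Hypothetical helper, local files only."""
    selected = []
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as handle:
            reader = csv.reader(handle)
            next(reader, None)  # skip the header row, if the file has one
            # csv.reader yields [] for blank lines, so any() ignores them
            if any(row for row in reader):
                selected.append(path)
    return selected


# Usage with Spark (assumes an existing SparkSession named `spark`);
# DataFrameReader.load accepts a list of paths:
#
#   paths = csv_files_with_data("/*.csv")
#   df = (spark.read.format("csv")
#         .option("header", "true")
#         .option("inferSchema", "true")
#         .load(paths))
```

Because the filter runs before Spark sees any file, it does not depend on how a given Spark version handles header-only files during inference.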

          People

            Assignee: Unassigned
            Reporter: Neil McGuigan
            Votes: 0
            Watchers: 4
