Spark / SPARK-39731

Correctness issue when parsing dates with yyyyMMdd format in CSV and JSON


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.0
    • Component/s: SQL
    • Labels: None

    Description

      In Spark 3.x, when reading CSV data like this:

      name,mydate
      1,2020011
      2,20201203

      and specifying the date pattern "yyyyMMdd", dates are not parsed correctly under the CORRECTED time parser policy: the malformed value "2020011" is accepted instead of being rejected.

      For example,

      spark.conf.set("spark.sql.legacy.timeParserPolicy", "CORRECTED")
      val df = spark.read.schema("name string, mydate date")
        .option("dateFormat", "yyyyMMdd").option("header", "true")
        .csv("file:/tmp/test.csv")
      df.show(false)

      Returns:

      +----+--------------+
      |name|mydate        |
      +----+--------------+
      |1   |+2020011-01-01|
      |2   |2020-12-03    |
      +----+--------------+ 

      In Spark 3.2 and below, the first row's mydate used to be null instead of this invalid date.
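      To see why the first row comes back as +2020011-01-01, it helps to separate a strict parse with the user's pattern from a permissive fallback. The sketch below is a hypothetical, simplified model: the object and method names are made up for illustration, and `lenientFallback` is an assumed stand-in that only mimics the digits-only case; it is not Spark's actual parsing code. "2020011" has seven digits, so it cannot match yyyyMMdd (which needs eight), but a fallback that accepts a bare year of up to seven digits reads it as year 2020011 with month and day defaulting to 1.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

// Hypothetical, simplified model of two parsing paths; not Spark's actual code.
object DateParseSketch {
  private val pattern = DateTimeFormatter.ofPattern("yyyyMMdd")

  // Strict path: the value must match the user-supplied pattern exactly.
  // A failed parse becomes None (a null cell in a DataFrame).
  def strictParse(s: String): Option[LocalDate] =
    Try(LocalDate.parse(s, pattern)).toOption

  // Assumed permissive fallback: a bare run of 4 to 7 digits is taken as a year,
  // with month and day defaulting to 1.
  def lenientFallback(s: String): Option[LocalDate] =
    if (s.matches("[0-9]{4,7}")) Some(LocalDate.of(s.toInt, 1, 1)) else None

  // Pattern first, fallback second: mirrors the behavior reported in this issue.
  def parseWithFallback(s: String): Option[LocalDate] =
    strictParse(s).orElse(lenientFallback(s))

  def main(args: Array[String]): Unit = {
    for (s <- Seq("2020011", "20201203"))
      println(s"$s -> strict=${strictParse(s)}, withFallback=${parseWithFallback(s)}")
  }
}
```

      In this model, returning the result of `strictParse` alone (None, i.e. null, for rows that do not match the pattern) corresponds to the Spark 3.2 behavior, while `parseWithFallback` reproduces the out-of-range date.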

       

      The issue appears to be caused by this PR: https://github.com/apache/spark/pull/32959.

       

      A similar issue can be observed in the JSON data source.

      test.json

      {"date": "2020011"}
      {"date": "20201203"} 

       

      Running the commands

      val df = spark.read.schema("date date").option("dateFormat", "yyyyMMdd").json("file:/tmp/test.json")
      df.show(false) 

      returns

      +--------------+
      |date          |
      +--------------+
      |+2020011-01-01|
      |2020-12-03    |
      +--------------+

      but before the PR linked above, the same query returned:

      +----------+
      |date      |
      +----------+
      |7500-08-09|
      |2020-12-03|
      +----------+

      Neither result is correct for "2020011". I will try to address this in the PR.
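      The old 7500-08-09 value is consistent with the digits "2020011" being read as a count of days since 1970-01-01 rather than as a formatted date. The check below only verifies that arithmetic with java.time; the object and method names are hypothetical, and it makes no claim about which Spark code path performed the conversion.

```scala
import java.time.LocalDate
import java.time.temporal.ChronoUnit

// Hypothetical helper: interpret raw digits as days since the Unix epoch.
object EpochDayCheck {
  private val epoch = LocalDate.of(1970, 1, 1)

  def asEpochDay(s: String): LocalDate = LocalDate.ofEpochDay(s.toLong)

  def main(args: Array[String]): Unit = {
    val d = asEpochDay("2020011")
    println(d) // 7500-08-09
    // Round trip: the day count between the epoch and that date is the original number.
    println(ChronoUnit.DAYS.between(epoch, d)) // 2020011
  }
}
```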


    People

      Assignee: Ivan Sadikov
      Reporter: Ivan Sadikov
      Votes: 0
      Watchers: 3
