  1. Spark
  2. SPARK-19228

inferSchema function processed csv date column as string and "dateFormat" DataSource option is ignored


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.1.0
    • Fix Version/s: None
    • Component/s: Input/Output, SQL

    Description

      The current FastDateFormat-based parser does not parse dates and timestamps correctly and does not conform to ISO 8601.
      For example, I need to process user.csv like this:

      id,project,started,ended
      sergey.rubtsov,project0,12/12/2012,10/10/2015
      

      When I add date format options:

      Dataset<Row> users = spark.read().format("csv")
              .option("mode", "PERMISSIVE").option("header", "true")
              .option("inferSchema", "true").option("dateFormat", "dd/MM/yyyy")
              .load("src/main/resources/user.csv");
      users.printSchema();
      

      the expected schema is:

      root
       |-- id: string (nullable = true)
       |-- project: string (nullable = true)
       |-- started: date (nullable = true)
       |-- ended: date (nullable = true)
      

      but the actual result is:

      root
       |-- id: string (nullable = true)
       |-- project: string (nullable = true)
       |-- started: string (nullable = true)
       |-- ended: string (nullable = true)
      

      This means that the date columns are processed as strings and the "dateFormat" option is ignored.
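
To rule out the data itself, the sample values can be checked against the pattern outside Spark. The following is only a minimal sketch using java.time; the class and method names (DateFormatCheck, matches) are hypothetical and not part of Spark, and Spark's own parser may behave differently from DateTimeFormatter.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper, not part of Spark: checks a value against a date pattern.
class DateFormatCheck {

    // Returns true when the whole value parses as a date with the given pattern.
    static boolean matches(String value, String pattern) {
        try {
            LocalDate.parse(value, DateTimeFormatter.ofPattern(pattern));
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String pattern = "dd/MM/yyyy";
        // Values taken from the "started" and "ended" columns of user.csv.
        for (String value : new String[] {"12/12/2012", "10/10/2015"}) {
            System.out.println(value + " matches " + pattern + ": " + matches(value, pattern));
        }
    }
}
```

Both values from user.csv parse with "dd/MM/yyyy" here, which suggests the problem lies in how the schema is inferred rather than in the file.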
      If I also add the option

      .option("timestampFormat", "dd/MM/yyyy")
      

      the result is:

      root
       |-- id: string (nullable = true)
       |-- project: string (nullable = true)
       |-- started: timestamp (nullable = true)
       |-- ended: timestamp (nullable = true)
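
The behaviour this report expects can be sketched as follows, assuming inference tried the "dateFormat" pattern on every value of a column before falling back to string. This is a hypothetical illustration, not Spark's inference code: ColumnTypeSketch and inferType are made-up names, only the "date" and "string" types are considered, and null or empty values are not handled.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Spark's implementation: infers "date" or "string" for one column.
class ColumnTypeSketch {

    static String inferType(List<String> values, String dateFormat) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(dateFormat);
        for (String value : values) {
            try {
                LocalDate.parse(value, formatter);
            } catch (DateTimeParseException e) {
                // One value that is not a date makes the whole column a string.
                return "string";
            }
        }
        // An empty column carries no evidence of dates, so keep it a string.
        return values.isEmpty() ? "string" : "date";
    }

    public static void main(String[] args) {
        String dateFormat = "dd/MM/yyyy";
        System.out.println("id: " + inferType(Arrays.asList("sergey.rubtsov"), dateFormat));
        System.out.println("started: " + inferType(Arrays.asList("12/12/2012"), dateFormat));
        System.out.println("ended: " + inferType(Arrays.asList("10/10/2015"), dateFormat));
    }
}
```

With the row from user.csv, this sketch reports "string" for id and "date" for started and ended, which matches the expected schema above.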
      

       


            People

              Assignee: Unassigned
              Reporter: Sergey Rubtsov
              Votes: 3
              Watchers: 7

                Time Tracking

                  Estimated: 6h
                  Remaining: 6h
                  Logged: Not Specified