Spark / SPARK-23436

Incorrect Date column Inference in partition discovery

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.1
    • Fix Version/s: 2.3.1, 2.4.0
    • Component/s: SQL
    • Labels:
      None

      Description

      If a partition column value looks like a partial date/timestamp,

          example: 2018-01-01-23

      where the value is truncated to the hour, the data type of the partition column is automatically inferred as date, but its values are loaded as null.
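
The report does not say why a value such as 2018-01-01-04 would be classified as a date and then come back as null. One plausible mechanism, offered here purely as an assumption and not as a description of Spark's internals, is a mismatch between a lenient parse that accepts a date prefix and a strict conversion that rejects the full string. The sketch below uses only JDK classes; the object and method names are hypothetical.

```scala
import java.text.SimpleDateFormat
import java.time.LocalDate
import scala.util.Try

// Hypothetical helper, for illustration only (not Spark code).
object PartialDateParsing {
  // Lenient check: DateFormat.parse(String) may stop before the end of the
  // input, so trailing characters such as "-04" are silently ignored.
  def lenientParses(raw: String): Boolean =
    Try(new SimpleDateFormat("yyyy-MM-dd").parse(raw)).isSuccess

  // Strict check: LocalDate.parse requires the whole string to be an ISO date.
  def strictParses(raw: String): Boolean =
    Try(LocalDate.parse(raw)).isSuccess

  def main(args: Array[String]): Unit = {
    // Compare both checks on a full date, an hour-truncated value, and a month
    for (raw <- Seq("2018-01-01", "2018-01-01-04", "2018-01")) {
      println(s"$raw lenient=${lenientParses(raw)} strict=${strictParses(raw)}")
    }
  }
}
```

If type inference relied on something like the lenient check while value loading relied on something like the strict one, an hour-truncated value would be typed as date yet fail to convert, which would match the null reported here.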

      Here is example code to reproduce this behaviour:

       

       

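      // Needed for toDF; spark-shell imports this automatically
      import spark.implicits._

      // One row whose partition values look like partial dates
      val data = Seq(("1", "2018-01", "2018-01-01-04", "test")).toDF("id", "date_month", "data_hour", "data")

      // Write partitioned by three columns, then read back so the partition column types are inferred
      data.write.partitionBy("id", "date_month", "data_hour").parquet("output/test")
      val input = spark.read.parquet("output/test")

      input.printSchema()
      input.show()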
      
      
      ### Result ###

      root
       |-- data: string (nullable = true)
       |-- id: integer (nullable = true)
       |-- date_month: string (nullable = true)
       |-- data_hour: date (nullable = true)

      +----+---+----------+---------+
      |data| id|date_month|data_hour|
      +----+---+----------+---------+
      |test|  1|   2018-01|     null|
      +----+---+----------+---------+
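
On an affected version, one possible workaround is to turn off partition column type inference so that partition values are read back as strings. This is a sketch, not part of the original report: it assumes the Spark SQL setting spark.sql.sources.partitionColumnTypeInference.enabled, which is documented in the Spark SQL guide, and it reuses the output/test path from the reproduction above. The application name and local master are illustrative choices.

```scala
import org.apache.spark.sql.SparkSession

// Reuses the active session in spark-shell, or creates a local one otherwise
val spark = SparkSession.builder()
  .appName("partition-inference-workaround") // illustrative name
  .master("local[*]")
  .getOrCreate()

// With inference disabled, all partition columns are read as strings
spark.conf.set("spark.sql.sources.partitionColumnTypeInference.enabled", "false")

val input = spark.read.parquet("output/test")
input.printSchema()
input.show()
```

The trade-off is that every partition column becomes a string, including id, so any numeric or date partition columns need an explicit cast after loading.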

       

            People

            • Assignee: Marco Gaido (mgaido)
            • Reporter: Apoorva Sareen (apoorva.sareen@gmail.com)
            • Votes: 0
            • Watchers: 4
