  1. Spark
  2. SPARK-23436

Incorrect Date column Inference in partition discovery

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.1
    • Fix Version/s: 2.3.1, 2.4.0
    • Component/s: SQL
    • Labels: None

    Description

      If a partition column value looks like a partial date/timestamp, for example 2018-01-01-23, which is truncated to the hour, the data type of the partitioning column is automatically inferred as date. However, the values are loaded as null.
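One plausible reading of this symptom, which the report itself does not confirm, is that type inference accepts the value through a lenient, prefix-based date parse, while the later conversion of the value is stricter and yields null. The following minimal sketch uses only the JDK to show how the two styles of parsing can disagree on the same string. The object and helper names are hypothetical and are not part of Spark.

```scala
import java.text.SimpleDateFormat
import java.time.LocalDate
import scala.util.Try

// Hypothetical illustration, not Spark code.
object PartialDateParsing {
  // Lenient check: SimpleDateFormat.parse(String) succeeds as soon as a
  // leading portion of the text matches the pattern and ignores the rest.
  def prefixParses(raw: String): Boolean =
    Try(new SimpleDateFormat("yyyy-MM-dd").parse(raw)).isSuccess

  // Strict check: LocalDate.parse requires the whole string to be an ISO date.
  def fullyParses(raw: String): Boolean =
    Try(LocalDate.parse(raw)).isSuccess

  def main(args: Array[String]): Unit = {
    val raw = "2018-01-01-04"
    println(s"prefix parse accepts '$raw': ${prefixParses(raw)}")
    println(s"strict parse accepts '$raw': ${fullyParses(raw)}")
  }
}
```

Under this reading, a value such as 2018-01 would fail both checks, which is consistent with date_month staying a string in the reported schema.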

      Example code to reproduce this behaviour:

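      // In spark-shell, spark.implicits._ is already imported; elsewhere it is needed for toDF.
      import spark.implicits._

      // The data_hour value is a date truncated to the hour (yyyy-MM-dd-HH).
      val data = Seq(("1", "2018-01", "2018-01-01-04", "test")).toDF("id", "date_month", "data_hour", "data")

      data.write.partitionBy("id", "date_month", "data_hour").parquet("output/test")

      // Read the data back so that partition discovery infers the partition column types.
      val input = spark.read.parquet("output/test")

      input.printSchema()
      input.show()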
      
      
      ## Result ##

      root
       |-- data: string (nullable = true)
       |-- id: integer (nullable = true)
       |-- date_month: string (nullable = true)
       |-- data_hour: date (nullable = true)

      +----+---+----------+---------+
      |data| id|date_month|data_hour|
      +----+---+----------+---------+
      |test|  1|   2018-01|     null|
      +----+---+----------+---------+
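A possible workaround on affected versions, sketched here as an assumption and not taken from this report, is to turn off type inference for partition columns through the `spark.sql.sources.partitionColumnTypeInference.enabled` setting, so that every partition column is read back as a string. The object name, the `roundTrip` helper, and the local master setting are hypothetical and exist only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical illustration of a workaround; not part of the original report.
object PartitionInferenceWorkaround {
  // Writes the sample row partitioned as in the report, then reads it back
  // with partition column type inference disabled.
  def roundTrip(spark: SparkSession, path: String): DataFrame = {
    import spark.implicits._

    // With inference disabled, partition columns are read as strings.
    spark.conf.set("spark.sql.sources.partitionColumnTypeInference.enabled", "false")

    val data = Seq(("1", "2018-01", "2018-01-01-04", "test"))
      .toDF("id", "date_month", "data_hour", "data")

    data.write.mode("overwrite")
      .partitionBy("id", "date_month", "data_hour")
      .parquet(path)

    spark.read.parquet(path)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local run for illustration
      .appName("partition-inference-workaround")
      .getOrCreate()

    val input = roundTrip(spark, "output/test")
    input.printSchema()
    input.show()

    spark.stop()
  }
}
```

The trade-off is that the setting applies to all partition columns, so id would also be read as a string and would need an explicit cast if an integer is required.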

       

          People

            Assignee: Marco Gaido (mgaido)
            Reporter: Apoorva Sareen (apoorva.sareen@gmail.com)
            Votes: 0
            Watchers: 4