Spark / SPARK-23612

Specify formats for individual DateType and TimestampType columns in schemas


Details

    Description

      https://github.com/apache/spark/blob/407f67249639709c40c46917700ed6dd736daa7d/python/pyspark/sql/types.py#L162-L200

      It would be very helpful to be able to specify the format of individual date and timestamp columns in a schema when reading CSV files, rather than a single format that applies to all of them:

      # Currently, one format applies to every DateType column:
      spark.read.option("dateFormat", "yyyyMMdd").csv(...)

      # Would like to be able to do something like this
      # (proposed API; DateType does not accept a format argument today):
      from pyspark.sql.types import StructType, StructField, DateType

      schema = StructType([
          StructField("date1", DateType(format="MM/dd/yyyy"), True),
          StructField("date2", DateType(format="yyyyMMdd"), True),
      ])

      spark.read.schema(schema).csv(...)

      Thanks for any help or input!
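To make the requested per-column semantics concrete, here is a minimal plain-Python sketch. It is not Spark code and not an existing API: the `DATE_FORMATS` mapping and the `read_csv_with_date_formats` helper are hypothetical, and Python `strptime` codes stand in for the Spark patterns from the request (`MM/dd/yyyy` becomes `%m/%d/%Y`, `yyyyMMdd` becomes `%Y%m%d`). In Spark itself, a common workaround is to declare such columns as `StringType` and convert each one with `pyspark.sql.functions.to_date(col, format)` using its own format.

```python
import csv
import io
from datetime import datetime

# Hypothetical per-column format map, mirroring the proposed schema.
# Python strptime codes stand in for Spark's patterns:
#   "MM/dd/yyyy" -> "%m/%d/%Y",  "yyyyMMdd" -> "%Y%m%d"
DATE_FORMATS = {
    "date1": "%m/%d/%Y",
    "date2": "%Y%m%d",
}


def read_csv_with_date_formats(text, date_formats):
    """Parse CSV text, converting each listed column with its own format.

    Columns not in date_formats are kept as strings; empty cells in a
    date column become None, loosely like a nullable DateType field.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        parsed = {}
        for name, value in record.items():
            fmt = date_formats.get(name)
            if fmt is None:
                parsed[name] = value  # not a date column: leave as-is
            elif value == "":
                parsed[name] = None  # missing date
            else:
                parsed[name] = datetime.strptime(value, fmt).date()
        rows.append(parsed)
    return rows


# Usage: the same calendar date written in two different formats.
sample = "id,date1,date2\n1,03/07/2018,20180307\n2,12/31/2017,\n"
rows = read_csv_with_date_formats(sample, DATE_FORMATS)
for row in rows:
    print(row)
```

The design choice the request argues for is visible here: the format lives alongside each column's type rather than in one reader-wide option, so `date1` and `date2` can differ.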


          People

            Assignee: Unassigned
            Reporter: Patrick Young (rockdoctor)
            Votes: 0
            Watchers: 1
