  1. Spark
  2. SPARK-25467

Python date/datetime objects in dataframes increment by 1 day when converted to JSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.3.1
    • Fix Version/s: None
    • Component/s: PySpark, SQL
    • Labels:
    • Environment:

      Description

      When a DataFrame contains datetime.date or datetime.datetime instances and toJSON() is called on it, the day is incremented in the JSON date representation.

      # Imports the snippets below rely on. `sqc` is assumed to be an existing SQLContext.
      import datetime
      from pyspark.sql import Row

      # Create a DataFrame containing datetime.date instances, convert to JSON and display
      rows = [Row(cx=1, cy=2, dates=[datetime.date.fromordinal(1), datetime.date.fromordinal(2)])]

      df = sqc.createDataFrame(rows)

      df.collect()
      # Output:
      # [Row(cx=1, cy=2, dates=[datetime.date(1, 1, 1), datetime.date(1, 1, 2)])]

      df.toJSON().collect()
      # Output:
      # ['{"cx":1,"cy":2,"dates":["0001-01-03","0001-01-04"]}']


      # Issue also occurs with datetime.datetime instances

      rows = [Row(cx=1, cy=2, dates=[datetime.datetime.fromordinal(1), datetime.datetime.fromordinal(2)])]

      df = sqc.createDataFrame(rows)

      df.collect()
      # Output:
      # [Row(cx=1, cy=2, dates=[datetime.datetime(1, 1, 1, 0, 0, fold=1), datetime.datetime(1, 1, 2, 0, 0)])]

      df.toJSON().collect()
      # Output:
      # ['{"cx":1,"cy":2,"dates":["0001-01-02T23:50:36.000-06:00","0001-01-03T23:50:36.000-06:00"]}']
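
      For comparison, the sketch below computes what the JSON for the first example would be expected to look like, using only the Python standard library (no Spark involved), and measures how far the reported output is from it. The helper name expected_json and the plain dict standing in for the Row are illustrative assumptions, not part of the original report or of any Spark API.

```python
import datetime
import json


def expected_json(record):
    """Serialize a dict to compact JSON, rendering dates in ISO 8601 form.

    Hypothetical helper for illustration only; it is not part of PySpark.
    """
    def encode(value):
        # datetime.datetime is a subclass of datetime.date, so one check covers both.
        if isinstance(value, datetime.date):
            return value.isoformat()
        raise TypeError("Cannot serialize %r" % (value,))

    return json.dumps(record, default=encode, separators=(",", ":"))


# Plain dict standing in for Row(cx=1, cy=2, dates=[...]) from the report.
record = {
    "cx": 1,
    "cy": 2,
    "dates": [datetime.date.fromordinal(1), datetime.date.fromordinal(2)],
}
print(expected_json(record))

# Distance between the first date reported by toJSON() ("0001-01-03")
# and the date that was actually stored in the DataFrame.
reported_first = datetime.date(1, 1, 3)
shift = reported_first - datetime.date.fromordinal(1)
print("shift in days:", shift.days)
```

      Run as-is, this gives dates of 0001-01-01 and 0001-01-02 in the JSON, so the datetime.date values in the reported toJSON() output are two days later than the values stored in the DataFrame.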
      
      
            People

            • Assignee: Unassigned
            • Reporter: davidvhill David V. Hill
            • Votes: 0
            • Watchers: 7

              Dates

              • Created:
                Updated:
                Resolved: