[SPARK-22417] createDataFrame from a pandas.DataFrame reads datetime64 values as longs - ASF JIRA


Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.2.0
Fix Version/s: 2.2.1, 2.3.0
Component/s: PySpark
Labels: None

Description

When creating a Spark DataFrame from an existing pandas DataFrame with createDataFrame, columns holding datetime64 values are converted to long values. This happens only when the schema is not specified.

In [2]: import pandas as pd
   ...: from datetime import datetime
   ...: 

In [3]: pdf = pd.DataFrame({"ts": [datetime(2017, 10, 31, 1, 1, 1)]})

In [4]: df = spark.createDataFrame(pdf)

In [5]: df.show()
+-------------------+
|                 ts|
+-------------------+
|1509411661000000000|
+-------------------+


In [6]: df.schema
Out[6]: StructType(List(StructField(ts,LongType,true)))
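The long value in the output above is consistent with the raw datetime64[ns] storage value, that is, nanoseconds since the Unix epoch. The following is a minimal standard-library sketch that reproduces it. It assumes the naive datetime is interpreted as UTC, and the helper names `to_epoch_nanos` and `from_epoch_nanos` are hypothetical, not part of pandas or PySpark.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch as an aware UTC datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_nanos(dt: datetime) -> int:
    """Hypothetical helper: nanoseconds since the epoch, using exact integer math.

    Assumption: a naive datetime is treated as UTC.
    """
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    # timedelta // timedelta yields an int, so no float rounding occurs.
    micros = (dt - EPOCH) // timedelta(microseconds=1)
    return micros * 1000


def from_epoch_nanos(ns: int) -> datetime:
    """Hypothetical inverse of to_epoch_nanos (sub-microsecond digits are dropped)."""
    return EPOCH + timedelta(microseconds=ns // 1000)


value = to_epoch_nanos(datetime(2017, 10, 31, 1, 1, 1))
print(value)                    # 1509411661000000000
print(from_epoch_nanos(value))  # 2017-10-31 01:01:01+00:00
```

The printed integer matches the `ts` value shown by `df.show()`. This suggests that the underlying int64 representation is passed through unchanged instead of being mapped to a timestamp type.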

Spark should interpret a datetime64[D] value as DateType and other datetime64 values as TimestampType.
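The expected inference rule can be sketched as a small lookup on the numpy dtype string. This is an illustration of the rule stated above, not PySpark's actual implementation. The function name `expected_spark_type` is hypothetical, and it returns Spark SQL type names as plain strings so that the sketch runs without Spark installed.

```python
import re

# Matches "datetime64" with an optional bracketed unit, e.g. "datetime64[ns]".
_DATETIME64 = re.compile(r"^datetime64(?:\[([^\]]+)\])?$")


def expected_spark_type(dtype_str: str):
    """Hypothetical helper: the Spark type name a datetime64 dtype should map to.

    Returns None for non-datetime64 dtypes, where other inference rules apply.
    """
    m = _DATETIME64.match(dtype_str)
    if m is None:
        return None
    unit = m.group(1)
    # Day resolution carries no time of day, so it maps to a date;
    # any other datetime64 unit maps to a timestamp.
    return "DateType" if unit == "D" else "TimestampType"


for d in ["datetime64[D]", "datetime64[ns]", "datetime64[us]", "int64"]:
    print(d, "->", expected_spark_type(d))
```

Inferring from the dtype, rather than from the int64 values stored underneath, is what keeps the column from falling through to LongType.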


Issue Links

is related to

SPARK-20791 Use Apache Arrow to Improve Spark createDataFrame from Pandas.DataFrame

Resolved

links to

[Github] Pull Request #19646 (BryanCutler)

[Github] Pull Request #19704 (ueshin)


People

Assignee: Bryan Cutler

Reporter: Bryan Cutler

Votes: 0

Watchers: 5

Dates

Created: 01/Nov/17 22:41

Updated: 09/Nov/17 04:53

Resolved: 07/Nov/17 20:32