Description
When creating a Spark DataFrame from an existing Pandas DataFrame with createDataFrame, columns holding datetime64 values are converted to long values. This happens only when the schema is not specified.
In [2]: import pandas as pd
   ...: from datetime import datetime
   ...:

In [3]: pdf = pd.DataFrame({"ts": [datetime(2017, 10, 31, 1, 1, 1)]})

In [4]: df = spark.createDataFrame(pdf)

In [5]: df.show()
+-------------------+
|                 ts|
+-------------------+
|1509411661000000000|
+-------------------+

In [6]: df.schema
Out[6]: StructType(List(StructField(ts,LongType,true)))
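The long value in the output above appears to be the raw datetime64[ns] representation, i.e. nanoseconds since the Unix epoch. A minimal sketch using only the standard library to check this, assuming the naive datetime is stored as wall-clock time with no timezone shift (the names raw_value, EPOCH and decoded are illustrative, not part of Spark or Pandas):

```python
from datetime import datetime, timedelta, timezone

# Value shown by df.show() in the report
raw_value = 1509411661000000000

# Split into whole seconds and leftover nanoseconds
seconds, nanos = divmod(raw_value, 1_000_000_000)

# Assumption: the value counts nanoseconds from 1970-01-01 00:00:00
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
decoded = EPOCH + timedelta(seconds=seconds, microseconds=nanos // 1000)

print(decoded.replace(tzinfo=None))  # 2017-10-31 01:01:01
```

The decoded value matches the datetime(2017, 10, 31, 1, 1, 1) passed to pd.DataFrame, which suggests the data survives intact and only the inferred type is wrong.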
Spark should interpret a datetime64[D] value as DateType and other datetime64 values as TimestampType.
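The requested mapping can be sketched as a small lookup on the dtype string. This is only an illustration of the expected behaviour, not Spark's actual inference code; expected_spark_type is a hypothetical helper, and it returns type names as plain strings so the sketch runs without Spark installed:

```python
import re

def expected_spark_type(numpy_dtype: str) -> str:
    """Hypothetical helper: Spark SQL type name the report expects
    for a given datetime64 dtype string."""
    match = re.fullmatch(r"datetime64(?:\[(\w+)\])?", numpy_dtype)
    if match is None:
        raise ValueError(f"not a datetime64 dtype: {numpy_dtype!r}")
    unit = match.group(1)
    # Day precision maps to DateType; every other unit maps to TimestampType
    return "DateType" if unit == "D" else "TimestampType"

print(expected_spark_type("datetime64[D]"))   # DateType
print(expected_spark_type("datetime64[ns]"))  # TimestampType
```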
Issue Links
- is related to SPARK-20791 Use Apache Arrow to Improve Spark createDataFrame from Pandas.DataFrame (Resolved)