[SPARK-30961] Arrow enabled: to_pandas with date column fails - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Won't Fix
Affects Version/s: 2.4.5
Fix Version/s: None
Component/s: PySpark
Labels:
- ready-to-commit
Environment:

Apache Spark 2.4.5

External issue URL:
https://stackoverflow.com/questions/54905790/error-when-converting-from-spark-dataframe-with-dates-to-pandas-dataframe

Description

Hi,

there seems to be a bug in the arrow enabled to_pandas conversion from spark dataframe to pandas dataframe when the dataframe has a column of type DateType. Here is a minimal example to reproduce the issue:

spark = SparkSession.builder.getOrCreate()
is_arrow_enabled = spark.conf.get("spark.sql.execution.arrow.enabled")
print("Arrow optimization is enabled: " + is_arrow_enabled)
spark_df = spark.createDataFrame(
    [['2019-12-06']], 'created_at: string') \
    .withColumn('created_at', F.to_date('created_at'))

# works
spark_df.toPandas()

spark.conf.set("spark.sql.execution.arrow.enabled", 'true')
is_arrow_enabled = spark.conf.get("spark.sql.execution.arrow.enabled")
print("Arrow optimization is enabled: " + is_arrow_enabled)
# raises AttributeError: Can only use .dt accessor with datetimelike values
# series is still of type object, .dt does not exist
spark_df.toPandas()

A fix would be to modify the _check_series_convert_date function in pyspark.sql.types to:

def _check_series_convert_date(series, data_type):
    """
    Cast the series to datetime.date if it's a date type, otherwise returns the original series.    :param series: pandas.Series
    :param data_type: a Spark data type for the series
    """
    from pyspark.sql.utils import require_minimum_pandas_version
    require_minimum_pandas_version()    from pandas import to_datetime
    if type(data_type) == DateType:
        return to_datetime(series).dt.date
    else:
        return series

Let me know if I should prepare a Pull Request for the 2.4.5 branch.

I have not tested the behavior on master branch.

Thanks,

Nicolas

Attachments

Activity

People

Assignee:: Unassigned

Reporter:: Nicolas Renkamp

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 26/Feb/20 16:19

Updated:: 06/Mar/20 19:24

Resolved:: 06/Mar/20 19:22