SPARK-30961

Arrow enabled: to_pandas with date column fails

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 2.4.5
    • Fix Version/s: None
    • Component/s: PySpark
    • Environment: Apache Spark 2.4.5

    Description

      Hi,

      There seems to be a bug in the Arrow-enabled toPandas() conversion from a Spark DataFrame to a pandas DataFrame when the DataFrame has a column of type DateType. Here is a minimal example that reproduces the issue:

      from pyspark.sql import SparkSession
      from pyspark.sql import functions as F

      spark = SparkSession.builder.getOrCreate()
      is_arrow_enabled = spark.conf.get("spark.sql.execution.arrow.enabled")
      print("Arrow optimization is enabled: " + is_arrow_enabled)

      # Build a one-row DataFrame whose only column has type DateType
      spark_df = spark.createDataFrame(
          [['2019-12-06']], 'created_at: string') \
          .withColumn('created_at', F.to_date('created_at'))

      # works
      spark_df.toPandas()

      # Turn on the Arrow-based conversion and try again
      spark.conf.set("spark.sql.execution.arrow.enabled", 'true')
      is_arrow_enabled = spark.conf.get("spark.sql.execution.arrow.enabled")
      print("Arrow optimization is enabled: " + is_arrow_enabled)

      # raises AttributeError: Can only use .dt accessor with datetimelike values
      # series is still of type object, .dt does not exist
      spark_df.toPandas()
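
      The error message can be reproduced with pandas alone, without Spark. The sketch below assumes, as the comment in the example says, that the date column arrives as a Series of dtype object holding datetime.date values. The helper has_dt_accessor is a hypothetical name used only for this illustration.

```python
import datetime

import pandas as pd

# A Series of datetime.date objects is stored with dtype "object",
# not with a datetime64 dtype.
dates = pd.Series([datetime.date(2019, 12, 6)])


def has_dt_accessor(series):
    """Return True if the .dt accessor can be used on the series."""
    try:
        series.dt
        return True
    except AttributeError:
        # pandas raises "Can only use .dt accessor with datetimelike values"
        return False


# The object-dtype series has no usable .dt accessor ...
object_series_ok = has_dt_accessor(dates)

# ... but it has one after an explicit conversion to datetime64.
converted = pd.to_datetime(dates)
datetime_series_ok = has_dt_accessor(converted)

print(dates.dtype, object_series_ok)
print(converted.dtype, datetime_series_ok)
```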

      A fix would be to modify the _check_series_convert_date function in pyspark.sql.types to:

      def _check_series_convert_date(series, data_type):
          """
          Cast the series to datetime.date if it's a date type, otherwise returns the original series.

          :param series: pandas.Series
          :param data_type: a Spark data type for the series
          """
          from pyspark.sql.utils import require_minimum_pandas_version
          require_minimum_pandas_version()

          from pandas import to_datetime
          if type(data_type) == DateType:
              return to_datetime(series).dt.date
          else:
              return series
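
      The effect of the proposed change can be checked outside Spark. This is a minimal sketch, not the PySpark code itself: convert_date_series is a hypothetical stand-in for _check_series_convert_date, and the boolean is_date_type replaces the type(data_type) == DateType check so that the example runs with pandas only.

```python
import datetime

import pandas as pd


def convert_date_series(series, is_date_type):
    """Hypothetical stand-in for the proposed _check_series_convert_date."""
    if is_date_type:
        # to_datetime accepts both object-dtype and datetime64 input,
        # so .dt is always available afterwards.
        return pd.to_datetime(series).dt.date
    return series


# Input as an object-dtype series of datetime.date values.
as_objects = pd.Series([datetime.date(2019, 12, 6)])
# Input as a datetime64 series.
as_datetime64 = pd.Series(pd.to_datetime(["2019-12-06"]))

from_objects = convert_date_series(as_objects, True)
from_datetime64 = convert_date_series(as_datetime64, True)
untouched = convert_date_series(as_objects, False)

print(list(from_objects))
print(list(from_datetime64))
```

      Both inputs end up as a series of datetime.date values, and a series that is not a date type is returned unchanged.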
      

      Let me know if I should prepare a Pull Request for the 2.4.5 branch.

      I have not tested the behavior on the master branch.

       

      Thanks,

      Nicolas

          People

            Assignee: Unassigned
            Reporter: Nicolas Renkamp (nicornk)
            Votes: 0
            Watchers: 3
