  1. Spark
  2. SPARK-41283 Feature parity: Functions API in Spark Connect
  3. SPARK-41455

Resolve dtypes inconsistencies of date/timestamp functions


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.0
    • Component/s: PySpark
    • Labels: None

    Description

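      While implementing date/timestamp functions in Spark Connect, we noticed that the dtypes returned by `toPandas` are inconsistent with PySpark. In the example below, `sdf`/`SF` refer to the PySpark DataFrame and functions, and `cdf`/`CF` to their Spark Connect counterparts.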

      >>> sdf.select(SF.current_timestamp()).toPandas().dtypes
      current_timestamp()    datetime64[ns]
      dtype: object
      >>> cdf.select(CF.current_timestamp()).toPandas().dtypes
      current_timestamp()    datetime64[ns, America/Los_Angeles]
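
      The mismatch above can be reproduced with pandas alone. The following is a minimal sketch, not the Spark code path: the sample timestamp and the helper `is_tz_aware` are hypothetical, and it assumes only that the PySpark result is timezone-naive while the Connect result is timezone-aware, as the output above shows.

      ```python
      import pandas as pd

      # Timezone-naive column, like the PySpark (sdf) result above.
      naive = pd.Series(
          pd.to_datetime(["2022-12-08 10:00:00"]), name="current_timestamp()"
      )

      # Timezone-aware column, like the Spark Connect (cdf) result above.
      aware = naive.dt.tz_localize("America/Los_Angeles")


      def is_tz_aware(series: pd.Series) -> bool:
          """Hypothetical helper: True if the column carries a timezone."""
          return isinstance(series.dtype, pd.DatetimeTZDtype)


      # The two dtypes are different pandas types, so dtype checks fail.
      print(naive.dtype == aware.dtype)

      # Dropping the timezone keeps the local wall-clock values and
      # yields a timezone-naive column again.
      normalized = aware.dt.tz_localize(None)
      print(is_tz_aware(naive), is_tz_aware(aware), is_tz_aware(normalized))
      ```

      Whether the actual fix should strip the timezone on the client or avoid attaching it in the first place is not stated in this issue; the sketch only illustrates the dtype difference itself.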
      

      Affected functions include:

      to_timestamp, from_utc_timestamp, to_utc_timestamp, timestamp_seconds, current_timestamp, date_trunc
      

      We may have to implement `is_timestamp_ntz_preferred` for Connect.

      After the fix, the tests for these date/timestamp functions that currently use `compare_by_show` should be switched to a `toPandas` comparison.
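
      A `toPandas`-style comparison is stricter than comparing rendered output because it can also check dtypes. The following is a rough sketch of that idea in plain pandas; `pandas_frames_match` is a hypothetical helper, not the test utility used in the Spark test suite, and the frames stand in for the results of `sdf.toPandas()` and `cdf.toPandas()`.

      ```python
      import pandas as pd
      from pandas.testing import assert_frame_equal


      def pandas_frames_match(left: pd.DataFrame, right: pd.DataFrame) -> bool:
          """Hypothetical helper: True only if values and dtypes both agree."""
          try:
              assert_frame_equal(left, right, check_dtype=True)
              return True
          except AssertionError:
              return False


      ts = pd.to_datetime(["2022-12-08 10:00:00"])

      # Stand-in for the PySpark result (timezone-naive).
      expected = pd.DataFrame({"ts": ts})

      # Stand-in for a Connect result that still carries a timezone.
      actual = pd.DataFrame({"ts": ts.tz_localize("America/Los_Angeles")})

      print(pandas_frames_match(expected, expected.copy()))
      print(pandas_frames_match(expected, actual))
      ```

      With a check like this, a timezone-aware column on one side is reported as a mismatch.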

          People

            Assignee: Ruifeng Zheng (podongfeng)
            Reporter: Xinrong Meng (XinrongM)
