  1. Spark
  2. SPARK-30640

Prevent unnecessary copies of data in Arrow to Pandas conversion with Timestamps


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.4
    • Fix Version/s: 3.0.0
    • Component/s: PySpark, SQL
    • Labels:
      None

      Description

      During conversion from Arrow to Pandas, timestamp columns are localized to the current timezone. When the data contains no timestamp columns, this step can still sometimes make unnecessary copies of the data. See https://www.mail-archive.com/dev@arrow.apache.org/msg17008.html for discussion.
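
      The idea can be sketched in plain pandas. This is only an illustration under stated assumptions, not the change made in Spark: the helper name `localize_timestamps` is hypothetical, and it assumes naive timestamps represent UTC. It checks the column types up front and returns the original object untouched when there is nothing to localize.

```python
import pandas as pd


def localize_timestamps(pdf, timezone):
    """Hypothetical helper: convert timestamp columns to naive local time.

    Assumes naive timestamps are in UTC. Returns the input object itself
    when it has no timestamp columns, so no copy is made in that case.
    """
    ts_cols = [
        name for name in pdf.columns
        if pd.api.types.is_datetime64_any_dtype(pdf[name])
    ]
    if not ts_cols:
        # Nothing to localize: hand back the same object, no copy.
        return pdf

    out = pdf.copy()
    for name in ts_cols:
        series = out[name]
        if series.dt.tz is None:
            # Assumption: naive values are UTC.
            series = series.dt.tz_localize("UTC")
        # Shift to the target timezone, then drop the tz to get local wall time.
        out[name] = series.dt.tz_convert(timezone).dt.tz_localize(None)
    return out
```

      With this sketch, `localize_timestamps(df, "Asia/Tokyo") is df` holds for a frame that has no timestamp columns, while a frame with timestamps gets a converted copy and the input is left unchanged.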

              People

              • Assignee:
                bryanc Bryan Cutler
                Reporter:
                bryanc Bryan Cutler
