Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.0
    • Component/s: PySpark, SQL
    • Labels:
      None

      Description

      Date and timestamp types are not yet supported in DataFrame.toPandas() when using ArrowConverters. Both are common in data analysis with Spark and Pandas and should be supported.

      PySpark and Arrow differ in how they internally store timestamps that have no time zone specified. PySpark takes a UTC timestamp and adjusts it to local time, whereas Arrow keeps the value in UTC. Hopefully there is a clean way to resolve this.
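
      To make the discrepancy concrete, here is a minimal sketch using only the Python standard library (no Spark or Arrow calls). It reads the same wall-clock value once as local time and once as UTC, and compares the resulting epoch values. The helper name to_epoch_micros and the fixed UTC-8 zone are assumptions chosen for illustration, not anything taken from this issue.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_micros(naive, assumed_tz):
    """Microseconds since the Unix epoch, reading `naive` as wall-clock time in `assumed_tz`.

    Hypothetical helper for illustration; not part of PySpark or Arrow.
    """
    aware = naive.replace(tzinfo=assumed_tz)
    return (aware - EPOCH) // timedelta(microseconds=1)


# A timestamp with no time zone attached.
naive = datetime(2017, 7, 1, 12, 0, 0)

# Assumed local zone (fixed UTC-8 offset, chosen only to keep the example deterministic).
local_tz = timezone(timedelta(hours=-8))

as_local = to_epoch_micros(naive, local_tz)    # value read as local time
as_utc = to_epoch_micros(naive, timezone.utc)  # value read as UTC

# The two readings differ by exactly the local UTC offset.
offset_hours = (as_local - as_utc) / 3_600_000_000
print(offset_hours)
```

      Whichever side does the conversion has to pick one of these two readings, which is why the time zone handling needs to be settled before the values are exchanged.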

      Spark internal storage spec:

      • DateType stored as days since the Unix epoch
      • TimestampType stored as microseconds since the Unix epoch
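
      The storage spec above can be sketched in plain Python as follows. This assumes both counts are measured from the Unix epoch (1970-01-01, UTC for timestamps). The function names are hypothetical and only mirror the spec; they are not Spark APIs.

```python
from datetime import date, datetime, timedelta, timezone

EPOCH_DATE = date(1970, 1, 1)
EPOCH_TS = datetime(1970, 1, 1, tzinfo=timezone.utc)


def date_to_days(d):
    """Encode a date as a whole number of days since the epoch."""
    return (d - EPOCH_DATE).days


def days_to_date(days):
    """Decode a day count back into a date."""
    return EPOCH_DATE + timedelta(days=days)


def timestamp_to_micros(ts):
    """Encode a timezone-aware datetime as microseconds since the epoch."""
    return (ts - EPOCH_TS) // timedelta(microseconds=1)


def micros_to_timestamp(micros):
    """Decode microseconds since the epoch into a UTC datetime."""
    return EPOCH_TS + timedelta(microseconds=micros)


# Round-trip a date and a timestamp through the integer encodings.
d = date(2017, 7, 1)
ts = datetime(2017, 7, 1, 12, 30, 15, 250000, tzinfo=timezone.utc)

assert days_to_date(date_to_days(d)) == d
assert micros_to_timestamp(timestamp_to_micros(ts)) == ts
```

      Integer arithmetic is used throughout (floor division by a one-microsecond timedelta) so that the round trip is exact and not subject to floating-point rounding.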

              People

              • Assignee:
                bryanc Bryan Cutler
                Reporter:
                bryanc Bryan Cutler
              • Votes:
                0
                Watchers:
                8