    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.0
    • Component/s: PySpark, SQL
    • Labels: None

      Description

      There needs to be user-facing documentation that shows how to enable and use Arrow with Spark, explains what the user should expect, and describes any differences from similar existing functionality.

      A comment from Xiao Li on https://github.com/apache/spark/pull/18664:

      Suppose users/applications have Timestamp columns in their Datasets, and their processing code relies on the corresponding time-zone-related assumptions.

      • New users/applications: they enable Arrow first and later hit an Arrow bug. Can they simply turn off spark.sql.execution.arrow.enable? If not, what should they do?
      • Existing users/applications: they want to use Arrow for better performance. Can they just turn on spark.sql.execution.arrow.enable? What should they do?

      Note: The guides/solutions should be user-friendly. That is, they must be simple enough for most users to understand.
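
The two questions above can be read as a runtime toggle on an existing session. The following is a minimal sketch under that reading, not the documented procedure: the key is copied exactly as written in this issue (released Spark versions may spell it differently, so check the configuration docs for your version), and the helper names `set_arrow` and `to_pandas_with_fallback` are hypothetical, introduced only for illustration.

```python
# Key exactly as written in this issue. Assumption: the released name may
# differ, so confirm it against the configuration docs for your Spark version.
ARROW_KEY = "spark.sql.execution.arrow.enable"


def set_arrow(spark, enabled, key=ARROW_KEY):
    """Turn Arrow-based conversion on or off for an existing session.

    `spark` is any object exposing `conf.set(key, value)`, such as a
    pyspark.sql.SparkSession. Returns the string value that was set.
    """
    value = "true" if enabled else "false"
    spark.conf.set(key, value)
    return value


def to_pandas_with_fallback(spark, df, key=ARROW_KEY):
    """Try df.toPandas() with Arrow on; if it raises, retry with Arrow off.

    Hypothetical helper: it catches every exception, which is deliberately
    broad for a sketch and would hide unrelated failures in real code.
    """
    set_arrow(spark, True, key)
    try:
        return df.toPandas()
    except Exception:
        # Leave Arrow disabled so later conversions in this session skip it.
        set_arrow(spark, False, key)
        return df.toPandas()
```

With a real session, usage would look like `pdf = to_pandas_with_fallback(spark, spark.range(10))`. Whether timestamp values come back identical on both paths is exactly the open question in the comment above, so the fallback should not be assumed to be transparent for time-zone-sensitive code.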

            People

            • Assignee: Bryan Cutler (bryanc)
            • Reporter: Bryan Cutler (bryanc)
            • Votes: 0
            • Watchers: 6
