There needs to be user-facing documentation that shows how to enable and use Arrow with Spark, explains what the user should expect, and describes how it differs from similar existing functionality.
A comment from Xiao Li on https://github.com/apache/spark/pull/18664:
Consider users/applications that have Timestamp values in their Datasets and whose processing code relies on the corresponding time-zone-related assumptions:
- New users/applications that first enabled Arrow and later hit an Arrow bug: can they simply turn off spark.sql.execution.arrow.enable? If not, what should they do?
- Existing users/applications that want to use Arrow for better performance: can they just turn on spark.sql.execution.arrow.enable? What else should they do?
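To make the time-zone concern concrete, here is a minimal stdlib-only sketch (no Spark involved) of how one stored instant becomes different naive wall-clock values depending on which time zone a conversion path assumes. The two fixed offsets and the helper `to_naive` are hypothetical stand-ins, for example for a configured session time zone and the Python process's local time zone; they are not Spark settings.

```python
from datetime import datetime, timedelta, timezone

# One instant, as a timestamp column would represent it (anchored to UTC).
instant = datetime(2017, 7, 20, 12, 0, tzinfo=timezone.utc)

# Hypothetical time zones two different conversion paths might assume.
session_tz = timezone(timedelta(hours=-7), "session")  # stand-in for a session time zone
local_tz = timezone(timedelta(hours=9), "local")       # stand-in for the process's local zone


def to_naive(ts, tz):
    """Render an aware timestamp as wall-clock time in tz, then drop the tzinfo."""
    return ts.astimezone(tz).replace(tzinfo=None)


a = to_naive(instant, session_tz)
b = to_naive(instant, local_tz)
print(a)      # 2017-07-20 05:00:00
print(b)      # 2017-07-20 21:00:00
print(b - a)  # 16:00:00
```

Once the tzinfo is dropped, code downstream cannot tell which assumption produced the value, which is why toggling a conversion path on or off can silently change results for time-zone-sensitive logic.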
Note: ideally, the guides/solutions should be user-friendly; that is, they must be very simple for most users to understand.