Arrow version 0.8.0 is slated for release in early November, but I'd like to start a discussion now to help sync up all the work being done.
Along with upgrading the Arrow Java artifacts, pyarrow on our Jenkins test envs will need to be upgraded as well, which will take a fair amount of work and planning.
One topic I'd like to discuss is whether pyarrow should be an installation requirement for pyspark, i.e. when a user pip installs pyspark, it would also install pyarrow. If not, then is there a minimum version that needs to be supported? We currently have 0.4.1 installed on Jenkins.
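For illustration, one middle-ground option between a hard requirement and no requirement is an optional extra with a minimum version pin. This is only a hypothetical sketch of what a fragment of pyspark's setup.py could look like (the extra name 'sql' and the version pins are my own assumptions, not what's decided here):

```python
# Hypothetical setup.py fragment: instead of making pyarrow a hard
# install requirement, declare it as an optional extra so that
# `pip install pyspark[sql]` pulls it in with a minimum version.
from setuptools import setup

setup(
    name='pyspark',
    # ... other packaging metadata elided ...
    extras_require={
        # extra name and minimum versions here are illustrative only
        'sql': ['pandas>=0.19.2', 'pyarrow>=0.8.0'],
    },
)
```

With this shape, a plain `pip install pyspark` stays lightweight, and users who want the Arrow-backed paths opt in explicitly.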
There are a number of improvements and cleanups in the current code that could happen depending on what we decide (I'll link them all here later, but off the top of my head):
- Decimal bug fix and improved support
- Improved internal casting between pyarrow and pandas (can clean up some workarounds); this will also verify data bounds when the user specifies a type and the data overflows, see https://github.com/apache/spark/pull/19459#discussion_r146421804
- Better type checking when converting Spark types to Arrow
- Timestamp conversion to microseconds (for Spark internal format)
- Full support for using a validity mask with 'object' types https://github.com/apache/spark/pull/18664#discussion_r146567335
- Allow VectorSchemaRoot.close() to be called more than once, to simplify the listener https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala#L90
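For context on the timestamp bullet above: Spark internally represents TimestampType values as microseconds since the Unix epoch, so Arrow timestamps need to end up at microsecond precision. A minimal stdlib sketch of that conversion (the function name to_spark_micros is my own, not an actual Spark or pyarrow API):

```python
from datetime import datetime, timedelta, timezone

# Spark stores TimestampType internally as a long count of
# microseconds since the Unix epoch (1970-01-01 00:00:00 UTC).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_spark_micros(dt: datetime) -> int:
    """Convert a datetime to microseconds since the Unix epoch."""
    if dt.tzinfo is None:
        # Assume naive datetimes are UTC for this sketch; the real
        # conversion must respect the session time zone.
        dt = dt.replace(tzinfo=timezone.utc)
    # timedelta // timedelta(microseconds=1) gives an exact int,
    # avoiding float rounding from dt.timestamp() * 1e6.
    return (dt - EPOCH) // timedelta(microseconds=1)
```

The integer floor-division trick avoids the precision loss you'd get from multiplying a float POSIX timestamp by 1e6.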