[ARROW-1593] [PYTHON] serialize_pandas should pass through the preserve_index keyword - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 0.7.0
Fix Version/s: 0.8.0
Component/s: Python
Labels:
- pull-request-available

External issue URL:
https://github.com/apache/arrow/issues/17606

Description

I'm benchmarking Arrow serialization as a way to serialize dataframes in dask.distributed.

Overall things look good compared to the current implementation (using pickle). The biggest difference was pickle's ability to use pandas' RangeIndex to avoid serializing the entire Index of values when possible.

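I suspect that a "range type" isn't in scope for Arrow, but in the meantime applications using Arrow could detect the `RangeIndex` and skip serializing it by calling `pyarrow.serialize_pandas(df, preserve_index=False)`. For that to work, `serialize_pandas` needs to pass the `preserve_index` keyword through.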

Issue Links

is related to
- ARROW-1594 [Python] Enable multi-threaded conversions in Table.from_pandas (Resolved)

relates to
- ARROW-1639 [Python] More efficient serialization for RangeIndex in serialize_pandas (Resolved)

links to
- GitHub Pull Request #1190

People

Assignee: Wes McKinney
Reporter: Tom Augspurger
Votes: 0
Watchers: 4

Dates

Created: 21/Sep/17 16:37
Updated: 11/Jan/23 07:15
Resolved: 10/Oct/17 01:03