Details
- Type: Improvement
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- Version: 0.7.0
Description
I'm benchmarking Arrow serialization as a way for dask.distributed to serialize dataframes.
Overall things look good compared to the current implementation (using pickle). The biggest difference is that pickle can take advantage of pandas' RangeIndex to avoid serializing every value of the index when possible.
I suspect that a "range type" isn't in scope for Arrow, but in the meantime applications using Arrow could detect the `RangeIndex` and pass `preserve_index=False`, as in `pyarrow.serialize_pandas(df, preserve_index=False)`.
Issue Links
- is related to: ARROW-1594 [Python] Enable multi-threaded conversions in Table.from_pandas (Resolved)
- relates to: ARROW-1639 [Python] More efficient serialization for RangeIndex in serialize_pandas (Resolved)