Apache Arrow / ARROW-1593

[PYTHON] serialize_pandas should pass through the preserve_index keyword


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.7.0
    • Fix Version/s: 0.8.0
    • Component/s: Python

    Description

      I'm benchmarking Arrow serialization as a way for dask.distributed to serialize DataFrames.

      Overall, things look good compared to the current implementation (which uses pickle). The biggest difference is that pickle benefits from pandas' `RangeIndex`, which lets it avoid serializing every index value when possible.
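As a standard-library analogy (not pandas or Arrow code, only a sketch), Python's built-in `range` pickles the same way a `RangeIndex` can: only its start, stop and step are recorded, while a materialized list of the same integers must store every value. The variable names below are illustrative.

```python
import pickle

n = 1_000_000

# A range is reduced to (start, stop, step) when pickled, so the payload
# stays a few dozen bytes no matter how many values it represents.
as_range = pickle.dumps(range(n))

# Materializing the same integers forces pickle to write every value.
as_list = pickle.dumps(list(range(n)))

print(len(as_range), len(as_list))
```

The gap grows linearly with `n`, which is the same effect the benchmark sees when an Arrow-serialized DataFrame stores its index as a full column of values.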

      I suspect that a "range type" isn't in scope for Arrow, but in the meantime, applications using Arrow could detect a `RangeIndex` and call `pyarrow.serialize_pandas(df, preserve_index=False)`.
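The detection step might look like the following minimal sketch, assuming a recent pandas where `RangeIndex` exposes `start`, `stop` and `step`. The helper name `index_is_default_range` is hypothetical. It checks for the *default* `RangeIndex` rather than any `RangeIndex`, on the assumption that a reader receiving no index rebuilds `RangeIndex(0, len(df))`, so a shifted, strided or named range would not round-trip if it were dropped.

```python
import pandas as pd


def index_is_default_range(df: pd.DataFrame) -> bool:
    """Hypothetical helper: True only if df's index is RangeIndex(0, len(df), 1)
    with no name, i.e. the index a reader would recreate if none were stored."""
    idx = df.index
    return (
        isinstance(idx, pd.RangeIndex)
        and idx.start == 0
        and idx.step == 1
        and idx.stop == len(df)
        and idx.name is None
    )
```

An application could then pass `preserve_index=not index_is_default_range(df)` to `pyarrow.serialize_pandas`, on pyarrow versions where that function accepts the keyword.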


    People

      Assignee: Wes McKinney (wesm)
      Reporter: Tom Augspurger (TomAugspurger)
      Votes: 0
      Watchers: 4
