Uploaded image for project: 'Apache Arrow'
  1. Apache Arrow
  2. ARROW-2449

[Python] Efficiently serialize functions containing NumPy arrays

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Closed
    • Minor
    • Resolution: Won't Fix
    • 0.9.0
    • None
    • Python

    Description

      It is my understanding that pyarrow falls back to serializing functions (and other complex Python objects) using cloudpickle, which means that the contents of those functions are also serialized using the fallback method, rather than the efficient method described in https://ray-project.github.io/2017/10/15/fast-python-serialization-with-ray-and-arrow.html. It would be good to get the benefit of fast zero-copy (de)serialization for objects like NumPy arrays contained inside functions.

      In [1]: import numpy as np, pyarrow as pa
      
      In [2]: pa.__version__
      Out[2]: '0.9.0'
      
      In [3]: arr = np.random.rand(10000)
      
      In [4]: %timeit pa.deserialize(pa.serialize(arr).to_buffer())
      The slowest run took 38.29 times longer than the fastest. This could mean that an intermediate result is being cached.
      10000 loops, best of 3: 68.7 µs per loop
      
      In [5]: def arr_f(): return arr
      
      In [6]: %timeit pa.deserialize(pa.serialize(arr_f).to_buffer())
      The slowest run took 5.89 times longer than the fastest. This could mean that an intermediate result is being cached.
      1000 loops, best of 3: 539 µs per loop
      

      For comparison:

      In [7]: %timeit cloudpickle.loads(cloudpickle.dumps(arr))
      1000 loops, best of 3: 193 µs per loop
      
      In [8]: %timeit cloudpickle.loads(cloudpickle.dumps(arr_f))
      The slowest run took 4.02 times longer than the fastest. This could mean that an intermediate result is being cached.
      1000 loops, best of 3: 429 µs per loop
      

      cc pcmoritz

      Attachments

        Activity

          People

            Unassigned Unassigned
            rshin Richard Shin
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: