Details
-
Improvement
-
Status: Closed
-
Major
-
Resolution: Won't Fix
-
0.10.0
-
None
-
None
Description
Calling the buffers() method on an array returns the list of buffers backing the array, but those buffers carry no typing information:
>>> import pyarrow as pa
>>> a = pa.array(range(10))
>>> a.type
DataType(int64)
>>> buffers = a.buffers()
>>> [(memoryview(buf).format, memoryview(buf).shape) for buf in buffers]
[('b', (2,)), ('b', (80,))]
By contrast, NumPy exposes type information through the Python buffer protocol:
>>> a = pa.array(range(10))
>>> memoryview(a.to_numpy()).format
'l'
>>> memoryview(a.to_numpy()).shape
(10,)
Exposing type information on buffers could be important for third-party systems, such as Dask/distributed, for type-based data compression when serializing.
Since our C++ buffers are not typed, it's not obvious how to solve this. Should we return tensors instead?
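For illustration only, here is a minimal sketch of how a consumer might reattach type information on the Python side when it already knows the element type. It uses only the standard library: a plain bytes object stands in for an untyped data buffer, and memoryview.cast() reinterprets it. The helper name typed_view and the choice of struct format 'q' (64-bit signed integer) are assumptions made for this example, not part of the pyarrow API, and this does not address the C++ question above.

```python
import array

def typed_view(raw_bytes, fmt):
    """Hypothetical helper: reinterpret untyped bytes using a struct format code."""
    # memoryview.cast() requires a byte-format source and a length that is
    # a multiple of the target item size.
    return memoryview(raw_bytes).cast(fmt)

# Stand-in for an untyped data buffer: ten int64 values stored as raw bytes.
raw = array.array("q", range(10)).tobytes()

untyped = memoryview(raw)
typed = typed_view(raw, "q")

print(untyped.format, untyped.shape)  # byte-level view, one item per byte
print(typed.format, typed.shape)      # typed view, one item per int64
print(typed.tolist())
```

The cast is zero-copy, so the typed view shares memory with the original bytes. The caller still has to obtain the element type from somewhere else (for example, from the array's own type), which is exactly the information the raw buffers do not carry.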