[ARROW-2913] [Python] Exported buffers don't expose type information - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Won't Fix
Affects Version/s: 0.10.0
Fix Version/s: None
Component/s: C++, Python
Labels: None

External issue URL: https://github.com/apache/arrow/issues/19281

Description

Calling the buffers() method on an array returns the list of buffers backing it, but those buffers lose their type information:

>>> a = pa.array(range(10))
>>> a.type
DataType(int64)
>>> buffers = a.buffers()
>>> [(memoryview(buf).format, memoryview(buf).shape) for buf in buffers]
[('b', (2,)), ('b', (80,))]
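Until the buffers carry their own type, a consumer has to obtain the element type and logical length elsewhere (for example from `a.type` and `len(a)`) and reinterpret the raw bytes itself. The sketch below does that for the int64 case above using only the standard library. `decode_int64_array` is a hypothetical helper; it assumes the Arrow layout of a least-significant-bit-first validity bitmap plus a contiguous data buffer of native 64-bit integers, and it ignores array offsets (sliced arrays). It accepts any object supporting the buffer protocol. Newer pyarrow versions may return `None` for the validity buffer when there are no nulls, which the sketch treats as "all valid".

```python
from array import array


def decode_int64_array(validity, data, length):
    """Rebuild Python values from an untyped validity bitmap and data buffer.

    The caller must supply the element type (int64 here) and the logical
    length, because the buffers themselves carry no type information.
    Array offsets (sliced arrays) are not handled in this sketch.
    """
    # Flatten to unsigned bytes, then reinterpret as native signed 64-bit ints.
    values = memoryview(data).cast("B").cast("q")
    if len(values) < length:
        raise ValueError("data buffer is too small for the given length")
    bits = None if validity is None else memoryview(validity).cast("B")
    result = []
    for i in range(length):
        # Arrow validity bitmaps are least-significant-bit first:
        # bit (i % 8) of byte (i // 8) is set when slot i is non-null.
        is_valid = bits is None or (bits[i // 8] >> (i % 8)) & 1
        result.append(values[i] if is_valid else None)
    return result


# Simulated buffers for [0, 1, 2, null, 4, ..., 9]: bit 3 of byte 0 is clear.
validity = bytes([0b11110111, 0b00000011])
data = array("q", range(10)).tobytes()
print(decode_int64_array(validity, data, 10))
# [0, 1, 2, None, 4, 5, 6, 7, 8, 9]
```

For an unsliced int64 array, the same helper could be called as `decode_int64_array(*a.buffers(), len(a))`; the point is that the type knowledge lives entirely in the caller, not in the buffers.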

By contrast, NumPy arrays expose type information through the Python buffer protocol:

>>> a = pa.array(range(10))
>>> memoryview(a.to_numpy()).format
'l'
>>> memoryview(a.to_numpy()).shape
(10,)

Exposing type information on buffers could matter for third-party systems such as Dask/distributed, which could use it to choose a type-specific compression scheme when serializing data.
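As an illustration of what type information would enable, here is a hypothetical type-aware compressor (not Dask's actual implementation). It inspects `memoryview.format` and `itemsize`, delta-encodes buffers that identify themselves as signed 64-bit integers before handing them to `zlib`, and falls back to plain byte compression otherwise. An untyped buffer such as the `'b'`-format data buffer shown earlier always takes the fallback path. Both `'q'` and `'l'` are accepted because NumPy reports int64 as `'l'` on platforms where C `long` is 64 bits, as in the example above.

```python
import zlib
from array import array


def compress_buffer(buf):
    """Hypothetical type-aware compressor that dispatches on the buffer format."""
    view = memoryview(buf)
    if view.ndim == 1 and view.itemsize == 8 and view.format in ("q", "l"):
        # Signed 64-bit integers: keep the first value, then successive
        # differences, so sequential data becomes a run of small repeated
        # numbers that zlib compresses well. Deltas that overflow int64
        # raise OverflowError; this sketch does not handle that case.
        values = view.tolist()
        deltas = array("q", values[:1])
        deltas.extend(b - a for a, b in zip(values, values[1:]))
        return "delta-int64", zlib.compress(deltas.tobytes())
    # Untyped ('b'/'B') or unrecognized buffers: plain byte-level compression.
    return "raw", zlib.compress(view.tobytes())


def decompress_buffer(tag, payload):
    """Invert compress_buffer and return the original bytes."""
    data = zlib.decompress(payload)
    if tag == "delta-int64":
        deltas = array("q")
        deltas.frombytes(data)
        values, total = array("q"), 0
        for d in deltas:
            total += d  # running sum undoes the delta encoding
            values.append(total)
        return values.tobytes()
    return data


typed = array("q", range(1000))                  # reports format 'q'
untyped = memoryview(typed.tobytes()).cast("b")  # same bytes, format 'b'
print(compress_buffer(typed)[0], compress_buffer(untyped)[0])  # delta-int64 raw
```

The same bytes get the cheaper encoding only when the buffer advertises its element type, which is exactly what Arrow's buffers currently cannot do.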

Since our C++ buffers are not typed, it's not obvious how to solve this. Should we return tensors instead?
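One way to picture the tensor option is to pair the untyped buffer with an element format and a shape and produce a typed view on demand, much as NumPy arrays and Arrow tensors associate a data buffer with a type and shape. `TypedBuffer` below is a hypothetical standard-library sketch, not a pyarrow class; it uses `struct` format codes and `memoryview.cast`, so a consumer sees a proper `format` and `shape`, as with NumPy.

```python
import struct
from array import array
from math import prod


class TypedBuffer:
    """Hypothetical tensor-like wrapper: untyped bytes + element format + shape."""

    def __init__(self, buf, fmt, shape):
        self.buf = buf
        self.format = fmt  # struct-module code, e.g. "q" for int64
        self.shape = tuple(shape)
        expected = prod(self.shape) * struct.calcsize(fmt)
        nbytes = memoryview(buf).nbytes
        if nbytes != expected:
            raise ValueError(f"buffer has {nbytes} bytes, expected {expected}")

    def view(self):
        """Return a memoryview exposing the stored format and shape."""
        # Flatten to unsigned bytes first, then cast to the typed N-D layout.
        return memoryview(self.buf).cast("B").cast(self.format, list(self.shape))


# Ten int64 values exposed as an untyped 'b' buffer, as in the example above.
untyped = memoryview(array("q", range(10)).tobytes()).cast("b")
typed = TypedBuffer(untyped, "q", (2, 5)).view()
print(typed.format, typed.shape)  # q (2, 5)
```

In this sketch the underlying buffer stays untyped and only the wrapper carries the format, which is one way typed views could be offered without changing the untyped C++ buffers.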

People

Assignee: Unassigned

Reporter: Antoine Pitrou

Votes: 0

Watchers: 3

Dates

Created: 25/Jul/18 19:01

Updated: 11/Jan/23 07:24

Resolved: 06/Feb/19 04:58