[ARROW-10709] [Python] Difficult to make an efficient zero-copy file reader in Python - ASF JIRA


Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 3.0.0
Component/s: Python
Labels:
- pull-request-available

External issue URL:
https://github.com/apache/arrow/issues/26658

Description

file.read_buffer() offers a way to transport data efficiently with zero memory copies (benchmarks have confirmed this, very nice!).

However, when the file is backed by a Python object (via PythonFile), file.read_buffer() calls that object's read() method through PyReadableFile::Read. A 'normal' file.read(), which copies memory, also calls the same read() method, but only accepts a bytes object (PyReadableFile::Read checks the result with PyBytes_Check).
This makes it hard to write a single file object in Python that supports a normal .read() (which therefore has to return a bytes object) and also supports a zero-copy route in which .read() returns a memoryview.
Possibly the strict PyBytes_Check can be lifted by also trying PyObject_GetBuffer, so that any object supporting the buffer protocol is accepted.
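The following pure-Python sketch models the mismatch described above. It does not use Arrow at all: ZeroCopyFile, strict_read_into and relaxed_read_into are hypothetical names chosen for illustration. strict_read_into stands in for the PyBytes_Check in PyReadableFile::Read, while relaxed_read_into stands in for the proposed buffer-protocol check; in Python, constructing memoryview(obj) is what asks an object for its buffer (PyObject_GetBuffer at the C level).

```python
class ZeroCopyFile:
    """Hypothetical file-like object whose read() hands out memoryview
    slices of an in-memory buffer instead of new bytes objects."""

    def __init__(self, data):
        self._view = memoryview(data)  # shares memory with `data`, no copy
        self._pos = 0

    def read(self, nbytes=-1):
        size = len(self._view)
        end = size if nbytes < 0 else min(self._pos + nbytes, size)
        chunk = self._view[self._pos:end]  # slicing a memoryview does not copy
        self._pos = end
        return chunk


def strict_read_into(f, nbytes, out):
    """Models the strict check: only a bytes result is accepted,
    analogous to PyBytes_Check in PyReadableFile::Read."""
    data = f.read(nbytes)
    if not isinstance(data, bytes):
        raise TypeError(f"read() returned {type(data).__name__}, expected bytes")
    out[:len(data)] = data  # copy into the caller's buffer
    return len(data)


def relaxed_read_into(f, nbytes, out):
    """Models the proposed check: accept any object exposing the buffer
    protocol. memoryview(obj) raises TypeError otherwise, much like a
    failing PyObject_GetBuffer call in C."""
    data = f.read(nbytes)
    view = memoryview(data).cast("B")  # flat byte view; no copy
    out[:view.nbytes] = view  # copy into the caller's buffer
    return view.nbytes


# Usage: the same file object works on the relaxed path and fails on the strict one.
raw = bytearray(b"hello arrow")
f = ZeroCopyFile(raw)
head = f.read(5)       # memoryview over raw[0:5], not a copy
raw[0] = ord("J")      # mutate the backing buffer...
print(bytes(head))     # ...and the view sees it: b'Jello'
buf = bytearray(16)
print(relaxed_read_into(f, 6, buf))  # 6

try:
    strict_read_into(ZeroCopyFile(b"abc"), 3, bytearray(3))
except TypeError as exc:
    print(exc)  # read() returned memoryview, expected bytes
```

The design point is that the consumer only needs a contiguous run of bytes, so asking for a buffer rather than insisting on bytes lets one read() implementation serve both the copying and the zero-copy path.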


Issue Links

links to

GitHub Pull Request #8755


People

Assignee: Maarten Breddels

Reporter: Maarten Breddels

Votes: 0

Watchers: 4

Dates

Created: 24/Nov/20 09:18

Updated: 11/Jan/23 08:15

Resolved: 01/Dec/20 10:39

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 4h 10m