[ARROW-12650] [Doc][Python] Improve documentation regarding dealing with memory mapped files - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 6.0.0
Component/s: Documentation
Labels:
- pull-request-available

External issue URL:
https://github.com/apache/arrow/issues/28401

Description

While one of Arrow's promises is that it makes it easy to read and write data bigger than memory, it is not immediately obvious from the pyarrow documentation how to deal with memory-mapped files.

The docs hint that you can open files as memory mapped ( https://arrow.apache.org/docs/python/memory.html?highlight=memory_map#on-disk-and-memory-mapped-files ), but they don't explain how to read or write Arrow Arrays or Tables from there.

While most high-level functions for reading/writing formats (Parquet, Feather, ...) have an easy-to-guess memory_map=True option, the docs don't seem to have any example of how that is meant to work for the Arrow format itself, for example how to do it using RecordBatchFile*.

An addition to the memory mapping section with a more meaningful example that reads/writes actual Arrow data (instead of plain bytes) would probably be helpful.

Issue Links

links to

GitHub Pull Request #10266

People

Assignee: Alessandro Molina

Reporter: Alessandro Molina

Votes: 0

Watchers: 4

Dates

Created: 04/May/21 15:08

Updated: 11/Jan/23 08:27

Resolved: 26/Jul/21 15:44

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 6h 50m