[ARROW-18164] [Python] Dataset scanner does not follow default memory pool setting - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 11.0.0
Component/s: Python
Labels:
- pull-request-available

External issue URL:
https://github.com/apache/arrow/issues/20474

Description

Even after I set the system memory pool as the default, the dataset scanner still allocates from the jemalloc pool (observed on Ubuntu, where jemalloc is the default unless the user overrides it):

import pyarrow as pa
import pyarrow.dataset as ds
import pyarrow.parquet as pq
pq.write_table(pa.table({'a': [1, 2, 3]}), "test.parquet")

In [2]: pa.set_memory_pool(pa.system_memory_pool())

In [3]: pa.total_allocated_bytes()
Out[3]: 0

In [4]: table = ds.dataset("test.parquet").to_table()

In [5]: pa.total_allocated_bytes()
Out[5]: 0

In [6]: pa.set_memory_pool(pa.jemalloc_memory_pool())

In [7]: pa.total_allocated_bytes()
Out[7]: 128

Issue Links

links to: GitHub Pull Request #14516

People

Assignee: Joris Van den Bossche
Reporter: Joris Van den Bossche
Votes: 0
Watchers: 3

Dates

Created: 26/Oct/22 08:38
Updated: 11/Jan/23 11:58
Resolved: 09/Nov/22 16:33

Time Tracking

Estimated: Not Specified
Remaining:
Logged: 1h 10m