Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version: 0.15.0
- Environment:
  - Operating system: Windows 10
  - pyarrow installed via conda
  - both Python environments were identical except for pyarrow:
    - python: 3.6.7
    - numpy: 1.17.2
    - pandas: 0.25.1
Description
I upgraded from pyarrow 0.14.1 to 0.15.0, and during testing my Python interpreter ran out of memory.
I narrowed the issue down to the pyarrow.Table.to_pandas() call, which appears to leak memory in the new version. See the details below to reproduce the issue.
import numpy as np
import pandas as pd
import pyarrow as pa

# create a table with one nested array column
nested_array = pa.array([np.random.rand(1000) for i in range(500)])
nested_array.type  # ListType(list<item: double>)
table = pa.Table.from_arrays(arrays=[nested_array], names=['my_arrays'])

# convert it to a pandas DataFrame in a loop to monitor memory consumption
num_iterations = 10000

# pyarrow v0.14.1: memory allocation does not grow during loop execution
# pyarrow v0.15.0: ~550 MB is added to RAM, never garbage collected
for i in range(num_iterations):
    df = pa.Table.to_pandas(table)

# When the table column is not nested, no memory leak is observed
array = pa.array(np.random.rand(500 * 1000))
table = pa.Table.from_arrays(arrays=[array], names=['numbers'])

# no memory leak:
for i in range(num_iterations):
    df = pa.Table.to_pandas(table)
Attachments
Issue Links
- duplicates
  - ARROW-6976: Possible memory leak in pyarrow read_parquet (Closed)