Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version: 0.3.0
- Component: None
Description
Running the following code results in ever-increasing memory usage, even though I would expect the dataframe to be garbage-collected when it goes out of scope. For the size of my Parquet file, memory usage grows by about 3 GB per loop iteration:
from pyarrow import HdfsClient

def read_parquet_file(client, parquet_file):
    parquet = client.read_parquet(parquet_file)
    df = parquet.to_pandas()
    # df goes out of scope here, so its memory should be reclaimed

client = HdfsClient("hdfshost", 8020, "myuser", driver='libhdfs3')
parquet_file = '/my/parquet/file'

while True:
    read_parquet_file(client, parquet_file)
Is there a reference count issue similar to ARROW-362?
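For anyone trying to reproduce this, here is a minimal sketch of how the per-iteration growth could be measured. It assumes psutil is installed for reading the process RSS; the host, port, user, and path are the placeholders from the report, and the explicit gc.collect() is there only to rule out delayed collection:

import gc
import os

import psutil
from pyarrow import HdfsClient

def rss_mb():
    # Resident set size of the current process, in megabytes
    return psutil.Process(os.getpid()).memory_info().rss / 1e6

client = HdfsClient("hdfshost", 8020, "myuser", driver='libhdfs3')  # placeholder connection details
parquet_file = '/my/parquet/file'  # placeholder path

for i in range(5):
    parquet = client.read_parquet(parquet_file)
    df = parquet.to_pandas()
    del df, parquet
    gc.collect()  # force collection so remaining growth points at unreleased native memory
    print("iteration %d: rss = %.0f MB" % (i, rss_mb()))

If RSS keeps climbing across iterations even after the forced collection, the memory is likely being held on the native (Arrow/C++) side rather than by Python references.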
Issue Links
- blocks ARROW-1014: 0.4.0 release (Resolved)