Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
0.17.1, 1.0.0
Description
Tested on: pyarrow 1.0.1, Ubuntu 16.04, Python 3.7
Code to reproduce:
First, generate about 800 MB of data.
import pyarrow as pa

# generate about 800MB data
data = [pa.array([10] * 1000)]
batch = pa.record_batch(data, names=['f0'])
with open('/tmp/t1.pa', 'wb') as f1:
    writer = pa.ipc.new_stream(f1, batch.schema)
    for i in range(100000):
        writer.write_batch(batch)
    writer.close()
Test to_pandas with self_destruct=True, split_blocks=True, use_threads=False
import pyarrow as pa
import time
import sys
import os

pid = os.getpid()
print(f'run `psrecord {pid} --plot /tmp/t001.png` and then press ENTER.')
sys.stdin.readline()

with open('/tmp/t1.pa', 'rb') as f1:
    reader = pa.ipc.open_stream(f1)
    batches = [b for b in reader]

pa_table = pa.Table.from_batches(batches)
del batches
time.sleep(3)
pdf = pa_table.to_pandas(self_destruct=True, split_blocks=True, use_threads=False)
del pa_table
time.sleep(3)
The attached file is the psrecord profiling result.
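For anyone who wants rough numbers without an external profiler, the memory at each step of the test script can also be logged from inside the process. The following is only a minimal sketch, not part of the original report: the helper names `current_rss_mb` and `log_rss` are hypothetical, and it assumes a Linux system where `/proc/self/statm` is available (the fallback reports peak rather than current memory, and its units assume Linux).

```python
import os
import resource


def current_rss_mb():
    """Return this process's resident set size in MiB (hypothetical helper)."""
    try:
        # Linux: the second field of /proc/self/statm is the resident page count.
        with open('/proc/self/statm') as f:
            resident_pages = int(f.read().split()[1])
        return resident_pages * os.sysconf('SC_PAGE_SIZE') / 2**20
    except (OSError, IndexError, ValueError):
        # Fallback: peak (not current) RSS, which Linux reports in KiB.
        return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024


def log_rss(label):
    """Print and return the RSS so readings can be compared between steps."""
    rss = current_rss_mb()
    print(f'{label}: {rss:.1f} MiB')
    return rss
```

Calling `log_rss('after from_batches')`, `log_rss('after to_pandas')` and `log_rss('after del pa_table')` at the matching points in the test script gives readings that can be compared with the psrecord plot.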
Attachments