The bug can be triggered from Python:
import pyarrow.parquet

array = pyarrow.array.from_pylist([1] * 1000000)
rb = pyarrow.RecordBatch.from_arrays([array], ['a'])
rb2 = rb.slice(0, 2)
with open('/tmp/t.arrow', 'wb') as f:
    w = pyarrow.ipc.FileWriter(f, rb.schema)
    w.write_batch(rb2)
    w.close()
This produces a large file, even though only the two-row slice was written:
$ ll /tmp/t.arrow
-rw-rw-r-- 1 itai itai 800618 Apr 12 13:22 /tmp/t.arrow
Blocks: ARROW-670 Arrow 0.3 release (Resolved)