Details
- Bug
- Status: Resolved
- Minor
- Resolution: Fixed
- None
Description
For example, the following round-trip test fails:
import pyarrow as pa
import pyarrow.parquet as pq

data = [pa.array([None] * 10)]
batch = pa.RecordBatch.from_arrays(data, ['x'])
table = pa.Table.from_batches([batch])
pq.write_table(table, "test.parquet", compression='LZ4')
table = pq.read_table("test.parquet")
with the following error:
Traceback (most recent call last):
  File "test.py", line 8, in <module>
    table = pq.read_table("test.parquet")
  File "python3.6/site-packages/pyarrow/parquet.py", line 987, in read_table
    use_pandas_metadata=use_pandas_metadata)
  File "python3.6/site-packages/pyarrow/parquet.py", line 149, in read
    nthreads=nthreads)
  File "_parquet.pyx", line 736, in pyarrow._parquet.ParquetReader.read_all
  File "error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Arrow error: IOError: Corrupt Lz4 compressed data.
Writing a file with LZ4 compression from Python requires the patch for ARROW-2570. However, the issue can also be reproduced by creating the input file with parquet-cpp. The file must be compressed with LZ4 and contain a column with only null values.