Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.17.0, 0.17.1, 1.0.0, 2.0.0
Description
We use Arrow to convert JSON to Parquet, and our JSON occasionally contains empty lists. Reading such JSON into an Arrow table and then writing that table to Parquet currently fails. We noticed this issue in our C++ Arrow code, but it also happens from Python.
Minimal repro:
input.json:
{"foo": []}
convert.py:
import pyarrow.json
import pyarrow.parquet
t = pyarrow.json.read_json("input.json")
pyarrow.parquet.write_table(t, "out.parquet")
Produces:
Traceback (most recent call last):
  File "repro.py", line 5, in <module>
    pyarrow.parquet.write_table(t, "out.parquet")
  File "env/lib/python3.8/site-packages/pyarrow/parquet.py", line 1717, in write_table
    with ParquetWriter(
  File "env/lib/python3.8/site-packages/pyarrow/parquet.py", line 554, in __init__
    self.writer = _parquet.ParquetWriter(
  File "pyarrow/parquet.pyx", line 1409, in pyarrow._parquet.ParquetWriter.__cinit__
  File "pyarrow/error.pxi", line 84, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: NullType Arrow field must be nullable