Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.15.1
Environment: Ubuntu Linux 18.04, Python 3.7.5
Description
Hi! I'm trying to load some nested JSON data and am running into a problem with arrays. I can reproduce it with a slightly modified example from the documentation:
from pyarrow import json
import pyarrow as pa

with open("test.json", "w") as f:
    test_json = """{"a": [1], "b": {"c": true, "d": "1991-02-03"}}
{"a": [], "b": {"c": false, "d": "2019-04-01"}}
"""
    f.write(test_json)

json.read_json("test.json")
Running this code with pyarrow 0.15.1 (I also tried 0.14) gives the following error:
Traceback (most recent call last):
  File "issue.py", line 11, in <module>
    ccs = json.read_json("test.json")
  File "pyarrow/_json.pyx", line 195, in pyarrow._json.read_json
  File "pyarrow/public-api.pxi", line 285, in pyarrow.lib.pyarrow_wrap_table
  File "pyarrow/error.pxi", line 85, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Column 0 named a expected length 2 but got length 1
I've tried various combinations, and the error seems to appear only when the total number of elements across all the "a" arrays is less than the number of rows in the file. I did not expect any relationship between the two and found nothing in the documentation about it. Is this intentional? If not, I suspect there is a problem in the validation step.
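To make the pattern described above easier to check, here is a minimal sketch that builds a few input variants and records, for each one, whether the total number of "a" elements is below the row count. The helper names (make_ndjson, matches_reported_pattern, try_read) and the case names are hypothetical and not part of pyarrow; the only pyarrow call used is json.read_json, as in the reproduction above. If pyarrow is not installed, the read step is skipped.

```python
import json as std_json
import os
import tempfile


def make_ndjson(a_values):
    """Build newline-delimited JSON with one row per entry in a_values."""
    rows = [{"a": a, "b": {"c": True, "d": "2019-04-01"}} for a in a_values]
    return "".join(std_json.dumps(row) + "\n" for row in rows)


def matches_reported_pattern(a_values):
    """True when the total number of list elements is below the row count."""
    return sum(len(a) for a in a_values) < len(a_values)


def try_read(a_values):
    """Read the generated file with pyarrow and report "ok" or the error text."""
    try:
        from pyarrow import json as pa_json
    except ImportError:
        return "skipped: pyarrow not installed"
    fd, path = tempfile.mkstemp(suffix=".json")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(make_ndjson(a_values))
        pa_json.read_json(path)
        return "ok"
    except Exception as exc:  # e.g. pyarrow.lib.ArrowInvalid
        return f"error: {exc}"
    finally:
        os.remove(path)


# Hypothetical cases: the first mirrors the reproduction above.
CASES = {
    "original": [[1], []],
    "two_elements": [[1, 2], []],
    "all_empty": [[], []],
    "one_each": [[1], [2]],
}

if __name__ == "__main__":
    for name, a_values in CASES.items():
        print(name, matches_reported_pattern(a_values), try_read(a_values))
```

If the observation holds on an affected version, the cases where the second column prints True should be the ones that fail to read. On a release that includes the fix, every case would be expected to read successfully.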