Apache Arrow / ARROW-7647

[C++] JSON reader fails to read arrays with few values


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.15.1
    • Fix Version/s: 0.16.0
    • Component/s: C++, Python
    • Environment: Ubuntu Linux 18.04
      Python 3.7.5

    Description

      Hi! I'm trying to load some nested JSON data and am running into a problem with arrays. I can reproduce it with a slightly modified example from the documentation:

      from pyarrow import json
      import pyarrow as pa
      
      with open("test.json", "w") as f:
          test_json = """{"a": [1], "b": {"c": true, "d": "1991-02-03"}}
      {"a": [], "b": {"c": false, "d": "2019-04-01"}}
      """
          f.write(test_json)
      
      json.read_json("test.json")
      

      Running this code with pyarrow 0.15.1 (I also tried 0.14) gives the following error:

      Traceback (most recent call last):
        File "issue.py", line 11, in <module>
          ccs = json.read_json("test.json")
        File "pyarrow/_json.pyx", line 195, in pyarrow._json.read_json
        File "pyarrow/public-api.pxi", line 285, in pyarrow.lib.pyarrow_wrap_table
        File "pyarrow/error.pxi", line 85, in pyarrow.lib.check_status
      pyarrow.lib.ArrowInvalid: Column 0 named a expected length 2 but got length 1
      

      I've tried various combinations, and the error seems to appear only when the total number of elements across all the "a" arrays is less than the number of rows in the file (here, 1 element across 2 rows). I did not expect any relationship between the two and found nothing about it in the documentation. Is this intentional? If not, I suspect a problem in the validation step.
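
To make the observed condition concrete, here is a minimal sketch that checks whether a newline-delimited JSON input falls into the failing case described above (fewer list elements in total than rows). The helper name `matches_reported_condition` is hypothetical, and it uses the standard-library `json` module rather than `pyarrow.json`; it only restates the reporter's observation and says nothing about the underlying cause.

```python
import json  # standard-library json, not pyarrow.json


def matches_reported_condition(ndjson_text, column="a"):
    """Return True when the lists in `column` hold fewer elements in total
    than there are rows (the case in which the error was observed)."""
    rows = [json.loads(line) for line in ndjson_text.splitlines() if line.strip()]
    total_elements = sum(len(row[column]) for row in rows)
    return total_elements < len(rows)


# The two rows from the reproduction: one element in total, two rows.
failing_example = (
    '{"a": [1], "b": {"c": true, "d": "1991-02-03"}}\n'
    '{"a": [], "b": {"c": false, "d": "2019-04-01"}}\n'
)
print(matches_reported_condition(failing_example))
```

Inputs for which this returns False (for example, two rows holding two elements in total) correspond to the combinations that did not raise the error in the reporter's experiments.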


          People

            Assignee: Ben Kietzman (bkietz)
            Reporter: Johan Forsberg (jofo)

              Time Tracking

              Original Estimate: Not Specified
              Remaining Estimate: 0h
              Time Spent: 1.5h
