Apache Arrow / ARROW-14439

[Python][C++] Segfault with read_json when a field is missing

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.0.0
    • Fix Version/s: 6.0.0
    • Component/s: C++, Python
    • Labels: None

    Description

      Reading a JSON Lines file with read_json can segfault when a field is missing from one of the records.
      In particular, this happens when the missing field is supposed to be a list and the block size is small enough.

      Here is an example to reproduce:

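      import io

      import pyarrow.json as paj

      # Two JSON Lines records; the second omits the list-typed field "a".
      batch = b'{"a": [], "b": 1}\n{"b": 1}'
      # Smaller than the first record, which is 17 bytes long.
      block_size = 12

      # Segfaults on pyarrow 5.0.0.
      paj.read_json(
          io.BytesIO(batch), read_options=paj.ReadOptions(block_size=block_size)
      )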
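
      Because a segfault kills the interpreter that triggers it, it can help to run the reproduction in a child process and inspect the exit status. The following is only a sketch: the helper name run_isolated and the REPRO constant are made up for illustration and are not part of pyarrow, and the negative-return-code check for signals applies to POSIX systems.

```python
import subprocess
import sys
import textwrap

# The reproduction from this report, kept as source text for a child interpreter.
REPRO = textwrap.dedent(
    """
    import io
    import pyarrow.json as paj

    batch = b'{"a": [], "b": 1}\\n{"b": 1}'
    paj.read_json(io.BytesIO(batch), read_options=paj.ReadOptions(block_size=12))
    """
)


def run_isolated(source: str) -> str:
    """Run `source` in a child interpreter and classify how it exited.

    Hypothetical helper for illustration only.
    """
    proc = subprocess.run(
        [sys.executable, "-c", source], capture_output=True, text=True
    )
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child was killed by a signal.
        return f"killed by signal {-proc.returncode}"
    # A positive code usually means an uncaught Python exception or sys.exit(n).
    return f"exited with code {proc.returncode}"


if __name__ == "__main__":
    print(run_isolated(REPRO))
```

      A "killed by signal" result indicates a crash in native code, while a positive exit code points to an ordinary Python-level failure (for example, pyarrow not being installed).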
      

            People

              Assignee: Unassigned
              Reporter: quentin lhoest (lhoestq)