Apache Arrow / ARROW-18229

[C++][Python] RecordBatchReader can be created with a 'dict' schema which then crashes on use

Details

    Description

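      RecordBatchReader.from_batches accepts a dict as the schema argument, but accessing the reader's schema afterwards segfaults. Presumably we should either disallow this or convert the dict to a schema?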

      https://github.com/duckdb/duckdb/issues/5143

      >>> import pyarrow as pa
      >>> pa.__version__
      '10.0.0'
      >>> reader = pa.RecordBatchReader.from_batches({"a": pa.int8()}, [])
      >>> reader.schema
      fish: Job 1, 'python3' terminated by signal SIGSEGV (Address boundary error)
      
      (gdb) bt
      #0  0x00007ffff4247580 in arrow::Schema::num_fields() const ()
         from /home/lidavidm/miniconda3/lib/python3.9/site-packages/pyarrow/libarrow.so.1000
      #1  0x00007ffff42b93f7 in arrow::(anonymous namespace)::SchemaPrinter::Print()
          ()
         from /home/lidavidm/miniconda3/lib/python3.9/site-packages/pyarrow/libarrow.so.1000
      #2  0x00007ffff42b98a7 in arrow::PrettyPrint(arrow::Schema const&, arrow::PrettyPrintOptions const&, std::string*) ()
         from /home/lidavidm/miniconda3/lib/python3.9/site-packages/pyarrow/libarrow.so.1000
      #3  0x00007ffff64f814b in __pyx_pw_7pyarrow_3lib_6Schema_52to_string(_object*, _object*, _object*) ()
      

            People

              jorisvandenbossche Joris Van den Bossche
              lidavidm David Li
                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 2h 10m