Apache Arrow / ARROW-10192

[C++][Python] Segfault when converting a nested struct array with a dictionary field to a pandas Series

Details

    Description

      Reproducer:

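      import pyarrow as pa

      def test_struct_array_with_dictionary_field_to_pandas():
          # Struct type with a single dictionary-encoded child
          # (int64 indices, int32 values)
          ty = pa.struct([
              pa.field('dict', pa.dictionary(pa.int64(), pa.int32())),
          ])
          data = [
              {'dict': -1859762450}
          ]
          arr = pa.array(data, type=ty)
          # The process crashes here instead of returning a Series
          arr.to_pandas()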
      

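      The process crashes instead of raising a Python exception; lldb reports `stop reason = signal SIGSTOP` with this backtrace: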

      * thread #1, stop reason = signal SIGSTOP
        * frame #0: 0x00007fff6e2b733a libsystem_kernel.dylib`__pthread_kill + 10
          frame #1: 0x00007fff6e373e60 libsystem_pthread.dylib`pthread_kill + 430
          frame #2: 0x00007fff6e1ce93e libsystem_c.dylib`raise + 26
          frame #3: 0x00007fff6e3685fd libsystem_platform.dylib`_sigtramp + 29
          frame #4: 0x000000011517adfd libarrow_python.200.0.0.dylib`arrow::py::ConvertStruct(options=0x00007f84fc5a0230, data=0x00007f84fc59ef18, out_values=0x00007f84fc53d140) at arrow_to_pandas.cc:685:54
          frame #5: 0x000000011514c642 libarrow_python.200.0.0.dylib`arrow::py::ObjectWriterVisitor::Visit(this=0x00007ffee06a1a88, type=0x00007f84fc5a00e8) at arrow_to_pandas.cc:1031:12
          frame #6: 0x00000001151499c4 libarrow_python.200.0.0.dylib`arrow::Status arrow::VisitTypeInline<arrow::py::ObjectWriterVisitor>(type=0x00007f84fc5a00e8, visitor=0x00007ffee06a1a88) at visitor_inline.h:88:5
          frame #7: 0x0000000115149305 libarrow_python.200.0.0.dylib`arrow::py::ObjectWriter::CopyInto(this=0x00007f84fc5a0228, data=std::__1::shared_ptr<arrow::ChunkedArray>::element_type @ 0x00007f84fc59ef18 strong=2 weak=1, rel_placement=0) at arrow_to_pandas.cc:1055:12
      
      Selecting frame #4 shows the faulting line in arrow_to_pandas.cc:

      frame #4: 0x000000011517adfd libarrow_python.200.0.0.dylib`arrow::py::ConvertStruct(options=0x00007f84fc5a0230, data=0x00007f84fc59ef18, out_values=0x00007f84fc53d140) at arrow_to_pandas.cc:685:54
         682            if (!arr->field(static_cast<int>(field_idx))->IsNull(i)) {
         683              // Value exists in child array, obtain it
         684              auto array = reinterpret_cast<PyArrayObject*>(fields_data[field_idx].obj());
      -> 685              auto ptr = reinterpret_cast<const char*>(PyArray_GETPTR1(array, i));
         686              field_value.reset(PyArray_GETITEM(array, ptr));
         687              RETURN_IF_PYERROR();
         688            } else {
      


            People

              apitrou Antoine Pitrou
              kszucs Krisztian Szucs
              Votes: 0
              Watchers: 2


                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 1h 20m
