ARROW-9380: [C++] Segfaults in compute::CallFunction

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0.0
    • Component/s: C++

    Description

      I triggered these from R, so that's what the reproducers are in.

      1. Calling "filter" with no args segfaults.

      arrow:::compute__CallFunction("filter", list(), list(keep_na = FALSE))
      

      Top of the backtrace from lldb:

        * frame #0: 0x0000000109e1c2c7 libarrow.100.dylib`arrow::Datum::type() const + 7
          frame #1: 0x000000010a14a232 libarrow.100.dylib`arrow::compute::internal::(anonymous namespace)::FilterMetaFunction::ExecuteImpl(std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum> > const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) const + 66
          frame #2: 0x0000000109fc32c9 libarrow.100.dylib`arrow::compute::MetaFunction::Execute(std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum> > const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) const + 41
          frame #3: 0x0000000109fb3d3c libarrow.100.dylib`arrow::compute::CallFunction(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum> > const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) + 844
          frame #4: 0x0000000109fb3c47 libarrow.100.dylib`arrow::compute::CallFunction(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum> > const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) + 599
      

      This is not the case with at least some other functions. If I try to call "sum" with no args, I get Invalid: Function accepts 1 arguments but passed 0 and no segfault.
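For illustration, here is a minimal sketch of the kind of argument-count check that "sum" appears to get and "filter" appears to lack. It is not Arrow's code: `FakeDatum`, `CheckResult`, `CheckArity`, and `DispatchFilter` are hypothetical names, and the arity of 2 (values plus filter) is an assumption based on how `a$Filter(b)` is called. The only point is that the count is validated before anything reads `args[0]`, which is where the backtrace above suggests the crash happens.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for illustration only; this is NOT arrow::Datum.
struct FakeDatum {
  int type_id;
};

// Hypothetical result type; a real implementation would return a Status/Result.
struct CheckResult {
  bool ok;
  std::string message;
};

// Validate the argument count before any code touches args[0].
CheckResult CheckArity(std::size_t expected, const std::vector<FakeDatum>& args) {
  if (args.size() != expected) {
    return {false, "Invalid: Function accepts " + std::to_string(expected) +
                       " arguments but passed " + std::to_string(args.size())};
  }
  return {true, ""};
}

// Sketch of a filter-like dispatcher. Assumption: it needs exactly two
// arguments, the values and the boolean filter.
CheckResult DispatchFilter(const std::vector<FakeDatum>& args) {
  CheckResult check = CheckArity(2, args);
  if (!check.ok) return check;
  // Only now is it safe to read args[0] and args[1].
  int values_type = args[0].type_id;
  int filter_type = args[1].type_id;
  (void)values_type;
  (void)filter_type;
  return {true, ""};
}
```

With a check like this, calling the dispatcher with an empty argument list returns an error in the same style as the "sum" message instead of dereferencing a missing element.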

      2. Something is strange with is_null. It creates what appears to be a valid boolean array, but if I pass it to filter, it segfaults. I'm adding bindings for this in ARROW-9187 but this should run on current master:

      library(arrow)
      a <- Array$create(1:4)
      b <- arrow:::shared_ptr(Array, arrow:::call_function("is_null", a))
      a$Filter(b)
      

      Backtrace:

        * frame #0: 0x000000010a120bb6 libarrow.100.dylib`arrow::compute::internal::GetFilterOutputSize(arrow::ArrayData const&, arrow::compute::FilterOptions::NullSelectionBehavior) + 38
          frame #1: 0x000000010a125659 libarrow.100.dylib`arrow::compute::internal::(anonymous namespace)::PrimitiveFilter(arrow::compute::KernelContext*, arrow::compute::ExecBatch const&, arrow::Datum*) + 121
          frame #2: 0x0000000109fbbea4 libarrow.100.dylib`arrow::compute::detail::VectorExecutor::ExecuteBatch(arrow::compute::ExecBatch const&, arrow::compute::detail::ExecListener*) + 996
          frame #3: 0x0000000109fba3e6 libarrow.100.dylib`arrow::compute::detail::VectorExecutor::Execute(std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum> > const&, arrow::compute::detail::ExecListener*) + 150
          frame #4: 0x0000000109fc0948 libarrow.100.dylib`arrow::compute::Function::Execute(std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum> > const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) const + 1016
          frame #5: 0x0000000109fb3d3c libarrow.100.dylib`arrow::compute::CallFunction(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum> > const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) + 844
          frame #6: 0x000000010a14a9b5 libarrow.100.dylib`arrow::compute::internal::(anonymous namespace)::FilterMetaFunction::ExecuteImpl(std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum> > const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) const + 1989
          frame #7: 0x0000000109fc32c9 libarrow.100.dylib`arrow::compute::MetaFunction::Execute(std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum> > const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) const + 41
          frame #8: 0x0000000109fb3d3c libarrow.100.dylib`arrow::compute::CallFunction(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum> > const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) + 844
          frame #9: 0x0000000109fb3c47 libarrow.100.dylib`arrow::compute::CallFunction(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum> > const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) + 599
      

      BUT: if I call as.vector on b before using it as a Filter, it works--even though I've discarded the as.vector result and am still using the Array to filter.

      library(arrow)
      a <- Array$create(1:4)
      b <- arrow:::shared_ptr(Array, arrow:::call_function("is_null", a))
      as.vector(b)
      a$Filter(b)
      

      Just printing (calling ToString) on b doesn't prevent the segfault. And I have not observed this with other boolean kernels. E.g. this does not segfault:

      library(arrow)
      a <- Array$create(1:4)
      b <- arrow:::shared_ptr(Array, arrow:::call_function("greater", a, Scalar$create(3L)))
      a$Filter(b)
      

            People

              Assignee: Wes McKinney (wesm)
              Reporter: Neal Richardson (npr)
                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 50m