Apache Arrow / ARROW-17252

[R] Intermittent valgrind failure

Details

    Description

      A number of recent nightly builds have failed intermittently under valgrind because of possibly leaked memory around an exec plan. This seems related to a change in XXX that separated ExecPlan_prepare() from ExecPlan_run() and added an ExecPlan_read_table() that uses RunWithCapturedR(). The reported leaks vary, but they include ExecPlans, ExecNodes, and fields of those objects.

      A failed run: https://dev.azure.com/ursacomputing/crossbow/_build/results?buildId=30310&view=logs&j=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb&t=d9b15392-e4ce-5e4c-0c8c-b69645229181&l=24980

      Some example output:

      ==5249== 14,112 (384 direct, 13,728 indirect) bytes in 1 blocks are definitely lost in loss record 1,988 of 3,883
      ==5249==    at 0x4849013: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
      ==5249==    by 0x10B2902B: std::_Function_handler<arrow::Result<arrow::compute::ExecNode*> (arrow::compute::ExecPlan*, std::vector<arrow::compute::ExecNode*, std::allocator<arrow::compute::ExecNode*> >, arrow::compute::ExecNodeOptions const&), arrow::compute::internal::RegisterAggregateNode(arrow::compute::ExecFactoryRegistry*)::{lambda(arrow::compute::ExecPlan*, std::vector<arrow::compute::ExecNode*, std::allocator<arrow::compute::ExecNode*> >, arrow::compute::ExecNodeOptions const&)#1}>::_M_invoke(std::_Any_data const&, arrow::compute::ExecPlan*&&, std::vector<arrow::compute::ExecNode*, std::allocator<arrow::compute::ExecNode*> >&&, arrow::compute::ExecNodeOptions const&) (exec_plan.h:60)
      ==5249==    by 0xFA83A0C: std::function<arrow::Result<arrow::compute::ExecNode*> (arrow::compute::ExecPlan*, std::vector<arrow::compute::ExecNode*, std::allocator<arrow::compute::ExecNode*> >, arrow::compute::ExecNodeOptions const&)>::operator()(arrow::compute::ExecPlan*, std::vector<arrow::compute::ExecNode*, std::allocator<arrow::compute::ExecNode*> >, arrow::compute::ExecNodeOptions const&) const (std_function.h:622)
      ==5249== 14,528 (160 direct, 14,368 indirect) bytes in 1 blocks are definitely lost in loss record 1,989 of 3,883
      ==5249==    at 0x4849013: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
      ==5249==    by 0x10096CB7: arrow::FutureImpl::Make() (future.cc:187)
      ==5249==    by 0xFCB6F9A: arrow::Future<arrow::internal::Empty>::Make() (future.h:420)
      ==5249==    by 0x101AE927: ExecPlanImpl (exec_plan.cc:50)
      ==5249==    by 0x101AE927: arrow::compute::ExecPlan::Make(arrow::compute::ExecContext*, std::shared_ptr<arrow::KeyValueMetadata const>) (exec_plan.cc:355)
      ==5249==    by 0xFA77BA2: ExecPlan_create(bool) (compute-exec.cpp:45)
      ==5249==    by 0xF9FAE9F: _arrow_ExecPlan_create (arrowExports.cpp:868)
      ==5249==    by 0x4953B60: R_doDotCall (dotcode.c:601)
      ==5249==    by 0x49C2C16: bcEval (eval.c:7682)
      ==5249==    by 0x499DB95: Rf_eval (eval.c:748)
      ==5249==    by 0x49A0904: R_execClosure (eval.c:1918)
      ==5249==    by 0x49A05B7: Rf_applyClosure (eval.c:1844)
      ==5249==    by 0x49B2122: bcEval (eval.c:7094)
      ==5249== 
      ==5249== 36,322 (416 direct, 35,906 indirect) bytes in 1 blocks are definitely lost in loss record 2,929 of 3,883
      ==5249==    at 0x4849013: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
      ==5249==    by 0x10214F92: arrow::compute::TaskScheduler::Make() (task_util.cc:421)
      ==5249==    by 0x101AEA6C: ExecPlanImpl (exec_plan.cc:50)
      ==5249==    by 0x101AEA6C: arrow::compute::ExecPlan::Make(arrow::compute::ExecContext*, std::shared_ptr<arrow::KeyValueMetadata const>) (exec_plan.cc:355)
      ==5249==    by 0xFA77BA2: ExecPlan_create(bool) (compute-exec.cpp:45)
      ==5249==    by 0xF9FAE9F: _arrow_ExecPlan_create (arrowExports.cpp:868)
      ==5249==    by 0x4953B60: R_doDotCall (dotcode.c:601)
      ==5249==    by 0x49C2C16: bcEval (eval.c:7682)
      ==5249==    by 0x499DB95: Rf_eval (eval.c:748)
      ==5249==    by 0x49A0904: R_execClosure (eval.c:1918)
      ==5249==    by 0x49A05B7: Rf_applyClosure (eval.c:1844)
      ==5249==    by 0x49B2122: bcEval (eval.c:7094)
      ==5249==    by 0x499DB95: Rf_eval (eval.c:748)
      

      We also occasionally get leaked Schemas, and in one case a leaked InputType that seemed completely unrelated to the other leaks (ARROW-17225).

      I wonder whether these are related to references held in lambdas that get passed by reference, or perhaps to a cache issue. In some previous leaks, the backtrace leading to the operator new allocation differed between the reported leaks.
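
      To make the lambda hypothesis concrete, here is a minimal sketch of one way a callback can keep an object alive indefinitely. It is not Arrow code: Plan, on_finished, and both function names are hypothetical stand-ins, and the mechanism shown (a stored lambda that captures a shared_ptr to its own owner) is only an assumption about what might be happening, not a diagnosis.

```cpp
#include <functional>
#include <memory>

// Hypothetical stand-in for a plan-like object; this is NOT arrow::compute::ExecPlan.
struct Plan {
  std::function<void()> on_finished;  // callback stored inside the object itself
};

// Returns true if the Plan was destroyed once the last external owner went away.
// Note: this function leaks on purpose to demonstrate the cycle.
bool DestroyedWithStrongCapture() {
  std::weak_ptr<Plan> observer;
  {
    auto plan = std::make_shared<Plan>();
    observer = plan;
    // The lambda copies the shared_ptr, so the plan owns a callback that owns
    // the plan. The strong count never reaches zero after `plan` goes out of scope.
    plan->on_finished = [plan] { (void)plan; };
  }
  return observer.expired();
}

// Same shape, but the callback holds only a weak_ptr, so no cycle is formed.
bool DestroyedWithWeakCapture() {
  std::weak_ptr<Plan> observer;
  {
    auto plan = std::make_shared<Plan>();
    observer = plan;
    std::weak_ptr<Plan> weak = plan;
    plan->on_finished = [weak] {
      // Lock before use; the owner may already be gone when the callback runs.
      if (auto locked = weak.lock()) {
        (void)locked;
      }
    };
  }
  return observer.expired();
}
```

      Calling both functions from a small main() and running it under valgrind with --leak-check=full should show a leak only for the strong-capture variant. If the real callbacks have this shape, capturing a weak_ptr or clearing the stored callback after it fires are the usual ways to break the cycle.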

          People

            paleolimbot Dewey Dunnington
              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 13h