[ARROW-17252] [R] Intermittent valgrind failure - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 9.0.1, 10.0.0
Component/s: R
Labels:
- pull-request-available

External issue URL:
https://github.com/apache/arrow/issues/32535

Description

A number of recent nightly builds have failed intermittently under valgrind because of possibly leaked memory around an exec plan. This seems related to a change in XXX that separated ExecPlan_prepare() from ExecPlan_run() and added an ExecPlan_read_table() that uses RunWithCapturedR(). The reported leaks vary, but they include ExecPlans, ExecNodes, and fields of those objects.

A failed run: https://dev.azure.com/ursacomputing/crossbow/_build/results?buildId=30310&view=logs&j=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb&t=d9b15392-e4ce-5e4c-0c8c-b69645229181&l=24980

Some example output:

==5249== 14,112 (384 direct, 13,728 indirect) bytes in 1 blocks are definitely lost in loss record 1,988 of 3,883
==5249==    at 0x4849013: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==5249==    by 0x10B2902B: std::_Function_handler<arrow::Result<arrow::compute::ExecNode*> (arrow::compute::ExecPlan*, std::vector<arrow::compute::ExecNode*, std::allocator<arrow::compute::ExecNode*> >, arrow::compute::ExecNodeOptions const&), arrow::compute::internal::RegisterAggregateNode(arrow::compute::ExecFactoryRegistry*)::{lambda(arrow::compute::ExecPlan*, std::vector<arrow::compute::ExecNode*, std::allocator<arrow::compute::ExecNode*> >, arrow::compute::ExecNodeOptions const&)#1}>::_M_invoke(std::_Any_data const&, arrow::compute::ExecPlan*&&, std::vector<arrow::compute::ExecNode*, std::allocator<arrow::compute::ExecNode*> >&&, arrow::compute::ExecNodeOptions const&) (exec_plan.h:60)
==5249==    by 0xFA83A0C: std::function<arrow::Result<arrow::compute::ExecNode*> (arrow::compute::ExecPlan*, std::vector<arrow::compute::ExecNode*, std::allocator<arrow::compute::ExecNode*> >, arrow::compute::ExecNodeOptions const&)>::operator()(arrow::compute::ExecPlan*, std::vector<arrow::compute::ExecNode*, std::allocator<arrow::compute::ExecNode*> >, arrow::compute::ExecNodeOptions const&) const (std_function.h:622)
==5249== 14,528 (160 direct, 14,368 indirect) bytes in 1 blocks are definitely lost in loss record 1,989 of 3,883
==5249==    at 0x4849013: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==5249==    by 0x10096CB7: arrow::FutureImpl::Make() (future.cc:187)
==5249==    by 0xFCB6F9A: arrow::Future<arrow::internal::Empty>::Make() (future.h:420)
==5249==    by 0x101AE927: ExecPlanImpl (exec_plan.cc:50)
==5249==    by 0x101AE927: arrow::compute::ExecPlan::Make(arrow::compute::ExecContext*, std::shared_ptr<arrow::KeyValueMetadata const>) (exec_plan.cc:355)
==5249==    by 0xFA77BA2: ExecPlan_create(bool) (compute-exec.cpp:45)
==5249==    by 0xF9FAE9F: _arrow_ExecPlan_create (arrowExports.cpp:868)
==5249==    by 0x4953B60: R_doDotCall (dotcode.c:601)
==5249==    by 0x49C2C16: bcEval (eval.c:7682)
==5249==    by 0x499DB95: Rf_eval (eval.c:748)
==5249==    by 0x49A0904: R_execClosure (eval.c:1918)
==5249==    by 0x49A05B7: Rf_applyClosure (eval.c:1844)
==5249==    by 0x49B2122: bcEval (eval.c:7094)
==5249== 
==5249== 36,322 (416 direct, 35,906 indirect) bytes in 1 blocks are definitely lost in loss record 2,929 of 3,883
==5249==    at 0x4849013: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==5249==    by 0x10214F92: arrow::compute::TaskScheduler::Make() (task_util.cc:421)
==5249==    by 0x101AEA6C: ExecPlanImpl (exec_plan.cc:50)
==5249==    by 0x101AEA6C: arrow::compute::ExecPlan::Make(arrow::compute::ExecContext*, std::shared_ptr<arrow::KeyValueMetadata const>) (exec_plan.cc:355)
==5249==    by 0xFA77BA2: ExecPlan_create(bool) (compute-exec.cpp:45)
==5249==    by 0xF9FAE9F: _arrow_ExecPlan_create (arrowExports.cpp:868)
==5249==    by 0x4953B60: R_doDotCall (dotcode.c:601)
==5249==    by 0x49C2C16: bcEval (eval.c:7682)
==5249==    by 0x499DB95: Rf_eval (eval.c:748)
==5249==    by 0x49A0904: R_execClosure (eval.c:1918)
==5249==    by 0x49A05B7: Rf_applyClosure (eval.c:1844)
==5249==    by 0x49B2122: bcEval (eval.c:7094)
==5249==    by 0x499DB95: Rf_eval (eval.c:748)

We also occasionally get leaked Schemas, and in one case a leaked InputType that seemed completely unrelated to the other leaks (ARROW-17225).

I'm wondering whether these have to do with references captured in lambdas that are themselves passed by reference, or perhaps with a cache issue. In some previous leak reports, the backtrace leading to the operator new call differed between the reported leaks.
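To make the lambda hypothesis concrete, here is a minimal sketch of one way a lambda capture can keep a plan-like object alive indefinitely. It is not taken from the Arrow code base and is not a confirmed cause of these leaks: `Plan`, `on_finished`, `MakePlanWithCycle()`, `MakePlanWithoutCycle()` and `RunLifetimeDemo()` are hypothetical names used only for illustration. The sketch shows a callback that captures the owning `std::shared_ptr` by value and is stored on the object itself, which creates a reference cycle of the kind valgrind would report as lost memory, next to a variant that captures a `std::weak_ptr` instead.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical stand-in for an object that stores a completion callback.
// This is NOT arrow::compute::ExecPlan; it only mimics the ownership shape.
struct Plan {
  std::function<void()> on_finished;
};

// The callback captures the owning shared_ptr by value and is stored on the
// object it points to: plan -> callback -> plan. The reference count can
// never drop to zero unless the callback is cleared explicitly.
std::weak_ptr<Plan> MakePlanWithCycle() {
  auto plan = std::make_shared<Plan>();
  plan->on_finished = [plan]() { (void)plan; };
  return std::weak_ptr<Plan>(plan);
}

// The callback captures a weak_ptr, so it does not extend the plan's
// lifetime. The plan is destroyed when the last shared_ptr goes away.
std::weak_ptr<Plan> MakePlanWithoutCycle() {
  auto plan = std::make_shared<Plan>();
  std::weak_ptr<Plan> weak = plan;
  plan->on_finished = [weak]() {
    if (auto locked = weak.lock()) {
      (void)locked;  // a real callback would use the plan here
    }
  };
  return weak;
}

// Observes both lifetimes through the returned weak_ptr handles.
void RunLifetimeDemo() {
  std::weak_ptr<Plan> cyclic = MakePlanWithCycle();
  assert(!cyclic.expired());  // still alive: kept alive by its own callback

  // Break the cycle by hand so the demo itself does not leak.
  if (auto p = cyclic.lock()) p->on_finished = nullptr;
  assert(cyclic.expired());

  std::weak_ptr<Plan> acyclic = MakePlanWithoutCycle();
  assert(acyclic.expired());  // destroyed as soon as the factory returned
}
```

Calling `RunLifetimeDemo()` from a small `main()` and running the binary under valgrind with the cycle left unbroken should, under these assumptions, produce a lost-block report whose allocation site is the `std::make_shared` call. Whether anything like this happens in the exec plan code is an open question in this issue.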

Issue Links

links to
- GitHub Pull Request #13746
- GitHub Pull Request #13773
- GitHub Pull Request #13779
- GitHub Pull Request #13780

People

Assignee: Dewey Dunnington
Reporter: Dewey Dunnington
Votes: 0
Watchers: 3

Dates

Created: 29/Jul/22 12:39
Updated: 11/Jan/23 11:49
Resolved: 09/Aug/22 10:58

Time Tracking

Estimated: Not Specified
Remaining:
Logged: 13h