Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-4595

Linking IR UDF module to main module crashes Impala

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 2.7.0, Impala 2.8.0
    • Fix Version/s: Impala 2.8.0
    • Component/s: Backend
    • Labels:

      Description

      Steps to reproduce:

      1. Compile the code from https://github.com/cloudera/impala-udf-samples to udf-samples.ll
      2. Create a UDF using functions defined in the LLVM IR code.
      3. Invoke the newly created UDF and Impala crashes with the following stack trace.

      (gdb) bt
      #0  0x00007f36d166ccc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
      #1  0x00007f36d16700d8 in __GI_abort () at abort.c:89
      #2  0x0000000002821c04 in google::DumpStackTraceAndExit() ()
      #3  0x000000000281b06d in google::LogMessage::Fail() ()
      #4  0x000000000281d996 in google::LogMessage::SendToLog() ()
      #5  0x000000000281ab8d in google::LogMessage::Flush() ()
      #6  0x000000000281e43e in google::LogMessageFatal::~LogMessageFatal() ()
      #7  0x0000000001627a46 in impala::LlvmCodeGen::LinkModule (this=0x883ff80, file=...) at /kwho-desktop-1/trees/incubator-impala/be/src/codegen/llvm-codegen.cc:368
      #8  0x00000000018cb92a in impala::ScalarFnCall::GetCodegendComputeFn (this=0x8682600, codegen=0x883ff80, fn=0x7f369e211a78) at /kwho-desktop-1/trees/incubator-impala/be/src/exprs/scalar-fn-call.cc:314
      #9  0x0000000001367f25 in impala::RuntimeState::CodegenScalarFns (this=0x7f369e211eb0) at /kwho-desktop-1/trees/incubator-impala/be/src/runtime/runtime-state.cc:197
      #10 0x000000000146c3f4 in Java_org_apache_impala_service_FeSupport_NativeEvalConstExprs (env=0x73fb1e8, caller_class=0x7f369e212af0, thrift_expr_batch=0x7f369e212b08, thrift_query_ctx_bytes=0x7f369e212b00)
          at /kwho-desktop-1/trees/incubator-impala/be/src/service/fe-support.cc:133
      #11 0x00007f36cb49bc39 in ?? ()
      #12 0x00000000073fb000 in ?? ()
      #13 0x000000000000023f in ?? ()
      #14 0x00007f369e212aa0 in ?? ()
      #15 0x0000000606b3ab20 in ?? ()
      #16 0x00007f369e212b08 in ?? ()
      #17 0x0000000606b3c298 in ?? ()
      #18 0x0000000000000000 in ?? ()
      (gdb) f 7
      #7  0x0000000001627a46 in impala::LlvmCodeGen::LinkModule (this=0x883ff80, file=...) at /kwho-desktop-1/trees/incubator-impala/be/src/codegen/llvm-codegen.cc:368
      368         DCHECK(fn != NULL);
      (gdb) l
      363       // Parse materialized functions in the source module and materialize functions it
      364       // references. Do it after linking so LLVM has "merged" functions defined in both
      365       // modules.
      366       for (const string& fn_name: materializable_fns) {
      367         Function* fn = module_->getFunction(fn_name);
      368         DCHECK(fn != NULL);
      369         if (!fn->isMaterializable()) MaterializeCallees(fn);
      370       }
      371       return Status::OK();
      372     }
      
      (gdb) p fn_name
      $1 = (const std::string &) @0x8d95930: {static npos = <optimized out>, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
          _M_p = 0x84fbcd8 "_ZN10impala_udf9StringValC2EPhi"}}
      

      The crash has to do with being unable to look up a function using its name in the linked module. The function's name was recorded by parsing the external LLVM module before it's linked to the 'main_module_' of Impala. Apparently, the name of the linked function may have been renamed during linking and it could no longer be found after linking.

      cc'ing Tim Armstrong for his opinion.

        Activity

        Hide
        tarmstrong Tim Armstrong added a comment -

        Do we know that it was renamed? Isn't it possible that some (e.g. private) functions from the loaded module aren't required so are not linked?

        Show
        tarmstrong Tim Armstrong added a comment - Do we know that it was renamed? Isn't it possible that some (e.g. private) functions from the loaded module aren't required so are not linked?
        Hide
        kwho Michael Ho added a comment -

        The missing function is defined only in the external module so that's also a possibility.

        Show
        kwho Michael Ho added a comment - The missing function is defined only in the external module so that's also a possibility.
        Hide
        kwho Michael Ho added a comment -

        I think Tim Armstrong is right. udf-sample.cc was compiled without any optimization. Duplicated definition such as StringVal::null() may have different inlining level between the external module (udf-sample.ll) and the main module. When the duplicated definition in the external module is discarded, some of its callee functions (which are not inlined) may not be defined in the main module so they can no longer be located in the linked module. This trips up some code in the LlvmCodegen::LinkModule(). In particular, when parsing for functions in external module which are materialized during linking, certain functions may not be present due to the reason above. Impalad will hit a DCHECK in debug build or crash due to null pointer access in release build.

        In this case, it appears _ZN10impala_udf9StringValC2EPhi became dead code after linking.

        Definition of StringVal::null() in udf-sample.ll

        ; Function Attrs: uwtable
        define linkonce_odr { i64, i8* } @_ZN10impala_udf9StringVal4nullEv() #0 comdat align 2 {
          %1 = alloca %"struct.impala_udf::StringVal", align 8
          call void @_ZN10impala_udf9StringValC2EPhi(%"struct.impala_udf::StringVal"* %1, i8* null, i32 0)
          %2 = bitcast %"struct.impala_udf::StringVal"* %1 to %"struct.impala_udf::AnyVal"*
          %3 = getelementptr inbounds %"struct.impala_udf::AnyVal", %"struct.impala_udf::AnyVal"* %2, i32 0, i32 0
          store i8 1, i8* %3, align 1
          %4 = bitcast %"struct.impala_udf::StringVal"* %1 to { i64, i8* }*
          %5 = load { i64, i8* }, { i64, i8* }* %4, align 1
          ret { i64, i8* } %5
        }
        
        ; Function Attrs: uwtable
        define linkonce_odr void @_ZN10impala_udf9StringValC2EPhi(%"struct.impala_udf::StringVal"* %this, i8* %ptr, i32 %len) unnamed_addr #0 comdat align 2 {
          %1 = alloca %"struct.impala_udf::StringVal"*, align 8
          %2 = alloca i8*, align 8
          %3 = alloca i32, align 4
          store %"struct.impala_udf::StringVal"* %this, %"struct.impala_udf::StringVal"** %1, align 8
          store i8* %ptr, i8** %2, align 8
          store i32 %len, i32* %3, align 4
          %4 = load %"struct.impala_udf::StringVal"*, %"struct.impala_udf::StringVal"** %1
          %5 = bitcast %"struct.impala_udf::StringVal"* %4 to %"struct.impala_udf::AnyVal"*
          call void @_ZN10impala_udf6AnyValC2Eb(%"struct.impala_udf::AnyVal"* %5, i1 zeroext false)
          %6 = getelementptr inbounds %"struct.impala_udf::StringVal", %"struct.impala_udf::StringVal"* %4, i32 0, i32 1
          %7 = load i32, i32* %3, align 4
          store i32 %7, i32* %6, align 4
          %8 = getelementptr inbounds %"struct.impala_udf::StringVal", %"struct.impala_udf::StringVal"* %4, i32 0, i32 2
          %9 = load i8*, i8** %2, align 8
          store i8* %9, i8** %8, align 8
          %10 = load i32, i32* %3, align 4
          %11 = icmp sge i32 %10, 0
          br i1 %11, label %12, label %13
        
        ; <label>:12                                      ; preds = %0
          br label %15
        
        ; <label>:13                                      ; preds = %0
          call void @__assert_fail(i8* getelementptr inbounds ([9 x i8], [9 x i8]* @.str2, i32 0, i32 0), i8* getelementptr inbounds ([30 x i8], [30 x i8]* @.str3, i32 0, i32 0), i32 535, i8* getelementptr inbounds ([49 x i8], [49 x i8]* @__PRETTY_FUNCTION__._ZN10impala_udf9StringValC2EPhi, i32 0, i32 0)) #10
          unreachable
                                                          ; No predecessors!
          br label %15
        
        ; <label>:15                                      ; preds = %14, %12
          ret void
        }
        

        Definition of StringVal::null() in impala-sse.ll (main module):

        define linkonce_odr { i64, i8* } @_ZN10impala_udf9StringVal4nullEv() #7 comdat align 2 {
          %1 = alloca %"struct.impala_udf::StringVal", align 8
          %2 = getelementptr inbounds %"struct.impala_udf::StringVal", %"struct.impala_udf::StringVal"* %1, i64 0, i32 0
          %3 = getelementptr inbounds %"struct.impala_udf::AnyVal", %"struct.impala_udf::AnyVal"* %2, i64 0, i32 0
          store i8 0, i8* %3, align 1, !tbaa !1
          %4 = getelementptr inbounds %"struct.impala_udf::StringVal", %"struct.impala_udf::StringVal"* %1, i64 0, i32 1
          store i32 0, i32* %4, align 4, !tbaa !7
          %5 = getelementptr inbounds %"struct.impala_udf::StringVal", %"struct.impala_udf::StringVal"* %1, i64 0, i32 2
          store i8* null, i8** %5, align 8, !tbaa !11
          %6 = getelementptr inbounds %"struct.impala_udf::StringVal", %"struct.impala_udf::StringVal"* %1, i64 0, i32 0, i32 0
          store i8 1, i8* %6, align 8, !tbaa !1
          %7 = bitcast %"struct.impala_udf::StringVal"* %1 to i64*
          %8 = load i64, i64* %7, align 8
          %9 = insertvalue { i64, i8* } undef, i64 %8, 0
          %10 = getelementptr inbounds %"struct.impala_udf::StringVal", %"struct.impala_udf::StringVal"* %1, i64 0, i32 2
          %11 = load i8*, i8** %10, align 8
          %12 = insertvalue { i64, i8* } %9, i8* %11, 1
          ret { i64, i8* } %12
        }
        
        Show
        kwho Michael Ho added a comment - I think Tim Armstrong is right. udf-sample.cc was compiled without any optimization. Duplicated definition such as StringVal::null() may have different inlining level between the external module (udf-sample.ll) and the main module. When the duplicated definition in the external module is discarded, some of its callee functions (which are not inlined) may not be defined in the main module so they can no longer be located in the linked module. This trips up some code in the LlvmCodegen::LinkModule(). In particular, when parsing for functions in external module which are materialized during linking, certain functions may not be present due to the reason above. Impalad will hit a DCHECK in debug build or crash due to null pointer access in release build. In this case, it appears _ZN10impala_udf9StringValC2EPhi became dead code after linking. Definition of StringVal::null() in udf-sample.ll ; Function Attrs: uwtable define linkonce_odr { i64, i8* } @_ZN10impala_udf9StringVal4nullEv() #0 comdat align 2 { %1 = alloca %"struct.impala_udf::StringVal", align 8 call void @_ZN10impala_udf9StringValC2EPhi(%"struct.impala_udf::StringVal"* %1, i8* null, i32 0) %2 = bitcast %"struct.impala_udf::StringVal"* %1 to %"struct.impala_udf::AnyVal"* %3 = getelementptr inbounds %"struct.impala_udf::AnyVal", %"struct.impala_udf::AnyVal"* %2, i32 0, i32 0 store i8 1, i8* %3, align 1 %4 = bitcast %"struct.impala_udf::StringVal"* %1 to { i64, i8* }* %5 = load { i64, i8* }, { i64, i8* }* %4, align 1 ret { i64, i8* } %5 } ; Function Attrs: uwtable define linkonce_odr void @_ZN10impala_udf9StringValC2EPhi(%"struct.impala_udf::StringVal"* %this, i8* %ptr, i32 %len) unnamed_addr #0 comdat align 2 { %1 = alloca %"struct.impala_udf::StringVal"*, align 8 %2 = alloca i8*, align 8 %3 = alloca i32, align 4 store %"struct.impala_udf::StringVal"* %this, %"struct.impala_udf::StringVal"** %1, align 8 store i8* %ptr, i8** %2, align 8 store i32 %len, i32* %3, align 4 %4 = load %"struct.impala_udf::StringVal"*, %"struct.impala_udf::StringVal"** %1 %5 = bitcast %"struct.impala_udf::StringVal"* %4 to %"struct.impala_udf::AnyVal"* call void @_ZN10impala_udf6AnyValC2Eb(%"struct.impala_udf::AnyVal"* %5, i1 zeroext false) %6 = getelementptr inbounds %"struct.impala_udf::StringVal", %"struct.impala_udf::StringVal"* %4, i32 0, i32 1 %7 = load i32, i32* %3, align 4 store i32 %7, i32* %6, align 4 %8 = getelementptr inbounds %"struct.impala_udf::StringVal", %"struct.impala_udf::StringVal"* %4, i32 0, i32 2 %9 = load i8*, i8** %2, align 8 store i8* %9, i8** %8, align 8 %10 = load i32, i32* %3, align 4 %11 = icmp sge i32 %10, 0 br i1 %11, label %12, label %13 ; <label>:12 ; preds = %0 br label %15 ; <label>:13 ; preds = %0 call void @__assert_fail(i8* getelementptr inbounds ([9 x i8], [9 x i8]* @.str2, i32 0, i32 0), i8* getelementptr inbounds ([30 x i8], [30 x i8]* @.str3, i32 0, i32 0), i32 535, i8* getelementptr inbounds ([49 x i8], [49 x i8]* @__PRETTY_FUNCTION__._ZN10impala_udf9StringValC2EPhi, i32 0, i32 0)) #10 unreachable ; No predecessors! br label %15 ; <label>:15 ; preds = %14, %12 ret void } Definition of StringVal::null() in impala-sse.ll (main module): define linkonce_odr { i64, i8* } @_ZN10impala_udf9StringVal4nullEv() #7 comdat align 2 { %1 = alloca %"struct.impala_udf::StringVal", align 8 %2 = getelementptr inbounds %"struct.impala_udf::StringVal", %"struct.impala_udf::StringVal"* %1, i64 0, i32 0 %3 = getelementptr inbounds %"struct.impala_udf::AnyVal", %"struct.impala_udf::AnyVal"* %2, i64 0, i32 0 store i8 0, i8* %3, align 1, !tbaa !1 %4 = getelementptr inbounds %"struct.impala_udf::StringVal", %"struct.impala_udf::StringVal"* %1, i64 0, i32 1 store i32 0, i32* %4, align 4, !tbaa !7 %5 = getelementptr inbounds %"struct.impala_udf::StringVal", %"struct.impala_udf::StringVal"* %1, i64 0, i32 2 store i8* null, i8** %5, align 8, !tbaa !11 %6 = getelementptr inbounds %"struct.impala_udf::StringVal", %"struct.impala_udf::StringVal"* %1, i64 0, i32 0, i32 0 store i8 1, i8* %6, align 8, !tbaa !1 %7 = bitcast %"struct.impala_udf::StringVal"* %1 to i64* %8 = load i64, i64* %7, align 8 %9 = insertvalue { i64, i8* } undef, i64 %8, 0 %10 = getelementptr inbounds %"struct.impala_udf::StringVal", %"struct.impala_udf::StringVal"* %1, i64 0, i32 2 %11 = load i8*, i8** %10, align 8 %12 = insertvalue { i64, i8* } %9, i8* %11, 1 ret { i64, i8* } %12 }
        Hide
        kwho Michael Ho added a comment -

        https://github.com/apache/incubator-impala/commit/9337518137ddbbdefde9738ea1dce9f978f525a8

        IMPALA-4595: Ignore discarded functions after linking
        For LLVM IR UDF, Impalad will link an external LLVM module
        in which the IR UDF is defined with the main module. If it
        happens that a symbol is defined in both modules, LLVM may
        choose to discard the one defined in the external module.
        The discarded function and its callee will not be present
        in the linked module.

        In IMPALA-4595, udf-sample.cc was compiled without any
        optimization. Duplicated definition such as StringVal::null()
        may have different inlining level between the external module
        and the main module. When the duplicated definition in
        the external module is discarded, some of its callee
        functions (which are not inlined) may not be defined in the
        main module so they can no longer be located in the linked
        module. This trips up some code in the LlvmCodegen::LinkModule().
        In particular, when parsing for functions in external module
        which are materialized during linking, certain functions may
        not be present due to the reason above. Impalad will hit
        a DCHECK in debug build or crash due to null pointer access
        in release build.

        This change fixes the problem above by taking into account
        that certain functions may not be defined anymore after linking.
        This change also fixes two incorrect status propagation in
        fe-support.cc.

        Change-Id: Iaa056a0c888bfcc95b412e1bc1063bb607b58ab7
        Reviewed-on: http://gerrit.cloudera.org:8080/5384
        Reviewed-by: Michael Ho <kwho@cloudera.com>
        Tested-by: Impala Public Jenkins

        Show
        kwho Michael Ho added a comment - https://github.com/apache/incubator-impala/commit/9337518137ddbbdefde9738ea1dce9f978f525a8 IMPALA-4595 : Ignore discarded functions after linking For LLVM IR UDF, Impalad will link an external LLVM module in which the IR UDF is defined with the main module. If it happens that a symbol is defined in both modules, LLVM may choose to discard the one defined in the external module. The discarded function and its callee will not be present in the linked module. In IMPALA-4595 , udf-sample.cc was compiled without any optimization. Duplicated definition such as StringVal::null() may have different inlining level between the external module and the main module. When the duplicated definition in the external module is discarded, some of its callee functions (which are not inlined) may not be defined in the main module so they can no longer be located in the linked module. This trips up some code in the LlvmCodegen::LinkModule(). In particular, when parsing for functions in external module which are materialized during linking, certain functions may not be present due to the reason above. Impalad will hit a DCHECK in debug build or crash due to null pointer access in release build. This change fixes the problem above by taking into account that certain functions may not be defined anymore after linking. This change also fixes two incorrect status propagation in fe-support.cc. Change-Id: Iaa056a0c888bfcc95b412e1bc1063bb607b58ab7 Reviewed-on: http://gerrit.cloudera.org:8080/5384 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Impala Public Jenkins
        Hide
        jrussell John Russell added a comment -

        Added new "known issue" item in this gerrit review: https://gerrit.cloudera.org/#/c/5809/

        Show
        jrussell John Russell added a comment - Added new "known issue" item in this gerrit review: https://gerrit.cloudera.org/#/c/5809/

          People

          • Assignee:
            kwho Michael Ho
            Reporter:
            kwho Michael Ho
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development