IMPALA-7328

Errors in HdfsScanner::Open() get swallowed up

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: Impala 3.1.0
    • Fix Version/s: None
    • Component/s: Backend

    Description

      https://jenkins.impala.io/job/parallel-all-tests/3826/ failed at test_udfs.py:

      03:50:23 ] FAIL query_test/test_udfs.py::TestUdfExecution::()::test_udf_errors[exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'exec_single_node_rows_threshold': 100, 'enable_expr_rewrites': True} | table_format: text/none]
      03:50:23 ] =================================== FAILURES ===================================
      03:50:23 ]  TestUdfExecution.test_udf_errors[exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'exec_single_node_rows_threshold': 100, 'enable_expr_rewrites': True} | table_format: text/none] 
      03:50:23 ] [gw13] linux2 -- Python 2.7.12 /home/ubuntu/Impala/bin/../infra/python/env/bin/python
      03:50:23 ] query_test/test_udfs.py:415: in test_udf_errors
      03:50:23 ]     self.run_test_case('QueryTest/udf-errors', vector, use_db=unique_database)
      03:50:23 ] common/impala_test_suite.py:408: in run_test_case
      03:50:23 ]     self.__verify_exceptions(test_section['CATCH'], str(e), use_db)
      03:50:23 ] common/impala_test_suite.py:286: in __verify_exceptions
      03:50:23 ]     (expected_str, actual_str)
      03:50:23 ] E   AssertionError: Unexpected exception string. Expected: BadExpr2 prepare error
      03:50:23 ] E   Not found in actual: ImpalaBeeswaxException: Query aborted:Cancelled
      03:50:23 ] ---------------------------- Captured stderr setup -----------------------------
      

      Digging through the log, the query which triggered the failure is 774db10632a21589:5a62e77200000000

      It appears that the error this test intends to trigger is not the one reported at the coordinator:

      ExecState: query id=774db10632a21589:5a62e77200000000 finstance=774db10632a21589:5a62e77200000003 on host=ip-172-31-0-127:22001 (EXECUTING -> ERROR) status=Cancelled
      

      In particular, the test aims to trigger a failure in HdfsScanner::Open() when the scalar expr evaluator is cloned:

      // This prepare function always fails for cloned evaluators to exercise IMPALA-6184.
      // It does so by detecting whether the caller is a cloned evaluator and inserts an error
      // in FunctionContext if that's the case.
      void BadExpr2Prepare(FunctionContext* context,
          FunctionContext::FunctionStateScope scope) {
        if (scope == FunctionContext::FRAGMENT_LOCAL) {
          int32_t* state = reinterpret_cast<int32_t*>(context->Allocate(sizeof(int32_t)));
          *state = 0xf001cafe;
          context->SetFunctionState(scope, state);
          // Set the thread local state too to differentiate from cloned evaluators.
          context->SetFunctionState(FunctionContext::THREAD_LOCAL, state);
        } else {
          if (context->GetFunctionState(FunctionContext::THREAD_LOCAL) == nullptr) {
            context->SetError("BadExpr2 prepare error");
          }
        }
      }
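
      To make the intended failure path concrete, here is a minimal, self-contained sketch of how the prepare function above reacts to an original evaluator versus a clone. The FunctionContext class below is a hypothetical stand-in, not Impala's real class, and the call order in OpenEvaluator() and the semantics of Clone() (fragment-local state shared, thread-local state empty) are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical stand-in for the UDF FunctionContext; only what the demo needs.
class FunctionContext {
 public:
  enum FunctionStateScope { FRAGMENT_LOCAL = 0, THREAD_LOCAL = 1 };

  void* Allocate(size_t n) { return std::malloc(n); }
  void SetFunctionState(FunctionStateScope s, void* p) { state_[s] = p; }
  void* GetFunctionState(FunctionStateScope s) const { return state_[s]; }
  void SetError(const char* msg) { if (error_.empty()) error_ = msg; }
  bool has_error() const { return !error_.empty(); }
  const std::string& error_msg() const { return error_; }

  // Assumption: a clone shares fragment-local state but starts with
  // no thread-local state of its own.
  FunctionContext Clone() const {
    FunctionContext c;
    c.state_[FRAGMENT_LOCAL] = state_[FRAGMENT_LOCAL];
    return c;
  }

 private:
  void* state_[2] = {nullptr, nullptr};
  std::string error_;
};

// Same logic as the prepare function quoted in the description.
void BadExpr2Prepare(FunctionContext* context,
    FunctionContext::FunctionStateScope scope) {
  if (scope == FunctionContext::FRAGMENT_LOCAL) {
    int32_t* state = reinterpret_cast<int32_t*>(context->Allocate(sizeof(int32_t)));
    *state = static_cast<int32_t>(0xf001cafe);
    context->SetFunctionState(scope, state);
    context->SetFunctionState(FunctionContext::THREAD_LOCAL, state);
  } else if (context->GetFunctionState(FunctionContext::THREAD_LOCAL) == nullptr) {
    context->SetError("BadExpr2 prepare error");
  }
}

// Assumed call order: the original evaluator is prepared for both scopes,
// a cloned evaluator only for THREAD_LOCAL.
void OpenEvaluator(FunctionContext* ctx, bool is_clone) {
  if (!is_clone) BadExpr2Prepare(ctx, FunctionContext::FRAGMENT_LOCAL);
  BadExpr2Prepare(ctx, FunctionContext::THREAD_LOCAL);
}

// Opens an original evaluator, clones it, opens the clone, and returns
// the clone's error message (empty if no error was set).
std::string CloneErrorDemo() {
  FunctionContext original;
  OpenEvaluator(&original, /*is_clone=*/false);
  assert(!original.has_error());
  FunctionContext clone = original.Clone();
  OpenEvaluator(&clone, /*is_clone=*/true);
  std::free(original.GetFunctionState(FunctionContext::FRAGMENT_LOCAL));
  return clone.error_msg();
}
```

      Under these assumptions, CloneErrorDemo() returns the error string that the test expects to see in the query status, while the original evaluator opens without error.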
      

      However, for some reason, the actual failure was not propagated; the cancellation status was propagated instead. Staring at the code in HdfsScanNode, it's not immediately clear where the race is.
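
      The symptom can be reproduced in miniature. The sketch below is purely illustrative and is not HdfsScanNode's code: the Status type and both holder classes are hypothetical names. It shows how a "first non-OK status wins" policy reports Cancelled if the cancellation happens to be recorded before the real error, and how a policy that prefers a genuine error over Cancelled would report the expected message. It is single-threaded on purpose, with the arrival order fixed by hand; where the real race lies remains unknown.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical minimal status type; not Impala's Status class.
struct Status {
  enum Code { OK, CANCELLED, ERROR };
  Code code;
  std::string msg;

  Status() : code(OK) {}
  Status(Code c, std::string m) : code(c), msg(std::move(m)) {}

  static Status Cancelled() { return Status(CANCELLED, "Cancelled"); }
  static Status Error(std::string m) { return Status(ERROR, std::move(m)); }
  bool ok() const { return code == OK; }
};

// Policy 1: keep whichever non-OK status is recorded first.
class FirstWinsStatus {
 public:
  void Update(const Status& s) {
    if (status_.ok()) status_ = s;
  }
  const Status& Get() const { return status_; }

 private:
  Status status_;
};

// Policy 2: like policy 1, but a genuine error replaces an earlier Cancelled.
class ErrorPreferringStatus {
 public:
  void Update(const Status& s) {
    if (s.ok()) return;
    bool upgrade = status_.code == Status::CANCELLED && s.code == Status::ERROR;
    if (status_.ok() || upgrade) status_ = s;
  }
  const Status& Get() const { return status_; }

 private:
  Status status_;
};

// Records a cancellation first and the real error second, then returns
// the message that the given policy ends up reporting.
template <typename Holder>
std::string ReportedAfterCancelThenError() {
  Holder h;
  h.Update(Status::Cancelled());
  h.Update(Status::Error("BadExpr2 prepare error"));
  return h.Get().msg;
}
```

      With the cancellation recorded first, ReportedAfterCancelThenError<FirstWinsStatus>() yields "Cancelled", matching the failed run, whereas ReportedAfterCancelThenError<ErrorPreferringStatus>() yields "BadExpr2 prepare error".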

      For reference, the following is the expected error message:

      ExecState: query id=64404101d8857592:173298a700000000 finstance=64404101d8857592:173298a700000002 on host=ip-172-31-0-127:22002 (EXECUTING -> ERROR) status=BadExpr2 prepare error
      

      Attachments

        1. IMPALA-7328.tar.gz
          28.26 MB
          Michael Ho

            People

              Assignee: Sailesh Mukil
              Reporter: Michael Ho
              Votes: 0
              Watchers: 2