Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 2.8.0
    • Fix Version/s: Impala 2.8.0
    • Component/s: Backend
    • Labels:
      None

      Description

      Seen when using Hue (i.e. an HS2 client) with FETCH_FIRST:

      #5  <signal handler called>
      #6  0x00000033a5209470 in pthread_mutex_lock () from /lib64/libpthread.so.0
      #7  0x000000000081a608 in boost::mutex::lock() ()
      #8  0x000000000081a68c in boost::unique_lock<boost::mutex>::lock() ()
      #9  0x0000000000cf7d5d in impala::PlanRootSink::GetNext(impala::RuntimeState*, impala::QueryResultSet*, int, bool*) ()
      #10 0x0000000000da2398 in impala::Coordinator::GetNext(impala::QueryResultSet*, int, bool*) ()
      #11 0x0000000000af93c7 in impala::ImpalaServer::QueryExecState::FetchRowsInternal(int, impala::QueryResultSet*) ()
      #12 0x0000000000af96d9 in impala::ImpalaServer::QueryExecState::FetchRows(int, impala::QueryResultSet*) ()
      #13 0x0000000000ae1194 in impala::ImpalaServer::FetchInternal(impala::TUniqueId const&, int, bool, apache::hive::service::cli::thrift::TFetchResultsResp*) ()
      #14 0x0000000000ae1765 in impala::ImpalaServer::FetchResults(apache::hive::service::cli::thrift::TFetchResultsResp&, apache::hive::service::cli::thrift::TFetchResultsReq const&) ()
      #15 0x0000000000d4f54a in apache::hive::service::cli::thrift::TCLIServiceProcessor::process_FetchResults(int, apache::thrift::protocol::TProtocol*, apache::thrift::protocol::TProtocol*, void*) ()
      #16 0x0000000000d4af24 in apache::hive::service::cli::thrift::TCLIServiceProcessor::dispatchCall(apache::thrift::protocol::TProtocol*, apache::thrift::protocol::TProtocol*, std::string const&, int, void*) ()
      

      Here's a trace of the RPC requests made by Hue (most recent is last):

      root: DEBUG: Thrift call: <class 'ImpalaService.ImpalaHiveServer2Service.Client'>.ExecuteStatement(args=(TExecuteStatementReq(confOverlay={'impala.resultset.cache.size': '50000', 'QUERY_TIMEOUT_S': '600'}, sessionHandle=TSessionHandle(sessionId=THandleIdentifier(secret='\r\xe59\xb0@2D\xfd\x8a\xb0z\xc7\x7f\xf8d\x14', guid='\xed\xb8G\x1fqpF\x86\xb68B\xec\x85\xf4\xee\x89')), runAsync=True, statement='SELECT * FROM tweets ORDER BY row_num'),), kwargs={})
      
      root: DEBUG: Thrift call <class 'ImpalaService.ImpalaHiveServer2Service.Client'>.ExecuteStatement returned in 116ms: TExecuteStatementResp(status=TStatus(errorCode=None, errorMessage=None, sqlState=None, infoMessages=None, statusCode=0), operationHandle=TOperationHandle(hasResultSet=True, modifiedRowCount=None, operationType=0, operationId=THandleIdentifier(secret='\xeeM\xc4\xdc\xaf\x19E\xdc\x00\x00\x00\x00\x97\xca$\xb3', guid='\xeeM\xc4\xdc\xaf\x19E\xdc\x00\x00\x00\x00\x97\xca$\xb3')))
      
      root: DEBUG: Thrift call: <class 'ImpalaService.ImpalaHiveServer2Service.Client'>.GetOperationStatus(args=(TGetOperationStatusReq(operationHandle=TOperationHandle(hasResultSet=True, modifiedRowCount=None, operationType=0, operationId=THandleIdentifier(secret='\xeeM\xc4\xdc\xaf\x19E\xdc\x00\x00\x00\x00\x97\xca$\xb3', guid='\xeeM\xc4\xdc\xaf\x19E\xdc\x00\x00\x00\x00\x97\xca$\xb3'))),), kwargs={})
      
      root: DEBUG: Thrift call <class 'ImpalaService.ImpalaHiveServer2Service.Client'>.GetOperationStatus returned in 65ms: TGetOperationStatusResp(status=TStatus(errorCode=None, errorMessage=None, sqlState=None, infoMessages=None, statusCode=0), operationState=1, errorMessage=None, sqlState=None, errorCode=None)
      
      root: DEBUG: Thrift call: <class 'ImpalaService.ImpalaHiveServer2Service.Client'>.GetOperationStatus(args=(TGetOperationStatusReq(operationHandle=TOperationHandle(hasResultSet=True, modifiedRowCount=None, operationType=0, operationId=THandleIdentifier(secret='\xeeM\xc4\xdc\xaf\x19E\xdc\x00\x00\x00\x00\x97\xca$\xb3', guid='\xeeM\xc4\xdc\xaf\x19E\xdc\x00\x00\x00\x00\x97\xca$\xb3'))),), kwargs={})
      
      root: DEBUG: Thrift call <class 'ImpalaService.ImpalaHiveServer2Service.Client'>.GetOperationStatus returned in 64ms: TGetOperationStatusResp(status=TStatus(errorCode=None, errorMessage=None, sqlState=None, infoMessages=None, statusCode=0), operationState=2, errorMessage=None, sqlState=None, errorCode=None)
      
      root: DEBUG: Thrift call: <class 'ImpalaService.ImpalaHiveServer2Service.Client'>.GetLog(args=(TGetLogReq(operationHandle=TOperationHandle(hasResultSet=True, modifiedRowCount=None, operationType=0, operationId=THandleIdentifier(secret='\xeeM\xc4\xdc\xaf\x19E\xdc\x00\x00\x00\x00\x97\xca$\xb3', guid='\xeeM\xc4\xdc\xaf\x19E\xdc\x00\x00\x00\x00\x97\xca$\xb3'))),), kwargs={})
      
      root: DEBUG: Thrift call <class 'ImpalaService.ImpalaHiveServer2Service.Client'>.GetLog returned in 69ms: TGetLogResp(status=TStatus(errorCode=None, errorMessage=None, sqlState=None, infoMessages=None, statusCode=0), log='Query dc4519afdcc44dee:b324ca9700000000 100% Complete (5 out of 5)\n')
      access: INFO: 127.0.0.1 test - "GET /impala/results/3/0 HTTP/1.1"
      
      root: DEBUG: Thrift call: <class 'ImpalaService.ImpalaHiveServer2Service.Client'>.GetOperationStatus(args=(TGetOperationStatusReq(operationHandle=TOperationHandle(hasResultSet=True, modifiedRowCount=None, operationType=0, operationId=THandleIdentifier(secret='\xeeM\xc4\xdc\xaf\x19E\xdc\x00\x00\x00\x00\x97\xca$\xb3', guid='\xeeM\xc4\xdc\xaf\x19E\xdc\x00\x00\x00\x00\x97\xca$\xb3'))),), kwargs={})
      
      root: DEBUG: Thrift call <class 'ImpalaService.ImpalaHiveServer2Service.Client'>.GetOperationStatus returned in 62ms: TGetOperationStatusResp(status=TStatus(errorCode=None, errorMessage=None, sqlState=None, infoMessages=None, statusCode=0), operationState=2, errorMessage=None, sqlState=None, errorCode=None)
      
      root: DEBUG: Thrift call: <class 'ImpalaService.ImpalaHiveServer2Service.Client'>.FetchResults(args=(TFetchResultsReq(fetchType=0, operationHandle=TOperationHandle(hasResultSet=True, modifiedRowCount=None, operationType=0, operationId=THandleIdentifier(secret='\xeeM\xc4\xdc\xaf\x19E\xdc\x00\x00\x00\x00\x97\xca$\xb3', guid='\xeeM\xc4\xdc\xaf\x19E\xdc\x00\x00\x00\x00\x97\xca$\xb3')), orientation=4, maxRows=100),), kwargs={})
      
      root: DEBUG: Thrift call <class 'ImpalaService.ImpalaHiveServer2Service.Client'>.FetchResults returned in 72ms: TFetchResultsResp(status=TStatus(errorCode=None, errorMessage=None, sqlState=None, infoMessages=None, statusCode=0), results=TRowSet(rows=[], columns=[TColumn(i32Val=TI32Column(nulls='\x00', values=[1, 2, 3, 4, 5]), byteVal=None, i16Val=None, i64Val=None, stringVal=None, boolVal=None, doubleVal=None, binaryVal=None), TColumn(i32Val=None, byteVal=None, i16Val=None, i64Val=None, stringVal=TStringColumn(nulls='\x00', values=['531091827395682000', '531091827781550000', '531091827768979000', '531091827114668000', '531091827949309000']), boolVal=None, doubleVal=None, binaryVal=None), TColumn(i32Val=None, byteVal=None, i16Val=None, i64Val=None, stringVal=TStringColumn(nulls='\x00', values=['My dad looks younger than costa', 'There is a thin line between your partner being vengeful and you reaping the consequences of your bad actions towards your partner.', '@Mustang_Sally83 and they need to get into you :))))', '@RachelZJohnson thank you rach!xxx', 'i think @WWERollins was robbed of the IC ti...
      
      root: DEBUG: Thrift call: <class 'ImpalaService.ImpalaHiveServer2Service.Client'>.GetResultSetMetadata(args=(TGetResultSetMetadataReq(operationHandle=TOperationHandle(hasResultSet=True, modifiedRowCount=None, operationType=0, operationId=THandleIdentifier(secret='\xeeM\xc4\xdc\xaf\x19E\xdc\x00\x00\x00\x00\x97\xca$\xb3', guid='\xeeM\xc4\xdc\xaf\x19E\xdc\x00\x00\x00\x00\x97\xca$\xb3'))),), kwargs={})
      
      root: DEBUG: Thrift call <class 'ImpalaService.ImpalaHiveServer2Service.Client'>.GetResultSetMetadata returned in 65ms: TGetResultSetMetadataResp(status=TStatus(errorCode=None, errorMessage=None, sqlState=None, infoMessages=None, statusCode=0), schema=TTableSchema(columns=[TColumnDesc(comment=None, columnName='row_num', typeDesc=TTypeDesc(types=[TTypeEntry(mapEntry=None, unionEntry=None, arrayEntry=None, userDefinedTypeEntry=None, structEntry=None, primitiveEntry=TPrimitiveTypeEntry(typeQualifiers=None, type=3))]), position=0), TColumnDesc(comment=None, columnName='id_str', typeDesc=TTypeDesc(types=[TTypeEntry(mapEntry=None, unionEntry=None, arrayEntry=None, userDefinedTypeEntry=None, structEntry=None, primitiveEntry=TPrimitiveTypeEntry(typeQualifiers=None, type=7))]), position=1), TColumnDesc(comment=None, columnName='text', typeDesc=TTypeDesc(types=[TTypeEntry(mapEntry=None, unionEntry=None, arrayEntry=None, userDefinedTypeEntry=None, structEntry=None, primitiveEntry=TPrimitiveTypeEntry(typeQualifiers=None, type=7))]), position=2)]))
      
      root: DEBUG: Thrift call: <class 'ImpalaService.ImpalaHiveServer2Service.Client'>.GetLog(args=(TGetLogReq(operationHandle=TOperationHandle(hasResultSet=True, modifiedRowCount=None, operationType=0, operationId=THandleIdentifier(secret='\xeeM\xc4\xdc\xaf\x19E\xdc\x00\x00\x00\x00\x97\xca$\xb3', guid='\xeeM\xc4\xdc\xaf\x19E\xdc\x00\x00\x00\x00\x97\xca$\xb3'))),), kwargs={})
      
      root: DEBUG: Thrift call <class 'ImpalaService.ImpalaHiveServer2Service.Client'>.GetLog returned in 66ms: TGetLogResp(status=TStatus(errorCode=None, errorMessage=None, sqlState=None, infoMessages=None, statusCode=0), log='Query dc4519afdcc44dee:b324ca9700000000 100% Complete (5 out of 5)\n')
      access: INFO: 127.0.0.1 test - "GET /impala/results/3/0 HTTP/1.1"
      
      root: DEBUG: Thrift call: <class 'ImpalaService.ImpalaHiveServer2Service.Client'>.GetOperationStatus(args=(TGetOperationStatusReq(operationHandle=TOperationHandle(hasResultSet=True, modifiedRowCount=None, operationType=0, operationId=THandleIdentifier(secret='\xeeM\xc4\xdc\xaf\x19E\xdc\x00\x00\x00\x00\x97\xca$\xb3', guid='\xeeM\xc4\xdc\xaf\x19E\xdc\x00\x00\x00\x00\x97\xca$\xb3'))),), kwargs={})
      
      root: DEBUG: Thrift call <class 'ImpalaService.ImpalaHiveServer2Service.Client'>.GetOperationStatus returned in 70ms: TGetOperationStatusResp(status=TStatus(errorCode=None, errorMessage=None, sqlState=None, infoMessages=None, statusCode=0), operationState=2, errorMessage=None, sqlState=None, errorCode=None)
      
      root: DEBUG: Thrift call: <class 'ImpalaService.ImpalaHiveServer2Service.Client'>.FetchResults(args=(TFetchResultsReq(fetchType=0, operationHandle=TOperationHandle(hasResultSet=True, modifiedRowCount=None, operationType=0, operationId=THandleIdentifier(secret='\xeeM\xc4\xdc\xaf\x19E\xdc\x00\x00\x00\x00\x97\xca$\xb3', guid='\xeeM\xc4\xdc\xaf\x19E\xdc\x00\x00\x00\x00\x97\xca$\xb3')), orientation=4, maxRows=100),), kwargs={})
      

        Issue Links

          Activity

          Hide
          henryr Henry Robinson added a comment -

          I have found one way that this happens, but it does involve the query getting cancelled.

          lock_.unlock();
          Status status = coord_->GetNext(fetched_rows, max_coord_rows, &eos_);
          lock_.lock();
          

          If a cancellation happens while lock_ is unlocked, the QueryExecState can call Coordinator::TearDown() which will destroy the PlanRootSink ultimately. Then if GetNext() resumes, it can easily hit an error.

          This didn't use to happen because it was safe to cancel the coordinator during a GetNext() call. IMPALA-2905 made it not safe. We should take a second look at the locking in that class, which is kind of funky already, to see how concurrent access should be dealt with. Cancellation should probably quiesce the readers and then tear down.

          Show
          henryr Henry Robinson added a comment - I have found one way that this happens, but it does involve the query getting cancelled. lock_.unlock(); Status status = coord_->GetNext(fetched_rows, max_coord_rows, &eos_); lock_.lock(); If a cancellation happens while lock_ is unlocked, the QueryExecState can call Coordinator::TearDown() which will destroy the PlanRootSink ultimately. Then if GetNext() resumes, it can easily hit an error. This didn't use to happen because it was safe to cancel the coordinator during a GetNext() call. IMPALA-2905 made it not safe. We should take a second look at the locking in that class, which is kind of funky already, to see how concurrent access should be dealt with. Cancellation should probably quiesce the readers and then tear down.
          Hide
          henryr Henry Robinson added a comment -

          This is going to be tricky to fix without IMPALA-4275, which would remove a call to PlanRootSink::CloseConsumer() from GetNext().

          To fix this simply, we would put locks around the concurrent calls to PRS::GetNext() and PRS::CloseConsumer() made from the coordinator. But if GetNext() is taking a long time, it will block CloseConsumer() and so the cancel call will block as well, which we would prefer to avoid.

          It would be better to have CloseConsumer() wake the consumer in GetNext() so that it exits quickly, and quiesces the PlanRootSink so that tear down can continue. However, that means we need to call CloseConsumer() without a shared lock. That's do-able, as long as there's only one place CloseConsumer() can be called, but right now that isn't true because of IMPALA-4275.

          Show
          henryr Henry Robinson added a comment - This is going to be tricky to fix without IMPALA-4275 , which would remove a call to PlanRootSink::CloseConsumer() from GetNext() . To fix this simply, we would put locks around the concurrent calls to PRS::GetNext() and PRS::CloseConsumer() made from the coordinator. But if GetNext() is taking a long time, it will block CloseConsumer() and so the cancel call will block as well, which we would prefer to avoid. It would be better to have CloseConsumer() wake the consumer in GetNext() so that it exits quickly, and quiesces the PlanRootSink so that tear down can continue. However, that means we need to call CloseConsumer() without a shared lock. That's do-able, as long as there's only one place CloseConsumer() can be called, but right now that isn't true because of IMPALA-4275 .
          Show
          henryr Henry Robinson added a comment - Fixed in https://github.com/apache/incubator-impala/commit/3b8a0891d9eebec00fc2654bb4a705752edc6759

            People

            • Assignee:
              henryr Henry Robinson
              Reporter:
              henryr Henry Robinson
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development