Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
-
None
-
None
Description
When a drillbit goes down, there are two conditions under which the client is not propagating the error back to the application -
1) The application is in a submitQuery call: the ODBC driver is expecting that the error be reported thru the query results listener which hasn't been registered at the point the error is encountered.
2) A submitQuery call succeeded but never reached the drillbit because it was shutdown. In this case the application has a handle to a query and is listening for results which will never arrive. The heartbeat mechanism detects the failure, but is not propagating the error to the query results listener.
Attachments
Issue Links
- links to
Activity
Github user sudheeshkatkam commented on a diff in the pull request:
https://github.com/apache/drill/pull/493#discussion_r69653159
— Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp —
@@ -469,34 +471,50 @@ DrillClientQueryResult* DrillClientImpl::SubmitQuery(::exec::shared::QueryType t
uint64_t coordId;
DrillClientQueryResult* pQuery=NULL;
+ connectionStatus_t cStatus=CONN_SUCCESS;
{
boost::lock_guard<boost::mutex> prLock(this->m_prMutex);
boost::lock_guard<boost::mutex> dcLock(this->m_dcMutex);
coordId = this->getNextCoordinationId();
OutBoundRpcMessage out_msg(exec::rpc::REQUEST, exec::user::RUN_QUERY, coordId, &query);
- sendSync(out_msg);
+ // Create the result object and register the listener before we send the query
+ // because sometimes the caller is not checking the status of the submitQuery call.
+ // This way, the broadcast error call will cause the results listener to be called
+ // with a COMM_ERROR status.
pQuery = new DrillClientQueryResult(this, coordId, plan);
pQuery->registerListener(l, lCtx);
- bool sendRequest=false;
this->m_queryIds[coordId]=pQuery;
- DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Sent query request. " << "[" << m_connectedHost << "]" << "Coordination id = " << coordId << std::endl
- DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Sent query " << "Coordination id = " << coordId << " query: " << plan << std::endl
+ connectionStatus_t cStatus=sendSync(out_msg);
+ if(cStatus == CONN_SUCCESS){
+ bool sendRequest=false;
- if(m_pendingRequests++==0)
{
- sendRequest=true;
- }
else
{ - DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Queueing query request to server" << std::endl;) - DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Number of pending requests = " << m_pendingRequests << std::endl;) - } - if(sendRequest){
- DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Sending query request. Number of pending requests = "
- << m_pendingRequests << std::endl
- getNextResult(); // async wait for results
+ DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Sent query request. " << "[" << m_connectedHost << "]" << "Coordination id = " << coordId << std::endl
+ DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Sent query " << "Coordination id = " << coordId << " query: " << plan << std::endl
+
+ if(m_pendingRequests++==0) { + sendRequest=true; + }else
{ + DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Queueing query request to server" << std::endl;) + DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Number of pending requests = " << m_pendingRequests << std::endl;) + }+ if(sendRequest)
{ + DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Sending query request. Number of pending requests = " + << m_pendingRequests << std::endl;) + getNextResult(); // async wait for results + }}
+
+ }
+ if(cStatus!=CONN_SUCCESS){-
- End diff –
-
This block fails only this query. How are the other queries notified? Heartbeat failure?
Github user sudheeshkatkam commented on a diff in the pull request:
https://github.com/apache/drill/pull/493#discussion_r69653162
— Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp —
@@ -469,34 +471,50 @@ DrillClientQueryResult* DrillClientImpl::SubmitQuery(::exec::shared::QueryType t
uint64_t coordId;
DrillClientQueryResult* pQuery=NULL;
+ connectionStatus_t cStatus=CONN_SUCCESS;
{
boost::lock_guard<boost::mutex> prLock(this->m_prMutex);
boost::lock_guard<boost::mutex> dcLock(this->m_dcMutex);
coordId = this->getNextCoordinationId();
OutBoundRpcMessage out_msg(exec::rpc::REQUEST, exec::user::RUN_QUERY, coordId, &query);
- sendSync(out_msg);
+ // Create the result object and register the listener before we send the query
+ // because sometimes the caller is not checking the status of the submitQuery call.
+ // This way, the broadcast error call will cause the results listener to be called
+ // with a COMM_ERROR status.
pQuery = new DrillClientQueryResult(this, coordId, plan);
pQuery->registerListener(l, lCtx);
- bool sendRequest=false;
this->m_queryIds[coordId]=pQuery;
- DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Sent query request. " << "[" << m_connectedHost << "]" << "Coordination id = " << coordId << std::endl
- DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Sent query " << "Coordination id = " << coordId << " query: " << plan << std::endl
+ connectionStatus_t cStatus=sendSync(out_msg);
+ if(cStatus == CONN_SUCCESS){
+ bool sendRequest=false;
- if(m_pendingRequests++==0)
{
- sendRequest=true;
- }
else
{ - DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Queueing query request to server" << std::endl;) - DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Number of pending requests = " << m_pendingRequests << std::endl;) - } - if(sendRequest){
- DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Sending query request. Number of pending requests = "
- << m_pendingRequests << std::endl
- getNextResult(); // async wait for results
+ DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Sent query request. " << "[" << m_connectedHost << "]" << "Coordination id = " << coordId << std::endl
+ DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Sent query " << "Coordination id = " << coordId << " query: " << plan << std::endl-
- End diff –
-
Decrease indent level from 494-501.
Github user parthchandra commented on a diff in the pull request:
https://github.com/apache/drill/pull/493#discussion_r69666212
— Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp —
@@ -469,34 +471,50 @@ DrillClientQueryResult* DrillClientImpl::SubmitQuery(::exec::shared::QueryType t
uint64_t coordId;
DrillClientQueryResult* pQuery=NULL;
+ connectionStatus_t cStatus=CONN_SUCCESS;
{
boost::lock_guard<boost::mutex> prLock(this->m_prMutex);
boost::lock_guard<boost::mutex> dcLock(this->m_dcMutex);
coordId = this->getNextCoordinationId();
OutBoundRpcMessage out_msg(exec::rpc::REQUEST, exec::user::RUN_QUERY, coordId, &query);
- sendSync(out_msg);
+ // Create the result object and register the listener before we send the query
+ // because sometimes the caller is not checking the status of the submitQuery call.
+ // This way, the broadcast error call will cause the results listener to be called
+ // with a COMM_ERROR status.
pQuery = new DrillClientQueryResult(this, coordId, plan);
pQuery->registerListener(l, lCtx);
- bool sendRequest=false;
this->m_queryIds[coordId]=pQuery;
- DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Sent query request. " << "[" << m_connectedHost << "]" << "Coordination id = " << coordId << std::endl
- DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Sent query " << "Coordination id = " << coordId << " query: " << plan << std::endl
+ connectionStatus_t cStatus=sendSync(out_msg);
+ if(cStatus == CONN_SUCCESS){
+ bool sendRequest=false;
- if(m_pendingRequests++==0)
{
- sendRequest=true;
- }
else
{ - DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Queueing query request to server" << std::endl;) - DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Number of pending requests = " << m_pendingRequests << std::endl;) - } - if(sendRequest){
- DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Sending query request. Number of pending requests = "
- << m_pendingRequests << std::endl
- getNextResult(); // async wait for results
+ DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Sent query request. " << "[" << m_connectedHost << "]" << "Coordination id = " << coordId << std::endl
+ DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Sent query " << "Coordination id = " << coordId << " query: " << plan << std::endl
+
+ if(m_pendingRequests++==0) { + sendRequest=true; + }else
{ + DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Queueing query request to server" << std::endl;) + DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Number of pending requests = " << m_pendingRequests << std::endl;) + }+ if(sendRequest)
{ + DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Sending query request. Number of pending requests = " + << m_pendingRequests << std::endl;) + getNextResult(); // async wait for results + }}
+
+ }
+ if(cStatus!=CONN_SUCCESS){-
- End diff –
-
No. If there is a failure in sending, then the sendSync calls handleConnError which fails all the other pending queries (it call broadcastError).
The block above simple removes the queryId for this query from the list of queries currently executing.
Github user parthchandra commented on a diff in the pull request:
https://github.com/apache/drill/pull/493#discussion_r69666595
— Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp —
@@ -469,34 +471,50 @@ DrillClientQueryResult* DrillClientImpl::SubmitQuery(::exec::shared::QueryType t
uint64_t coordId;
DrillClientQueryResult* pQuery=NULL;
+ connectionStatus_t cStatus=CONN_SUCCESS;
{
boost::lock_guard<boost::mutex> prLock(this->m_prMutex);
boost::lock_guard<boost::mutex> dcLock(this->m_dcMutex);
coordId = this->getNextCoordinationId();
OutBoundRpcMessage out_msg(exec::rpc::REQUEST, exec::user::RUN_QUERY, coordId, &query);
- sendSync(out_msg);
+ // Create the result object and register the listener before we send the query
+ // because sometimes the caller is not checking the status of the submitQuery call.
+ // This way, the broadcast error call will cause the results listener to be called
+ // with a COMM_ERROR status.
pQuery = new DrillClientQueryResult(this, coordId, plan);
pQuery->registerListener(l, lCtx);
- bool sendRequest=false;
this->m_queryIds[coordId]=pQuery;
- DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Sent query request. " << "[" << m_connectedHost << "]" << "Coordination id = " << coordId << std::endl
- DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Sent query " << "Coordination id = " << coordId << " query: " << plan << std::endl
+ connectionStatus_t cStatus=sendSync(out_msg);
+ if(cStatus == CONN_SUCCESS){
+ bool sendRequest=false;
- if(m_pendingRequests++==0)
{
- sendRequest=true;
- }
else
{ - DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Queueing query request to server" << std::endl;) - DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Number of pending requests = " << m_pendingRequests << std::endl;) - } - if(sendRequest){
- DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Sending query request. Number of pending requests = "
- << m_pendingRequests << std::endl
- getNextResult(); // async wait for results
+ DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Sent query request. " << "[" << m_connectedHost << "]" << "Coordination id = " << coordId << std::endl
+ DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Sent query " << "Coordination id = " << coordId << " query: " << plan << std::endl-
- End diff –
-
Hmm. Vim's code formatter got confused by the fact that the previous line of code had no terminal semi-colon.
Fixed it manually.
Github user parthchandra commented on a diff in the pull request:
https://github.com/apache/drill/pull/493#discussion_r69666695
— Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp —
@@ -469,34 +471,50 @@ DrillClientQueryResult* DrillClientImpl::SubmitQuery(::exec::shared::QueryType t
uint64_t coordId;
DrillClientQueryResult* pQuery=NULL;
+ connectionStatus_t cStatus=CONN_SUCCESS;
{
boost::lock_guard<boost::mutex> prLock(this->m_prMutex);
boost::lock_guard<boost::mutex> dcLock(this->m_dcMutex);
coordId = this->getNextCoordinationId();
OutBoundRpcMessage out_msg(exec::rpc::REQUEST, exec::user::RUN_QUERY, coordId, &query);
- sendSync(out_msg);
+ // Create the result object and register the listener before we send the query
+ // because sometimes the caller is not checking the status of the submitQuery call.
+ // This way, the broadcast error call will cause the results listener to be called
+ // with a COMM_ERROR status.
pQuery = new DrillClientQueryResult(this, coordId, plan);
pQuery->registerListener(l, lCtx);
- bool sendRequest=false;
this->m_queryIds[coordId]=pQuery;
- DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Sent query request. " << "[" << m_connectedHost << "]" << "Coordination id = " << coordId << std::endl
- DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Sent query " << "Coordination id = " << coordId << " query: " << plan << std::endl
+ connectionStatus_t cStatus=sendSync(out_msg);
+ if(cStatus == CONN_SUCCESS){
+ bool sendRequest=false;
- if(m_pendingRequests++==0)
{
- sendRequest=true;
- }
else
{ - DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Queueing query request to server" << std::endl;) - DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Number of pending requests = " << m_pendingRequests << std::endl;) - } - if(sendRequest){
- DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Sending query request. Number of pending requests = "
- << m_pendingRequests << std::endl
- getNextResult(); // async wait for results
+ DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Sent query request. " << "[" << m_connectedHost << "]" << "Coordination id = " << coordId << std::endl
+ DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Sent query " << "Coordination id = " << coordId << " query: " << plan << std::endl
+
+ if(m_pendingRequests++==0) { + sendRequest=true; + }else
{ + DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Queueing query request to server" << std::endl;) + DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Number of pending requests = " << m_pendingRequests << std::endl;) + }+ if(sendRequest)
{ + DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Sending query request. Number of pending requests = " + << m_pendingRequests << std::endl;) + getNextResult(); // async wait for results + }}
+
+ }
+ if(cStatus!=CONN_SUCCESS){-
- End diff –
-
Though you are correct in once respect. The only reliable way to know if a connection is gone is the heartbeat. So a heartbeat failure will eventually occur and will guarantee that any pending queries fail in case of a network failure.
GitHub user parthchandra opened a pull request:
https://github.com/apache/drill/pull/493
DRILL-4647: C++ client fails to propagate a dead connection error to ……the application.
Return correct status from the DrillClient API if query submission fails.
Also, propagate error back to the application thru the listener if query submission fails or if there is a heartbeat failure before the response to query submission is received.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/parthchandra/drill
DRILL-4647Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/493.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #493
commit e39f263eb18fb52d4105dfc8d3131357ca0a19a9
Author: Parth Chandra <parthc@apache.org>
Date: 2016-04-30T00:38:44Z
DRILL-4647: C++ client fails to propagate a dead connection error to the application