[SPARK-44835] SparkConnect ReattachExecute could raise before ExecutePlan even attaches. - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.5.0
Fix Version/s: 4.0.0, 3.5.1
Component/s: Connect
Labels: None
Epic Link: connect-query-lifecycle

Description

If a ReattachExecute is sent very quickly after ExecutePlan, the following could happen:

ExecutePlan has not yet reached executeHolder.runGrpcResponseSender(responseSender) in SparkConnectExecutePlanHandler.
ReattachExecute races ahead and reaches executeHolder.runGrpcResponseSender(responseSender) in SparkConnectReattachExecuteHandler first.
When ExecutePlan then reaches executeHolder.runGrpcResponseSender(responseSender) and executionObserver.attachConsumer(this) is called in the ExecuteGrpcResponseSender of ExecutePlan, it kicks out the ExecuteGrpcResponseSender of ReattachExecute.

So even though ReattachExecute was sent later, it gets interrupted by the earlier ExecutePlan and finishes with a SparkSQLException(errorClass = "INVALID_CURSOR.DISCONNECTED", Map.empty). That error was meant for the situation where a stale hanging RPC is replaced by a reconnection.
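The ordering problem described above can be illustrated with a small, deterministic model. This is only a sketch in Python, not Spark's Scala code: the classes ResponseSender and ExecutionObserver and the function run are hypothetical stand-ins, and the only behavior assumed from the description is that attaching a new consumer kicks out the previously attached one, which then ends with INVALID_CURSOR.DISCONNECTED.

```python
class InvalidCursorDisconnected(Exception):
    """Stand-in for SparkSQLException(errorClass = "INVALID_CURSOR.DISCONNECTED")."""


class ResponseSender:
    """Hypothetical model of one RPC's response sender."""

    def __init__(self, rpc_name):
        self.rpc_name = rpc_name
        self.interrupted = False

    def finish(self):
        # A sender that was kicked out ends its RPC with the error.
        if self.interrupted:
            raise InvalidCursorDisconnected(self.rpc_name)
        return f"{self.rpc_name} completed"


class ExecutionObserver:
    """Hypothetical model: the last attach wins and interrupts the previous consumer."""

    def __init__(self):
        self.consumer = None

    def attach_consumer(self, sender):
        if self.consumer is not None:
            self.consumer.interrupted = True
        self.consumer = sender


def run(attach_order):
    """Attach senders in the given order and report how each RPC ends."""
    observer = ExecutionObserver()
    senders = {name: ResponseSender(name) for name in attach_order}
    for name in attach_order:
        observer.attach_consumer(senders[name])
    outcome = {}
    for name, sender in senders.items():
        try:
            outcome[name] = sender.finish()
        except InvalidCursorDisconnected:
            outcome[name] = "INVALID_CURSOR.DISCONNECTED"
    return outcome


# Intended order: the reattach replaces the stale ExecutePlan sender.
expected_order = run(["ExecutePlan", "ReattachExecute"])
# Racy order from this issue: ExecutePlan attaches last and kicks out the reattach.
racy_order = run(["ReattachExecute", "ExecutePlan"])
print(expected_order)
print(racy_order)
```

In the racy order the later RPC (ReattachExecute) is the one that ends with the error, which is the opposite of what the disconnect logic was designed for.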

This is very unlikely to happen in practice, because ExecutePlan shouldn't be abandoned so quickly, but because of https://issues.apache.org/jira/browse/SPARK-44833 it is slightly more likely (though there is also a 50ms sleep before the retry, which again makes it unlikely).

Issue Links

links to

[Github] Pull Request #42818 (juliuszsompolski)

People

Assignee: Juliusz Sompolski

Reporter: Juliusz Sompolski

Votes: 0

Watchers: 3

Dates

Created: 16/Aug/23 16:41

Updated: 07/Sep/23 01:48

Resolved: 07/Sep/23 01:48