[SPARK-44833] Spark Connect reattach when initial ExecutePlan didn't reach server doing too eager Reattach - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.5.0
Fix Version/s: 4.0.0, 3.5.1
Component/s: Connect
Labels: None
Epic Link: connect-query-lifecycle

Description

case ex: StatusRuntimeException
    if Option(StatusProto.fromThrowable(ex))
      .exists(_.getMessage.contains("INVALID_HANDLE.OPERATION_NOT_FOUND")) =>
  if (lastReturnedResponseId.isDefined) {
    throw new IllegalStateException(
      "OPERATION_NOT_FOUND on the server but responses were already received from it.",
      ex)
  }
  // Try a new ExecutePlan, and throw upstream for retry.
->  iter = rawBlockingStub.executePlan(initialRequest)
->  throw new GrpcRetryHandler.RetryException

In the handler above, we call executePlan and then throw RetryException so that the exception is handled upstream by the retry logic.

Control then goes to

retry {
  if (firstTry) {
    // on first try, we use the existing iter.
    firstTry = false
  } else {
    // on retry, the iter is borked, so we need a new one
->    iter = rawBlockingStub.reattachExecute(createReattachExecuteRequest())
  }

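and because this is no longer the first try, it immediately does a reattach, even though a new executePlan was just issued.

This causes no failure: the reattach will work and attach to the query, and the original executePlan will get detached. Still, the extra reattach is unnecessary and could be avoided.

The same issue is also present in the Python client, in reattach.py.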
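
The flow described above can be reproduced in a small toy model. The sketch below is only an illustration of one possible way to avoid the eager reattach, not necessarily what the linked pull request does. FakeStub, Client, OperationNotFound, Unavailable and the retry helper are hypothetical stand-ins, not the real Spark Connect or gRPC classes. The idea is to keep iter as an Option and reattach only when it is empty, so that a freshly issued executePlan is used as-is on the next retry.

```scala
import scala.collection.mutable.ArrayBuffer

object ReattachSketch {
  // Hypothetical exception types standing in for the gRPC errors.
  final class RetryException extends RuntimeException
  final class OperationNotFound extends RuntimeException
  final class Unavailable extends RuntimeException

  // Hypothetical stub: the first executePlan is "lost", so the server
  // only knows the operation after the second executePlan.
  final class FakeStub {
    val calls = ArrayBuffer.empty[String]
    private var executeCount = 0
    private def serverKnowsOperation: Boolean = executeCount >= 2

    def executePlan(): Iterator[String] = {
      calls += "executePlan"
      executeCount += 1
      if (serverKnowsOperation) Iterator("r1", "r2")
      else new Iterator[String] {
        def hasNext: Boolean = throw new Unavailable
        def next(): String = throw new Unavailable
      }
    }

    def reattachExecute(): Iterator[String] = {
      calls += "reattachExecute"
      if (serverKnowsOperation) Iterator("r1", "r2")
      else throw new OperationNotFound
    }
  }

  // Minimal retry helper: reruns the body whenever it throws RetryException.
  def retry[T](body: => T): T =
    try body
    catch { case _: RetryException => retry(body) }

  final class Client(stub: FakeStub) {
    // None means "the previous stream is broken, reattach on the next try".
    private var iter: Option[Iterator[String]] = Some(stub.executePlan())

    def next(): String = retry {
      try {
        // Reattach only when there is no usable iterator.
        if (iter.isEmpty) iter = Some(stub.reattachExecute())
        iter.get.next()
      } catch {
        case _: OperationNotFound =>
          // Fresh ExecutePlan: keep it, so the retry does not reattach.
          iter = Some(stub.executePlan())
          throw new RetryException
        case _: Unavailable =>
          // Simplification: treat the broken stream as retryable here.
          iter = None
          throw new RetryException
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val stub = new FakeStub
    val client = new Client(stub)
    println(client.next())
    println(stub.calls.mkString(", "))
  }
}
```

In this toy model the recorded calls are executePlan, reattachExecute, executePlan. With a firstTry flag as in the snippet above, the same scenario would record a fourth call, a reattachExecute issued right after the second executePlan.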

Issue Links

links to

[Github] Pull Request #42806 (juliuszsompolski)

People

Assignee: Juliusz Sompolski

Reporter: Juliusz Sompolski

Votes: 0

Watchers: 3

Dates

Created: 16/Aug/23 16:34

Updated: 06/Sep/23 05:26

Resolved: 06/Sep/23 05:22