  1. Spark
  2. SPARK-44835

SparkConnect ReattachExecute could raise before ExecutePlan even attaches.

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.5.0
    • Fix Version/s: 4.0.0, 3.5.1
    • Component/s: Connect
    • Labels: None

    Description

      If a ReattachExecute is sent very quickly after ExecutePlan, the following could happen:

      • ExecutePlan has not yet reached executeHolder.runGrpcResponseSender(responseSender) in SparkConnectExecutePlanHandler.
      • ReattachExecute races ahead and reaches executeHolder.runGrpcResponseSender(responseSender) in SparkConnectReattachExecuteHandler first.
      • When ExecutePlan then reaches executeHolder.runGrpcResponseSender(responseSender) and executionObserver.attachConsumer(this) is called in the ExecuteGrpcResponseSender of ExecutePlan, it kicks out the ExecuteGrpcResponseSender of ReattachExecute.

      So even though ReattachExecute came later, it is interrupted by the earlier ExecutePlan and finishes with a SparkSQLException(errorClass = "INVALID_CURSOR.DISCONNECTED", Map.empty). That error was assumed to occur only when a stale hanging RPC is replaced by a reconnection.
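
      The ordering problem above can be illustrated with a small deterministic model. This is only a sketch, not Spark's code: it assumes the observer keeps a single consumer slot and that the most recent attach interrupts the previous consumer, as the description suggests. The names ResponseSender, ExecutionObserver, attach_consumer and run_in_attach_order are hypothetical stand-ins, and the Python exception merely represents SparkSQLException with error class INVALID_CURSOR.DISCONNECTED.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


class InvalidCursorDisconnected(Exception):
    """Stand-in for SparkSQLException(errorClass = "INVALID_CURSOR.DISCONNECTED")."""


@dataclass
class ResponseSender:
    """Toy counterpart of ExecuteGrpcResponseSender: one instance per RPC."""
    rpc_name: str
    interrupted: bool = False

    def finish(self) -> str:
        # A sender that was kicked out fails instead of streaming results.
        if self.interrupted:
            raise InvalidCursorDisconnected(f"{self.rpc_name} was replaced by a newer attach")
        return f"{self.rpc_name} streamed results"


@dataclass
class ExecutionObserver:
    """Assumed model: at most one consumer, and the latest attach wins."""
    consumer: Optional[ResponseSender] = None

    def attach_consumer(self, sender: ResponseSender) -> None:
        # Attaching a new consumer interrupts whichever one was attached before.
        if self.consumer is not None:
            self.consumer.interrupted = True
        self.consumer = sender


def run_in_attach_order(order: List[str]) -> Dict[str, str]:
    """Attach one sender per RPC in the given order and report each outcome."""
    observer = ExecutionObserver()
    senders = [ResponseSender(name) for name in order]
    for sender in senders:
        observer.attach_consumer(sender)
    outcome: Dict[str, str] = {}
    for sender in senders:
        try:
            outcome[sender.rpc_name] = sender.finish()
        except InvalidCursorDisconnected:
            outcome[sender.rpc_name] = "INVALID_CURSOR.DISCONNECTED"
    return outcome


# Intended order: ExecutePlan attaches first, the reattach replaces it.
intended = run_in_attach_order(["ExecutePlan", "ReattachExecute"])
# Racy order from this issue: ReattachExecute attaches first and is then kicked out.
racy = run_in_attach_order(["ReattachExecute", "ExecutePlan"])
```

      In this model the intended order leaves the stale ExecutePlan sender with the error, while the racy order hands the same error to the newer ReattachExecute sender, which is the behaviour reported here.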

       

      That would be very unlikely to happen in practice, because ExecutePlan shouldn't be abandoned so fast, but because of https://issues.apache.org/jira/browse/SPARK-44833 it is slightly more likely (though there is also a 50ms sleep before retry, which again makes it unlikely).

          People

            Assignee: juliuszsompolski Juliusz Sompolski
            Reporter: juliuszsompolski Juliusz Sompolski