Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
None
Description
While extending our smoke e2e test to use the remote SDKs, I stumbled upon a bug in the RequestReplyFunction: we get an unknown state exception after recovery.
The exact scenario that triggers the bug is:
- A request is in flight.
- A failure occurs that causes the job to restart.
- On restore, we start with no managed state.
- But we try to re-send exactly the same ToFunction message to the SDK.
- That ToFunction contains state definitions from the previous attempt (before the failure).
- The SDK processes this message normally, because the message carries every state definition the SDK knows about.
- The SDK responds with a state mutation.
- The PersistedRemoteFunctionValues fails with an unknown state exception.
We need to treat the ToFunction message as a retryBatch instead of re-sending it as-is.
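To make the scenario concrete, here is a minimal, self-contained sketch of the failure and of one way the retry could behave. None of it is the actual StateFun code: `Batch`, `ManagedState`, `Reply`, `sdkHandle`, `asRetryBatch`, and the state name `seen_count` are all hypothetical stand-ins. It also assumes that "treating the message as a retryBatch" means keeping the invocations but rebuilding the state section from what is currently registered, and that the SDK reports a missing state so the runtime can register it before retrying.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RetryBatchSketch {
    static final String STATE = "seen_count"; // hypothetical state name

    // Hypothetical stand-in for a ToFunction request: invocations plus attached state.
    static final class Batch {
        final List<String> invocations;
        final Map<String, String> stateValues;

        Batch(List<String> invocations, Map<String, String> stateValues) {
            this.invocations = new ArrayList<>(invocations);
            this.stateValues = new HashMap<>(stateValues);
        }
    }

    // Hypothetical stand-in for the managed state; it starts empty after a restore.
    static final class ManagedState {
        private final Map<String, String> values = new HashMap<>();

        void register(String name) {
            values.putIfAbsent(name, "");
        }

        void mutate(String name, String value) {
            if (!values.containsKey(name)) {
                throw new IllegalStateException("Unknown state: " + name);
            }
            values.put(name, value);
        }

        Map<String, String> snapshot() {
            return new HashMap<>(values);
        }
    }

    // Hypothetical SDK reply: either a missing state name or a state mutation.
    static final class Reply {
        final String missingState; // non-null when the request lacked a state the SDK needs
        final String mutatedState;
        final String newValue;

        Reply(String missingState, String mutatedState, String newValue) {
            this.missingState = missingState;
            this.mutatedState = mutatedState;
            this.newValue = newValue;
        }
    }

    // Assumed SDK behavior: ask for the state if absent, otherwise mutate it.
    static Reply sdkHandle(Batch batch) {
        if (!batch.stateValues.containsKey(STATE)) {
            return new Reply(STATE, null, null);
        }
        return new Reply(null, STATE, String.valueOf(batch.invocations.size()));
    }

    // Keep the invocations, but attach only the state that is registered right now.
    static Batch asRetryBatch(Batch inFlight, ManagedState state) {
        return new Batch(inFlight.invocations, state.snapshot());
    }

    // Simulates recovery: fresh (empty) managed state, then the in-flight batch is sent again.
    static String runAfterRestore(Batch inFlight, boolean treatAsRetryBatch) {
        ManagedState state = new ManagedState();
        Batch toSend = treatAsRetryBatch ? asRetryBatch(inFlight, state) : inFlight;
        Reply reply = sdkHandle(toSend);
        if (reply.missingState != null) {
            state.register(reply.missingState);
            reply = sdkHandle(asRetryBatch(inFlight, state));
        }
        state.mutate(reply.mutatedState, reply.newValue);
        return state.snapshot().get(STATE);
    }

    public static void main(String[] args) {
        Batch stale = new Batch(List.of("a", "b"), Map.of(STATE, "1"));
        System.out.println("retry batch result: " + runAfterRestore(stale, true));
        try {
            runAfterRestore(stale, false);
        } catch (IllegalStateException e) {
            System.out.println("as-is resend failed: " + e.getMessage());
        }
    }
}
```

In this sketch the as-is resend fails because the stale message still carries a state value the restored runtime never registered, so the SDK's mutation targets an unknown state. Rebuilding the message from the current managed state gives the SDK the chance to report the missing definition first.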