Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
While reviewing TAJO-1469, I investigated the locking in CallFuture. CallFuture.run() and CallFuture.get() should be synchronized with each other. The current code looks as if it intends to do this, but it does not actually implement it. If the following situation occurs, some resources or tasks will be lost forever:
1. Worker: TaskRunner sends a GetTask request.
2. QM: QueryMaster selects a proper task and calls RpcCallback.
3. Worker: AsyncRpcClient receives the response and calls CallFuture.run(response).
   3-1. Worker: If a TimeoutException occurs (with the timeout counted from step 1) somewhere between steps 2 and 3, TaskRunner never receives the response and does not run the allocated task, but QM does not know about that.
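A minimal sketch of what "run() and get() synchronized with each other" could look like, assuming a hand-rolled future. `SyncCallFuture` is a hypothetical name, not Tajo's actual CallFuture, and having `run()` return a boolean is an assumption. Both methods hold the same monitor, so a timeout in `get()` and the delivery in `run()` cannot interleave: once `get()` gives up, the future is marked cancelled, and a late `run()` reports that the response was not delivered instead of dropping it silently (the situation in step 3-1).

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only; not Tajo's CallFuture.
class SyncCallFuture<T> {
    private enum State { WAITING, DONE, CANCELLED }

    private State state = State.WAITING;
    private T response;

    /** Called by the RPC client when a response arrives.
     *  Returns false if the caller already gave up, i.e. the response was NOT delivered. */
    public synchronized boolean run(T response) {
        if (state != State.WAITING) {
            return false; // caller timed out or cancelled; the RPC layer must handle this
        }
        this.response = response;
        this.state = State.DONE;
        notifyAll(); // wake up a thread blocked in get()
        return true;
    }

    /** Waits for the response. On timeout the future is cancelled under the same
     *  lock, so a later run() cannot succeed without anyone receiving the value. */
    public synchronized T get(long timeout, TimeUnit unit)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (state == State.WAITING) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                state = State.CANCELLED;
                throw new TimeoutException("no response within " + timeout + " " + unit);
            }
            // Releases this monitor while waiting, reacquires it before returning.
            TimeUnit.NANOSECONDS.timedWait(this, remaining);
        }
        if (state == State.CANCELLED) {
            throw new CancellationException("call was cancelled");
        }
        return response;
    }

    /** Explicit cancellation; returns false if a response was already delivered. */
    public synchronized boolean cancel() {
        if (state != State.WAITING) {
            return false;
        }
        state = State.CANCELLED;
        notifyAll();
        return true;
    }
}
```

With this shape, AsyncRpcClient can check the result of `run(response)` and, when it is `false`, know that the allocated task never reached TaskRunner.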
We should fix this problem in the RPC module and add proper cancellation logic.
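One possible shape for the cancellation logic, sketched with `java.util.concurrent.CompletableFuture` instead of a hand-rolled future (a swapped-in technique, not necessarily what the fix used). `CancellableCall` and the `onUndelivered` hook are hypothetical; the hook stands in for whatever compensation is appropriate, for example telling QM to reschedule the task. `complete()` succeeds only for the first completion, so a response that arrives after the caller cancelled is detected and compensated. If the response lands between the timeout and `cancel()`, `cancel(false)` fails and the already-delivered value is returned rather than lost.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

// Hypothetical sketch of "cancel on timeout + compensate late responses".
class CancellableCall<T> {
    private final CompletableFuture<T> future = new CompletableFuture<>();
    // Hypothetical hook, e.g. notify QM so that the allocated task is not lost.
    private final Consumer<T> onUndelivered;

    CancellableCall(Consumer<T> onUndelivered) {
        this.onUndelivered = onUndelivered;
    }

    /** RPC client side: deliver the response, or compensate if the caller already gave up. */
    void onResponse(T response) {
        // complete() returns false if the future was already completed or cancelled.
        if (!future.complete(response)) {
            onUndelivered.accept(response);
        }
    }

    /** Caller side: wait, and cancel on timeout so that a late response is detected. */
    T await(long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException, TimeoutException {
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            if (future.cancel(false)) {
                throw e; // cancelled: any later response goes to onUndelivered
            }
            // The response arrived between the timeout and cancel(); deliver it.
            return future.get();
        }
    }
}
```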