[IGNITE-21881] Deal with retry send metastorage raft commands after a timeout - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 3.0
Component/s: None
Labels:
- ignite-3

Description

As a result of the analysis and reproduction of ~~IGNITE-21142~~, it was found that the metastorage raft command can be re-sent if it does not time out, which may not be good and lead to hidden negative consequences, such as in ~~IGNITE-21142~~.

Here we need to find out the reasons for this decision (with re-try by timeout) and understand what to do next. I think we should use an infinite timeout.

As a result of the analysis and reproduction of ~~IGNITE-21142~~, it was found that the metastorage raft command can be re-sent if it does not time out, which may not be good and lead to hidden negative consequences, such as in ~~IGNITE-21142~~.

Here we need to find out the reasons for this decision (with re-try by timeout) and understand what to do next. I think we should use an infinite timeout.

Upd#1

As discussed, it's required to detect whether InvokeCommand was already processed on a server and resend original response if true instead of reprocessing. First of all it's not only about invoke but also about multiInvoke. Worth mentioning though that it relates only to MS and maybe CMG but not Partitions: within partitions, tx protocol along with returning result from indexes instead of returning result from raft, protects us from non-idempotent command processing.

All in all following solution is expected to be implemented:

New interface NonIdempotentCommand is introduced with an id field.
All MS non-idempotent commands like InvokeCommand and MultiInvokeCommand implement aforementioned interface.
On the client side, an identifier is added to the command. Two options are possible here:
- It's possible to set id to the the command on command creation. Easiest way, but it will required extra effort on the server side to track command time. In that case it's possible to use LongCounter + nodeId as an id.
- Or it's possible to adjust command with an id within retry loop, in that case we may use id as a "command time", of course, it also means that clock or System.currentTime<> should be used as id. I strongly believe that first option is better for now.
On the server side, precisely, within MS state machine new nonIdempotentCommandCache is introduced commandId -> (commandResult, commandStartTime)
On each NonIdempotentCommand following logic should be implemented:
- As an initial step it's required to check whether there's a command with given id in the cache, if true just return cached result, without command reprocessing.
- If there's no given command in the cache, process it and populate the cache with the result.

Basically that's all. Both cache persistence and recovery on group restart and cache cleanup will be covered within separate tickets.

Attachments

Issue Links

blocks

IGNITE-22214 Meta storage idempotent invokes: persist and recovery idempotent command cache

Resolved

is blocked by

IGNITE-22113 Remove unused MetaStorageManagerImpl getAnd<> methods

Resolved

is caused by

IGNITE-21142 "SqlException: Table with name 'N' already exists" upon creating a non-existing table

Resolved

links to

GitHub Pull Request #3725

Deal with retry send metastorage raft commands after a timeout

Details

Description

Upd#1

Attachments

Issue Links

Activity

People

Dates

Time Tracking