Uploaded image for project: 'Ignite'
  1. Ignite
  2. IGNITE-21881

Deal with retry send metastorage raft commands after a timeout

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 3.0
    • None

    Description

      As a result of the analysis and reproduction of IGNITE-21142, it was found that the metastorage raft command can be re-sent if it does not time out, which may not be good and lead to hidden negative consequences, such as in IGNITE-21142.

      Here we need to find out the reasons for this decision (with re-try by timeout) and understand what to do next. I think we should use an infinite timeout.

      As a result of the analysis and reproduction of IGNITE-21142, it was found that the metastorage raft command can be re-sent if it does not time out, which may not be good and lead to hidden negative consequences, such as in IGNITE-21142.

      Here we need to find out the reasons for this decision (with re-try by timeout) and understand what to do next. I think we should use an infinite timeout.

      Upd#1

      As discussed, it's required to detect whether InvokeCommand was already processed on a server and resend original response if true instead of reprocessing. First of all it's not only about invoke but also about multiInvoke. Worth mentioning though that it relates only to MS and maybe CMG but not Partitions: within partitions, tx protocol along with returning result from indexes instead of returning result from raft, protects us from non-idempotent command processing.

      All in all following solution is expected to be implemented:

      • New interface NonIdempotentCommand is introduced with an id field.
      • All MS non-idempotent commands like InvokeCommand and MultiInvokeCommand implement aforementioned interface.
      • On the client side, an identifier is added to the command. Two options are possible here:
        • It's possible to set id to the the command on command creation. Easiest way, but it will required extra effort on the server side to track command time. In that case it's possible to use LongCounter + nodeId as an id. 
        • Or it's possible to adjust command with an id within retry loop, in that case we may use id as a "command time", of course, it also means that clock or System.currentTime<> should be used as id. I strongly believe that first option is better for now. 
      • On the server side, precisely, within MS state machine new nonIdempotentCommandCache is introduced commandId -> (commandResult, commandStartTime)
      • On each NonIdempotentCommand following logic should be implemented:
        • As an initial step it's required to check whether there's a command with given id in the cache, if true just return cached result, without command reprocessing.
        • If there's no given command in the cache, process it and populate the cache with the result.

      Basically that's all. Both cache persistence and recovery on group restart and cache cleanup will be covered within separate tickets.

      Attachments

        Issue Links

          Activity

            People

              Denis Chudov Denis Chudov
              ktkalenko@gridgain.com Kirill Tkalenko
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 4h 10m
                  4h 10m