  Ignite / IGNITE-13012

Fix failure detection timeout. Simplify node ping routine.


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.8.1
    • Fix Version/s: 2.9
    • None
    • Release Note: Fixed processing of the failure detection timeout in TcpDiscoverySpi. If a node fails to send a message or a ping, it now drops the current connection strictly within this timeout and begins establishing a new connection much faster.
    • Release Notes Required

    Description

      A connection failure may not be detected within IgniteConfiguration.failureDetectionTimeout. The actual worst-case delay is ServerImpl.CON_CHECK_INTERVAL + IgniteConfiguration.failureDetectionTimeout. In addition, the node ping routine is duplicated.
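      To make the worst case above concrete, the sketch below simply adds the two delays. The 500 ms interval is the constant quoted in this issue; the 10 000 ms failure detection timeout is an assumed example value (we believe it matches Ignite's default, but treat it as an assumption). The class and method names are hypothetical and not part of Ignite.

```java
/** Hypothetical helper illustrating the worst-case detection delay described in the issue. */
public class WorstCaseDetectionDelay {
    // Constant quoted in the issue: ServerImpl.CON_CHECK_INTERVAL = 500 ms.
    static final long CON_CHECK_INTERVAL_MS = 500;

    /** Worst-case time before a broken connection is noticed: check interval + failure detection timeout. */
    static long worstCaseDelayMs(long failureDetectionTimeoutMs) {
        return CON_CHECK_INTERVAL_MS + failureDetectionTimeoutMs;
    }

    public static void main(String[] args) {
        // Assumption: failureDetectionTimeout configured to 10 000 ms.
        long fdt = 10_000;
        System.out.println("Configured failure detection timeout: " + fdt + " ms");
        System.out.println("Worst-case detection delay: " + worstCaseDelayMs(fdt) + " ms");
    }
}
```

      With these values the failure can be noticed up to 500 ms later than the configured timeout promises; the overshoot grows relative to the timeout as the timeout gets smaller.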

      We should fix:

      1. The failure detection timeout should take into account the last sent message. Currently, the ping is bound to its own timestamp:

      ServerImpl.RingMessageWorker.lastTimeConnCheckMsgSent

      This is odd, because any discovery message checks the connection.
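      The idea in item 1 can be sketched as a single "last sent" timestamp that is refreshed by every outgoing discovery message, so a dedicated ping is only needed after a full interval of silence. This is a hypothetical illustration, not the actual ServerImpl code: the class and method names are invented, and timestamps are passed in explicitly (in milliseconds) so the behavior is deterministic; real code would read a monotonic clock such as System.nanoTime().

```java
/**
 * Hypothetical sketch (not Ignite's ServerImpl): one timestamp of the last
 * message sent on the connection, whatever its type, drives the connection check.
 */
public class LastSentTracker {
    private final long checkIntervalMs;
    private long lastSentMs;

    public LastSentTracker(long checkIntervalMs, long nowMs) {
        this.checkIntervalMs = checkIntervalMs;
        this.lastSentMs = nowMs;
    }

    /** Call after any discovery message (not only a ping) is written to the socket. */
    public void onMessageSent(long nowMs) {
        lastSentMs = nowMs;
    }

    /** A dedicated ping is needed only if nothing at all was sent for a full interval. */
    public boolean pingRequired(long nowMs) {
        return nowMs - lastSentMs >= checkIntervalMs;
    }

    public static void main(String[] args) {
        LastSentTracker tracker = new LastSentTracker(500, 0);
        tracker.onMessageSent(400);                    // an ordinary discovery message
        System.out.println(tracker.pingRequired(700)); // false: only 300 ms since the last send
        System.out.println(tracker.pingRequired(900)); // true: 500 ms since the last send
    }
}
```

      Keeping one timestamp also removes the need for a separate ping clock, which is the duplication the issue points out.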

      2. Make the connection check interval depend on the failure detection timeout (FDT). The current value is a constant:

      static int ServerImpl.CON_CHECK_INTERVAL = 500
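      One possible way to derive the interval from the FDT is sketched below. The ratio (four checks per timeout) and the 100 ms lower bound are hypothetical values chosen for illustration, not the formula adopted by the patch; the final clamp ensures the interval never exceeds the timeout itself, which a fixed 500 ms constant cannot guarantee when the FDT is small.

```java
/** Hypothetical sketch: connection check interval derived from the failure detection timeout. */
public class ConnCheckInterval {
    // Hypothetical lower bound so very small timeouts do not cause excessive pinging.
    static final long MIN_INTERVAL_MS = 100;
    // Hypothetical ratio: check the connection several times within one timeout.
    static final int CHECKS_PER_TIMEOUT = 4;

    static long connCheckIntervalMs(long failureDetectionTimeoutMs) {
        if (failureDetectionTimeoutMs <= 0)
            throw new IllegalArgumentException("failureDetectionTimeout must be positive");
        long interval = Math.max(MIN_INTERVAL_MS, failureDetectionTimeoutMs / CHECKS_PER_TIMEOUT);
        // Never wait longer between checks than the timeout itself.
        return Math.min(failureDetectionTimeoutMs, interval);
    }

    public static void main(String[] args) {
        for (long fdt : new long[] {50, 300, 2_000, 10_000}) {
            System.out.println("FDT " + fdt + " ms -> check every " + connCheckIntervalMs(fdt) + " ms");
        }
    }
}
```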

      3. Remove the additional, accelerated connection check. Once fix 1 is in place, it becomes even less useful.
      Although TCP discovery has a connection check period, it may send a ping before this period expires. For no clear reason, this premature ping also depends on the time of the last received message.

      4. Do not alarm the user with “Node seems disconnected” when everything is OK. Once fixes 1 and 3 are in place, this message becomes even less useful.
      The node may log at INFO level: “Local node seems to be disconnected from topology …” even though it is not actually disconnected at all.

      Attachments

        1. IGNITE-13012-patch.patch (18 kB, Vladimir Steshin)


            People

              Assignee: vladsz83 Vladimir Steshin
              Reporter: vladsz83 Vladimir Steshin
              Votes: 1
              Watchers: 5


                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 3h 40m