Ignite / IGNITE-6700

A node considered failed can cause failure of other nodes

Details

    • Type: Bug
    • Status: In Progress
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: general
    • Labels: None

    Description

      A node that is considered failed can cause other nodes in the cluster to fail.

      There is an issue in the processing of TcpDiscoveryAbstractMessage.failedNodes: if a message is received from a node that is already considered failed, the failedNodes it carries should be ignored.

      Possible scenario:

      • there are 4 nodes (1 -> 2 -> 3 -> 4)
      • node 3 temporarily loses its connection to the other nodes
      • node 2 considers 3 as failed, and a node failed event is fired for 3
      • node 3 considers 4 as failed and adds 4 to nodeFailedList; it then restores its connection with 1, and currently 1 will process the nodeFailedList from 3 (even though 3 is already considered failed)
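
The scenario above can be sketched in a few lines of Java. This is only an illustration of the proposed guard, not Ignite's actual discovery code: the class FailedNodesSketch and its methods are hypothetical, node IDs are plain integers, and it assumes node 1 has already seen the node failed event for 3 before the message from 3 arrives.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of one node's view of failed nodes; not an Ignite class. */
public class FailedNodesSketch {
    /** IDs of nodes that this node already considers failed. */
    private final Set<Integer> failed = new HashSet<>();

    /** false mimics the reported behavior, true applies the proposed guard. */
    private final boolean ignoreFailedSenders;

    public FailedNodesSketch(boolean ignoreFailedSenders) {
        this.ignoreFailedSenders = ignoreFailedSenders;
    }

    /** Records a failure this node already knows about (e.g. the event for node 3). */
    public void markFailed(int nodeId) {
        failed.add(nodeId);
    }

    /** Processes the failed-nodes list carried by a message from senderId. */
    public void onMessage(int senderId, Collection<Integer> msgFailedNodes) {
        // Proposed guard: a sender that is considered failed has an untrusted view.
        if (ignoreFailedSenders && failed.contains(senderId))
            return;

        failed.addAll(msgFailedNodes);
    }

    public boolean isFailed(int nodeId) {
        return failed.contains(nodeId);
    }

    public static void main(String[] args) {
        for (boolean guard : new boolean[] {false, true}) {
            FailedNodesSketch node1 = new FailedNodesSketch(guard);

            node1.markFailed(3);                             // 3 is considered failed
            node1.onMessage(3, Collections.singletonList(4)); // 3 reports 4 as failed

            System.out.println("guard=" + guard + ", node 1 considers 4 failed: " + node1.isFailed(4));
        }
    }
}
```

Without the guard, node 1 ends up treating node 4 as failed on the word of a node it has already dropped; with the guard, the list from node 3 is skipped and node 4 stays in the topology.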

            People

              ein Alexandr Kuramshin
              sboikov Semen Boikov