Ignite / IGNITE-10898

Exchange coordinator failover breaks in some cases when node filter is used


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8
    • Component/s: None
    • Labels: None

    Description

      Currently, if a node does not pass a cache's node filter, we do not store that cache's affinity on the node unless the node is the coordinator. However, this may fail in the following scenario:
      1) A node that passes the node filter joins the cluster.
      2) During the join, the coordinator fails, and a new coordinator is selected for which the previous exchange has already completed.
      3) The new coordinator attempts to fetch the affinity, and the joining node resends its partitions single message. There are two problems here. First, the exchange fast-reply does not wait for the new affinity to be initialized, which results in an IllegalStateException. Second, such an attempt to fetch the affinity may lead either to a deadlock or to incorrectly fetched affinity (the coordinator must be in consensus with the other nodes that pass the node filter).
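      The scenario above can be illustrated with a toy model of the current policy. This is a sketch under stated assumptions, not Ignite code: the class CurrentPolicyModel, its Node type, and the methods onExchange and localAffinity are hypothetical, and affinity is reduced to a list mapping partition index to primary node id. The model stops at the point where the new coordinator finds no local affinity; in the real system this is where the fetch, and the two problems described, begin.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of the CURRENT policy (hypothetical names, not Ignite's internal API).
final class CurrentPolicyModel {
    // Per-node local state: cache name -> affinity (partition index -> primary node id).
    static final class Node {
        final String id;
        final Map<String, List<String>> affinity = new HashMap<>();

        Node(String id) {
            this.id = id;
        }
    }

    // During an exchange, a node keeps the calculated affinity only if it passes
    // the cache node filter or is the coordinator at that moment.
    static void onExchange(Node node, boolean coordinator, String cache,
                           Set<String> passingFilter, List<String> calculated) {
        if (coordinator || passingFilter.contains(node.id))
            node.affinity.put(cache, calculated);
    }

    // A newly elected coordinator needs the affinity locally. If it never kept it,
    // the real system has to fetch it from other nodes; this model just reports
    // the missing state.
    static List<String> localAffinity(Node node, String cache) {
        List<String> aff = node.affinity.get(cache);
        if (aff == null)
            throw new IllegalStateException("No affinity for '" + cache + "' on " + node.id);
        return aff;
    }

    public static void main(String[] args) {
        Set<String> passingFilter = Set.of("C");       // only the joining node C passes the filter
        List<String> calculated = List.of("C", "C");    // 2 partitions, both owned by C
        Node a = new Node("A");
        Node b = new Node("B");

        onExchange(a, true, "cache", passingFilter, calculated);   // A is the coordinator
        onExchange(b, false, "cache", passingFilter, calculated);  // B neither passes nor coordinates

        // A fails during C's join; B becomes coordinator and has nothing locally.
        try {
            localAffinity(b, "cache");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```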

      The attached test reproduces the issue.

      I suggest always calculating and keeping affinity on all nodes, even those that do not pass the filter. In this case, there will be no need to fetch and recalculate affinity (initCoordinatorCaches will go away).
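      A minimal sketch of the proposed policy, under the same toy assumptions: ProposedPolicyModel, Node.onExchange, and calculate are hypothetical names, and calculate is a simple deterministic round-robin over the sorted ids of nodes that pass the filter, standing in for a real affinity function. The point it shows is that when every node computes affinity from the same topology, a node that does not pass the filter already holds the same result as the filtered nodes, so a new coordinator needs no fetch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of the PROPOSED policy (hypothetical names, not Ignite's internal API):
// every node calculates and keeps the affinity for every cache.
final class ProposedPolicyModel {
    // Deterministic stand-in for an affinity function: round-robin over the sorted
    // ids of topology nodes that pass the filter. Real affinity functions differ;
    // what matters is that every node derives the same result from the same input.
    static List<String> calculate(Set<String> topology, Set<String> passingFilter, int partitions) {
        List<String> owners = new ArrayList<>(new TreeSet<>(passingFilter));
        owners.retainAll(topology);
        List<String> aff = new ArrayList<>(partitions);
        if (owners.isEmpty())
            return aff; // no eligible owners in this toy model
        for (int p = 0; p < partitions; p++)
            aff.add(owners.get(p % owners.size()));
        return aff;
    }

    static final class Node {
        final String id;
        final Map<String, List<String>> affinity = new HashMap<>();

        Node(String id) {
            this.id = id;
        }

        // Proposed: keep the affinity regardless of the node filter or coordinator role.
        void onExchange(String cache, Set<String> topology, Set<String> passingFilter, int partitions) {
            affinity.put(cache, calculate(topology, passingFilter, partitions));
        }
    }

    public static void main(String[] args) {
        Set<String> topology = Set.of("A", "B", "C");
        Set<String> passingFilter = Set.of("C");
        Node b = new Node("B");
        Node c = new Node("C");
        b.onExchange("cache", topology, passingFilter, 4);
        c.onExchange("cache", topology, passingFilter, 4);

        // If A fails and B becomes coordinator, B already holds the same affinity
        // as C, so nothing has to be fetched or recalculated.
        System.out.println(b.affinity.get("cache").equals(c.affinity.get("cache")));
    }
}
```

The design trade-off is a small amount of extra computation and memory on nodes outside the filter in exchange for removing the fetch-and-agree step from coordinator failover.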

      Attachments

        1. NodeWithFilterRestartTest.java
          6 kB
          Alexey Goncharuk


            People

              Assignee: Dmitriy Govorukhin
              Reporter: Alexey Goncharuk
              Votes: 0
              Watchers: 3


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 50m