  1. Ignite
  2. IGNITE-11454

Race in ClientImpl may lead to client node segmentation on fast reconnect

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 2.8
    • None
    • None

    Description

      We have the following code in ClientImpl#tryJoin:

                  if (spi.joinTimeout > 0) {
                      final int joinCnt0 = joinCnt;
      
                      timer.schedule(new TimerTask() {
                          @Override public void run() {
                              if (joinCnt == joinCnt0 && joining())
                                  queue.add(JOIN_TIMEOUT);
                          }
                      }, spi.joinTimeout);
                  }
      

      There is a window in which the timeout task is still scheduled although the node has already connected to the cluster. The following sequence is then possible: the node disconnects and clears its queue, the stale timeout task fires and adds JOIN_TIMEOUT to the queue, and then tryJoin is called. The new join attempt finds JOIN_TIMEOUT already in the queue, so the node is segmented immediately.
      ClientReconnectAfterClusterRestartTest reproduces this when the join timeout is set to 10s.
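
The sequence above can be replayed deterministically in a small single-threaded model. This is only a sketch: the class JoinTimeoutSketch, its methods, and the invalidateOnJoin flag are hypothetical names invented for illustration and are not part of ClientImpl. It assumes that joinCnt is advanced only by tryJoin, which is what leaves the stale task's check satisfied. The "invalidate on join" variant shows one conceivable mitigation and is not necessarily the fix that was committed for this issue.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical single-threaded model of the join-timeout race (not Ignite code). */
final class JoinTimeoutSketch {
    static final String JOIN_TIMEOUT = "JOIN_TIMEOUT";

    final Queue<String> queue = new ArrayDeque<>();
    final boolean invalidateOnJoin; // assumption: illustrative mitigation switch
    int joinCnt;
    boolean joining;

    JoinTimeoutSketch(boolean invalidateOnJoin) {
        this.invalidateOnJoin = invalidateOnJoin;
    }

    /** Starts a join attempt and returns the task a timer would run after joinTimeout. */
    Runnable tryJoin() {
        joining = true;
        joinCnt++;
        final int joinCnt0 = joinCnt;
        // Same check as the snippet in the description.
        return () -> {
            if (joinCnt == joinCnt0 && joining)
                queue.add(JOIN_TIMEOUT);
        };
    }

    /** The node joined the cluster before the timeout fired. */
    void onJoined() {
        joining = false;
        if (invalidateOnJoin)
            joinCnt++; // pending timeout tasks no longer match their captured counter
    }

    /** The node lost the connection: the queue is cleared and the node is joining again. */
    void onDisconnected() {
        queue.clear();
        joining = true;
    }

    /** Replays the sequence and reports whether a stale JOIN_TIMEOUT reaches the queue. */
    static boolean staleTimeoutReachesQueue(boolean invalidateOnJoin) {
        JoinTimeoutSketch client = new JoinTimeoutSketch(invalidateOnJoin);
        Runnable staleTimeout = client.tryJoin(); // first join schedules the timeout
        client.onJoined();                        // joined, but the task is still scheduled
        client.onDisconnected();                  // fast disconnect clears the queue
        staleTimeout.run();                       // the old timeout fires in the window
        client.tryJoin();                         // reconnect attempt
        return client.queue.contains(JOIN_TIMEOUT);
    }

    public static void main(String[] args) {
        System.out.println("without invalidation: " + staleTimeoutReachesQueue(false));
        System.out.println("with invalidation: " + staleTimeoutReachesQueue(true));
    }
}
```

In this model the stale message survives only when nothing invalidates the pending task between the successful join and the next tryJoin. Cancelling the scheduled TimerTask on join or disconnect would be another way to close the same window.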

            People

              agoncharuk Alexey Goncharuk

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m