GEODE-6607

Possible client subscription data inconsistency due to race between retrieving filter info and distributing event


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.10.0
    • Component/s: client queues
    • Labels: None

    Description

      It is possible for a client to miss subscription events (from either a CQ or a register interest) in the following scenario:

      A cluster has four servers, and client subscriptions are configured with two redundant copies. The client's primary subscription endpoint is on server 1, and the redundant copies are on servers 2 and 3. Server 2 is killed or lost to a network partition, so we attempt to restore redundancy by copying the client queue from server 3 to server 4.

      Two things happen when server 4 gets the client queue from server 3. First, we request the client's filter info, which represents the CQ and register interest info. Second, we perform the GII (get initial image) to obtain the image of the queue.

      A race can occur when an event is distributed across the cluster while server 4 is initializing the client queue. If server 4 processes the event before it has retrieved the filter info, the event will not match the client's subscription filter, because the filter does not exist on server 4 yet. If server 3 then processes the same event after the GII has started, the event will not be part of the client queue image either. The event is therefore never added to the client queue on server 4 and is lost.
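
      The interleaving described above can be replayed deterministically in a small model. This is only an illustrative sketch: `SubscriptionRaceSketch`, `ClientQueue`, and the `order-` key filter are hypothetical names invented for the example, not Geode classes, and the real code paths are concurrent rather than sequential.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of the race; none of these names are Geode classes.
class SubscriptionRaceSketch {

  /** One server's copy of a single client's subscription queue. */
  static class ClientQueue {
    Predicate<String> filter; // null until the filter info has been retrieved
    final List<String> events = new ArrayList<>();

    /** Enqueue the event only if a filter is installed and matches it. */
    void onDistributedEvent(String event) {
      if (filter != null && filter.test(event)) {
        events.add(event);
      }
    }
  }

  /** Replays the problematic interleaving and returns server 4's queue contents. */
  static List<String> replayRace() {
    Predicate<String> clientFilter = key -> key.startsWith("order-");

    ClientQueue server3 = new ClientQueue();
    server3.filter = clientFilter;
    server3.onDistributedEvent("order-1"); // delivered before server 2 was lost

    ClientQueue server4 = new ClientQueue(); // initializing, no filter yet

    // 1. "order-2" reaches server 4 before the filter info is retrieved: dropped.
    server4.onDistributedEvent("order-2");

    // 2. Server 4 retrieves the filter info from server 3.
    server4.filter = server3.filter;

    // 3. GII starts: the image is a snapshot of server 3's queue at this moment.
    List<String> image = new ArrayList<>(server3.events);

    // 4. "order-2" reaches server 3 only after the snapshot was taken.
    server3.onDistributedEvent("order-2");

    // 5. Server 4 installs the image, which does not contain "order-2".
    server4.events.addAll(image);
    return server4.events;
  }

  public static void main(String[] args) {
    System.out.println(replayRace()); // prints [order-1]
  }
}
```

      In this model server 3 ends up holding both events while server 4 holds only `order-1`, so a client that later fails over to server 4 would never receive `order-2`.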

      We have a special queue for handling events while a client is initializing, but it sits at too low a level (MessageDispatcher) to handle this scenario. One possible solution is to move this queue to a higher level (CacheClientNotifier or CacheClientProxy) so that an event is queued before we even attempt to get the filter info. Then, when initialization finishes, we drain the queue, check each event against the initialized client's filter, and send along the events that match. A similar solution could be implemented on the GII provider side, but it might be a bit messier.
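
      A minimal sketch of the "queue at a higher level" idea follows, assuming events can be modeled as strings and the filter as a predicate. `InitializingClientProxy` and its methods are hypothetical and do not reflect the actual CacheClientNotifier or CacheClientProxy APIs. Skipping events that the image already contains is an extra assumption of the sketch, not something the description specifies.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Predicate;

// Hypothetical sketch; InitializingClientProxy is not a Geode class.
class InitializingClientProxy {
  private final Queue<String> heldEvents = new ArrayDeque<>();
  private final List<String> clientQueue = new ArrayList<>();
  private Predicate<String> filter;
  private boolean initialized;

  /** Called for every event distributed to this server. */
  synchronized void onDistributedEvent(String event) {
    if (!initialized) {
      heldEvents.add(event); // hold the event before any filter check
      return;
    }
    if (filter.test(event)) {
      clientQueue.add(event);
    }
  }

  /** Called once the filter info and the queue image have both arrived. */
  synchronized void finishInitialization(Predicate<String> retrievedFilter,
                                         List<String> queueImage) {
    filter = retrievedFilter;
    clientQueue.addAll(queueImage);
    String held;
    while ((held = heldEvents.poll()) != null) {
      // Assumption: an event already present in the image is skipped.
      if (filter.test(held) && !clientQueue.contains(held)) {
        clientQueue.add(held);
      }
    }
    initialized = true;
  }

  /** Returns a copy of the events currently queued for the client. */
  synchronized List<String> queuedEvents() {
    return new ArrayList<>(clientQueue);
  }
}
```

      Because events are held before the filter is consulted, an event that arrives during initialization is evaluated later against the retrieved filter instead of being dropped for lack of one.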

       

          People

            rmcmahon Ryan McMahon
            rmcmahon Ryan McMahon

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 2h 40m