Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-16063

Make the ZK ConnectionManager events actually single-threaded



    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • SolrJ
    • None


      The implementation of a single-threaded executor of ConnectionManger events in the SolrZkClient is fundamentally broken.

      This is because the ClientCnxn class in Zookeeper (or SolrZookeeper in our case) also tries to do the same thing, use a single-threaded executor for these events. I believe the original intent was for all watch events to be routed to the zkConnManagerCallbackExecutor in SolrZkClient. However due to the way that the ProcessWatchWithExecutor wrapper class sends these watch events to the executor, we get the opposite functionality: The SolrZkClient executor quickly "processes" these events, which merely get added to the ClientCnxn thread event queue, which actually handles all of these watch events.

      There are 2 major problem here.

      1. We cannot close the ClientCnxn eventThread, whenever ClientCnxn is closed by SolrZookeeper (parent class of Zookeeper), it merely sends a kill event to the eventThread queue, and wait for the queue to reach the end. I will detail this later, but the event processing we do leads to continually looping event threads, so the kill event is never reached.
      2. There is not 1 ClientCnxn eventThread, because there is not 1 SolrZookeeper. Whenever we need to re-establish a connection, such as in the case of a sessionExpired event, a new SolrZookeeper is created and the old one is closed. Therefore we have multiple eventThreads being used if the old SolrZookeeper cannot close its eventThread.

      Given the above two points, we now have multiple eventThreads that we do not control, and cannot close.

      The solution is to fix the way that the ConnectionManager sends eventProcessing requests to the SolrZkClient executor. The reason why ProcessWatchWithExecutor doesn't work is that it isn't available in the ConnectionManager, when the ConnectionManger is creating a new SolrZookeeper. The SolrZookeeper requires a default watcher to send a connection events to, and since the ConnectionManger can't pass itself wrapped in the SolrZkClient's ProcessWatchWithExecutor, all of those requests are skipping the SolrZkClient executor. If instead of wrapping the ConnectionManager in ProcessWatchWithExecutor, we just pass an optional executor to ConnectionManager, then it will always schedule events in the executor, even if it isn't wrapped. This way the eventThread in ClientCnxn will be very quick to process, since it will merely call ConnectionManager.process() which will instantly schedule the processing to run in the SolrZkClient executor. Therefore the ClientCnxn event thread will instantly close when requested, since there will never be a backlog of processing events.

      This solution also guarantees that every ConnectionManager event goes through the SolrZkClient's single-threaded executor, meaning that we are truly achieving our initial goal.

      There are two tickets that cause this approach to fail:

      • SOLR-4899: This makes the event processing thread wait until the zk client is connected. This makes sense, because the ConnectionManager is waiting for a Connected event to be processed, which can't be because the single-threaded executor is stalling.
      • SOLR-8599: This also pauses the single-thread executor and makes it so that no other events can be processed.

      I think this is the approach to take if we want to actually take this single-threaded approach, but we need to make sure undoing the work in the two issues mentioned above does not make Solr less stable.


        Issue Links



              Unassigned Unassigned
              houston Houston Putman
              0 Vote for this issue
              1 Start watching this issue