  1. Samza
  2. SAMZA-1692

Standalone stability fixes.

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.15.0, 0.15
    • Component/s: None
    • Labels: None

    Description

      • Currently, on session expiration, a processorListener with an incorrect generationId is registered with ZooKeeper (the ZkUtils generationId is incremented on reconnect, but the generationId in processorListener stays at zero). When this happens to the immediate successor of the leader, that processor skips the leader expiration event. This prevents leader re-election when the current leader dies and stalls the processor group. The fix is to re-instantiate and then register the processorChangeListener on session expiration.
      • Add processorId to the debounce thread name (this aids debugging when multiple processors run within one JVM).
      • After the ScheduleAfterDebounceTime queue is shut down, don't accept new schedule requests. The current ZkJobCoordinator shutdown sequence consists of the following steps:
        • Shut down the ScheduleAfterDebounceTime queue.
        • Stop the zkClient and relinquish its resources.

      After ScheduleAfterDebounceTime is shut down and before zkClient is stopped, zkClient can still schedule new operations in the ScheduleAfterDebounceTime queue. This results in a RejectedExecutionException, since the executorService is already stopped.
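One way to avoid the rejection is a guard in front of the executor. The following is only a minimal sketch, not the Samza code: DebounceSchedulerSketch, its schedule/stop methods, and the "debounce-thread-" name prefix are hypothetical stand-ins for ScheduleAfterDebounceTime, and returning false for late requests is an assumed policy.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for ScheduleAfterDebounceTime; names are illustrative only.
class DebounceSchedulerSketch {
  private final ScheduledExecutorService executor;
  private final AtomicBoolean shutDown = new AtomicBoolean(false);

  DebounceSchedulerSketch(String processorId) {
    // Put the processor id in the thread name so threads of different
    // processors in the same JVM can be told apart (assumed naming scheme).
    executor = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "debounce-thread-" + processorId);
      t.setDaemon(true);
      return t;
    });
  }

  /** Returns true if the task was accepted, false if the scheduler is already stopped. */
  boolean schedule(Runnable task, long delayMs) {
    if (shutDown.get()) {
      return false; // late request after stop(): ignore it instead of throwing
    }
    try {
      executor.schedule(task, delayMs, TimeUnit.MILLISECONDS);
      return true;
    } catch (RejectedExecutionException e) {
      // stop() ran between the flag check and the schedule call.
      return false;
    }
  }

  void stop() {
    shutDown.set(true);
    executor.shutdownNow();
  }
}
```

With this shape, a callback that arrives from the client between the two shutdown steps is dropped quietly rather than surfacing as an exception.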

      sample exception:

      Caused by: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@23f962a8 rejected from java.util.concurrent.ScheduledThreadPoolExecutor@43408be8
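The stale-listener problem from the first item can be modelled in a few lines. This is a toy sketch under stated assumptions, not Samza or ZkClient code: GenerationSketch, Listener, and the method names are hypothetical, and it assumes the listener drops any event whose generation does not match the current one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of a session-generation check; not the Samza classes.
class GenerationSketch {
  // Stand-in for the generationId that is incremented on reconnect.
  private final AtomicInteger generationId = new AtomicInteger(0);
  final List<String> handled = new ArrayList<>();
  private Listener registered;

  // The listener remembers the generation it was created in.
  final class Listener {
    final int createdInGeneration;

    Listener(int generation) {
      this.createdInGeneration = generation;
    }

    void onChange(String event) {
      if (createdInGeneration != generationId.get()) {
        return; // assumed behaviour: events seen by a stale listener are skipped
      }
      handled.add(event);
    }
  }

  void register() {
    registered = new Listener(generationId.get());
  }

  void onSessionExpired(boolean reinstantiateListener) {
    generationId.incrementAndGet();
    if (reinstantiateListener) {
      register(); // the fix: create a fresh listener, then register it
    }
    // Otherwise the old listener stays registered with generation 0.
  }

  void fire(String event) {
    registered.onChange(event);
  }
}
```

Calling onSessionExpired(false) and then firing a leader-expiration event leaves handled empty, which mirrors the skipped event described above; with onSessionExpired(true) the event is handled.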
      

          People

            Assignee: spvenkat Shanthoosh Venkataraman
            Reporter: spvenkat Shanthoosh Venkataraman