Samza / SAMZA-1695

Clear events in debounce queue on session expiration


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.15.0
    • Component/s: None
    • Labels:
      None

      Description

      Scenario:

      Let's assume there are three processors in the group [P1, P2, P3] and that P1 is the leader.

      1. The leader processor (P1) loses connectivity with a ZooKeeper server in the ensemble, and its ephemeral processor node is deleted (due to session expiration).
      2. The immediate successor (P2) to the leader (P1) detects that the leader is dead and declares itself leader. Processor P2 schedules onProcessorChange to publish the JobModel.
      3. ZkClient's connection retry logic lets the former leader (P1) reconnect to another ZooKeeper server in the ensemble, and it rejoins as a follower.
      4. Processor P1 acts on a stale event buffered in its debounce queue (received while it was still the leader) and behaves as a leader. At this point, two processors (P1 and P2) are acting as leader. If P1 executes leader actions before P2, P2 will fail (and in the worst case this can cause state corruption).
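      The fix named in the title can be illustrated with a minimal sketch, assuming a simplified debounce queue driven by a logical clock. The names `DebounceQueue`, `schedule`, `advanceTo` and `clearOnSessionExpired` are hypothetical and do not reflect Samza's actual ZooKeeper coordination code. `replay` models step 4: P1 buffers an onProcessorChange event while it is leader, its session expires, and the debounce delay elapses only after it has rejoined as a follower.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a debounce queue. NOT Samza's implementation.
public class DebounceQueueSketch {

  // One buffered action plus the logical time at which it becomes due.
  static final class Pending {
    final long dueAtMs;
    final Runnable action;

    Pending(long dueAtMs, Runnable action) {
      this.dueAtMs = dueAtMs;
      this.action = action;
    }
  }

  static final class DebounceQueue {
    private final Map<String, Pending> pending = new LinkedHashMap<>();
    private long nowMs = 0;

    // Debounce: scheduling the same key again replaces the earlier entry and resets its deadline.
    synchronized void schedule(String key, long delayMs, Runnable action) {
      pending.remove(key);
      pending.put(key, new Pending(nowMs + delayMs, action));
    }

    // Advances the logical clock and runs every action whose deadline has passed.
    void advanceTo(long newNowMs) {
      List<Runnable> due = new ArrayList<>();
      synchronized (this) {
        nowMs = newNowMs;
        pending.entrySet().removeIf(e -> {
          if (e.getValue().dueAtMs <= nowMs) {
            due.add(e.getValue().action);
            return true;
          }
          return false;
        });
      }
      // Run actions outside the lock so they may schedule new work.
      due.forEach(Runnable::run);
    }

    // Assumed to be called from the session-expiration handler: drop everything
    // that was buffered under the old session (and the old role).
    synchronized void clearOnSessionExpired() {
      pending.clear();
    }

    synchronized int size() {
      return pending.size();
    }
  }

  // Replays step 4 of the scenario and returns the leader actions P1 executed.
  static List<String> replay(boolean clearOnExpiry) {
    List<String> executed = new ArrayList<>();
    DebounceQueue p1Queue = new DebounceQueue();
    // While P1 is still leader, it buffers a processor-change event.
    p1Queue.schedule("onProcessorChange", 1000, () -> executed.add("P1 publishes JobModel"));
    // P1's ZooKeeper session expires before the debounce delay elapses.
    if (clearOnExpiry) {
      p1Queue.clearOnSessionExpired();
    }
    // P1 reconnects as a follower; later the debounce deadline passes.
    p1Queue.advanceTo(2000);
    return executed;
  }

  public static void main(String[] args) {
    System.out.println("without clearing: " + replay(false));
    System.out.println("with clearing:    " + replay(true));
  }
}
```

      In this sketch the stale leader action runs only when the queue is not cleared; clearing on expiration means P1 starts its new session with no events that were scheduled under its former leader role.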

      Sample exception logs:

      https://gist.github.com/shanthoosh/55410fe4ebf3cfb65281b35f16397cad

       


              People

              • Assignee:
                spvenkat Shanthoosh Venkataraman
                Reporter:
                spvenkat Shanthoosh Venkataraman
              • Votes:
                0
                Watchers:
                2