  1. Samza
  2. SAMZA-1561

JobModel upgrade consistency problem.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed

    Description

      The JobModel upgrade sequence performed by the leader is as follows:

      A. Read previousJobModelVersion from JobModelBasePath/jobModelVersion.

      B. Publish the new JobModel with version (previousJobModelVersion + 1) to JobModelBasePath/jobModels.

      C. Create a barrier with version (previousJobModelVersion + 1).

      D. Update jobModelVersion path with value (previousJobModelVersion + 1).

      Followers watch the jobModelVersion path for JobModel upgrades.

      If the leader dies after publishing the new JobModel (step B) but before executing the last step of the upgrade sequence (step D), any processor subsequently elected as leader will be unable to publish a new JobModel and will fail with ZkNodeExistsException.

      For instance, suppose the previous JobModel version is 2 for a processor group [P1, P2]. P1 is the leader. It creates the ZooKeeper node JobModelBasePath/jobModels/3 to publish the JobModel and then dies without updating the jobModelVersion path, which stays at 2. When P2 becomes leader, it reads version 2, generates JobModel version 3 again, tries to create the node JobModelBasePath/jobModels/3, and fails.
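
      The scenario above can be reproduced without ZooKeeper using a small in-memory stand-in. This is only an illustrative sketch: FakeZk, NodeExistsError, upgrade_job_model, the crash_after_publish flag, and the barrier path are hypothetical names, not Samza or ZooKeeper APIs. The only behavior modeled is that creating a node that already exists fails.

```python
class NodeExistsError(Exception):
    """Stand-in for ZkNodeExistsException (hypothetical, not a real client class)."""


class FakeZk:
    """Minimal in-memory model of a ZooKeeper namespace: path -> data."""

    def __init__(self):
        self.nodes = {}

    def create(self, path, data):
        # As in ZooKeeper, creating a node that already exists is an error.
        if path in self.nodes:
            raise NodeExistsError(path)
        self.nodes[path] = data

    def write(self, path, data):
        # Create or overwrite the data stored at an existing path.
        self.nodes[path] = data

    def read(self, path):
        return self.nodes[path]


BASE = "JobModelBasePath"


def upgrade_job_model(zk, job_model, crash_after_publish=False):
    """Run steps A-D of the upgrade sequence as the current leader."""
    previous = int(zk.read(f"{BASE}/jobModelVersion"))        # step A
    next_version = previous + 1
    zk.create(f"{BASE}/jobModels/{next_version}", job_model)  # step B
    if crash_after_publish:
        return None  # the leader dies here; steps C and D never run
    # Step C: the barrier path is an assumption made for this sketch.
    zk.create(f"{BASE}/barriers/{next_version}", "barrier")
    zk.write(f"{BASE}/jobModelVersion", str(next_version))    # step D
    return next_version


zk = FakeZk()
zk.write(f"{BASE}/jobModelVersion", "2")

# P1 is the leader: it publishes jobModels/3 and dies before step D.
upgrade_job_model(zk, "model-from-P1", crash_after_publish=True)

# P2 is elected leader, still reads version 2, and collides on jobModels/3.
try:
    upgrade_job_model(zk, "model-from-P2")
    outcome = "upgraded"
except NodeExistsError as err:
    outcome = f"failed: {err} already exists"

print(outcome)
```

      In this sketch P2 fails on jobModels/3 while the stored jobModelVersion is still 2, which is the inconsistency described above.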

      This behavior was observed while testing a Samza standalone job.

      JobModelBasePath/jobModels is the source of truth for the latest JobModel version in a processor group. Because the version is also maintained in a separate ZooKeeper node (jobModelVersion), and the two nodes cannot be updated atomically, we run into this consistency problem.
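
      To illustrate what treating jobModels as the source of truth could look like, the sketch below derives the next version from the nodes already published under jobModels instead of from the jobModelVersion node. It is an assumption made for illustration, not necessarily the fix adopted for this issue. latest_published_version and the list of paths are hypothetical.

```python
def latest_published_version(node_paths, base="JobModelBasePath"):
    """Return the highest version published under <base>/jobModels, or 0 if none."""
    prefix = f"{base}/jobModels/"
    versions = [
        int(path[len(prefix):])
        for path in node_paths
        if path.startswith(prefix) and path[len(prefix):].isdigit()
    ]
    return max(versions, default=0)


# State left behind by a leader that died after publishing: jobModels/3
# exists, but the jobModelVersion node still holds 2.
paths = [
    "JobModelBasePath/jobModelVersion",
    "JobModelBasePath/jobModels/1",
    "JobModelBasePath/jobModels/2",
    "JobModelBasePath/jobModels/3",
]

next_version = latest_published_version(paths) + 1
print(next_version)
```

      Under this assumption a newly elected leader would publish under jobModels/4 rather than colliding with the orphaned jobModels/3.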

            People

              spvenkat Shanthoosh Venkataraman