Uploaded image for project: 'Samza'
  1. Samza
  2. SAMZA-662

Samza auto-creates changelog stream without sufficient partitions when container number > 1

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0
    • Fix Version/s: 0.9.1
    • Component/s: container
    • Labels:
      None

      Description

      We currently provide auto-create for changelog streams. However, when there are more than 1 containers, it is possible that Samza creates a changelog stream with insufficient partitions.

      Reason:
      assume we have an input stream with 3 partitions and then we assign 3 containers for this job. According to the JobCoordinator, we will get:

      Container(Model) InputStream Partition Changelog Partition
      0 0 0
      1 1 1
      2 2 2

      If Container 0 is brought up first, it calls

          val maxChangeLogStreamPartitions = containerModel.getTasks.values
                  .max(Ordering.by { task:TaskModel => task.getChangelogPartition.getPartitionId })
                  .getChangelogPartition.getPartitionId + 1
      

      in SamzaContainer.

      The maxChangeLogStreamPartition is 1. So we will auto-create a changelog stream with only 1 partitions.

      Similarly, if the Container 2 is brought up first, we will get a stream with 2 partitions.

        Attachments

        1. SAMZA-662.v1.patch
          3 kB
          Guozhang Wang
        2. SAMZA-662-0.9.1-branch.patch
          3 kB
          Jakob Homan

          Issue Links

            Activity

              People

              • Assignee:
                guozhang Guozhang Wang
                Reporter:
                closeuris Yan Fang
              • Votes:
                0 Vote for this issue
                Watchers:
                7 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: