Samza / SAMZA-226

Auto-create changelog streams for kv



    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0
    • Fix Version/s: 0.9.0
    • Component/s: container, kv


      Currently, changelog topics are not auto-created. This is a frustrating user experience, and several useful defaults are not obvious when creating Kafka topics with log compaction enabled.

      We should have Samza auto-create changelog streams for the kv stores that have changelogs enabled.

      In Kafka's case, the changelog topics should be created with log compaction enabled. They should also be created with a smaller (100 MB) default segment.bytes setting, which is useful for low-volume changelogs. The problem we've seen in the past is that the default log.segment.bytes is 1 GB, and Kafka's compaction implementation NEVER touches the most recent log segment. This means that if you have a very small state store but execute a lot of deletes/updates (e.g. you've only got maybe 25 MB of active state, but are deleting and updating it frequently), you can end up with as much as 1 GB of non-compacted writes to restore, since the most recent segment is never compacted. This is silly since your active (compacted) state is really only ~25 MB. Shrinking segment.bytes means that you'll have a smaller maximum data size to restore. The trade-off is that we'll have more segment files for changelogs, which will increase the number of open file handles.
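To make the proposed defaults concrete, here is a minimal sketch of the topic-level overrides described above. The keys `cleanup.policy` and `segment.bytes` are Kafka's topic-level config names; the class and method names are hypothetical, reading "100mb" as 100 MiB is an assumption, and the restore estimate is a deliberately rough model (compacted state plus one uncompacted active segment), not a measurement.

```java
import java.util.Properties;

// Hypothetical helper: not part of Samza or Kafka, for illustration only.
final class ChangelogTopicDefaults {
    // Assumption: "100mb" is read as 100 MiB; the issue does not say which.
    public static final long SEGMENT_BYTES = 100L * 1024 * 1024;

    /** Topic-level overrides for a log-compacted changelog topic. */
    public static Properties changelogTopicConfig() {
        Properties props = new Properties();
        props.setProperty("cleanup.policy", "compact");                    // enable log compaction
        props.setProperty("segment.bytes", Long.toString(SEGMENT_BYTES)); // roll segments sooner
        return props;
    }

    /**
     * Rough upper bound on bytes to restore: the compacted state plus one
     * full active segment, which compaction never touches.
     */
    public static long roughMaxRestoreBytes(long activeStateBytes, long segmentBytes) {
        return activeStateBytes + segmentBytes;
    }

    public static void main(String[] args) {
        long activeState = 25L * 1024 * 1024; // ~25 MB of live state, as in the example above
        System.out.println(changelogTopicConfig());
        System.out.println(roughMaxRestoreBytes(activeState, 1024L * 1024 * 1024)); // 1 GB segments
        System.out.println(roughMaxRestoreBytes(activeState, SEGMENT_BYTES));       // 100 MB segments
    }
}
```

Under this model, the smaller segment size lowers the worst case for a 25 MB store from a little over 1 GB to roughly 125 MB, at the cost of more segment files.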

      The trick is doing this in a generic way, since we are supporting changelogs for more than just Kafka systems. I think the interface to do the stream creation belongs in the SystemAdmin interface. It would be nice to have a generic SystemAdmin.createStream() method, but this would require passing it Kafka-specific configuration. Another option is to have SystemAdmin.createChangelogStream(), but this seems a bit hacky at first glance. We need to think this part through.
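One possible shape for the second option, as a sketch only: the interface below is hypothetical and is not the actual Samza SystemAdmin, and the parameter list is an assumption. The idea is that the caller passes only system-neutral arguments, while each system's implementation applies its own defaults (for Kafka, compaction and the smaller segment size). The in-memory class exists only to show the calling convention.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: this is not the actual Samza SystemAdmin interface.
interface ChangelogStreamAdmin {
    /**
     * Creates the changelog stream if it does not already exist.
     * System-specific settings stay inside the implementation.
     */
    void createChangelogStream(String streamName, int numPartitions);
}

// Toy in-memory implementation, used only to demonstrate the contract.
final class InMemoryChangelogStreamAdmin implements ChangelogStreamAdmin {
    private final Map<String, Integer> streams = new ConcurrentHashMap<>();

    @Override
    public void createChangelogStream(String streamName, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // putIfAbsent keeps creation idempotent: an existing stream is left untouched.
        streams.putIfAbsent(streamName, numPartitions);
    }

    /** Returns the partition count, or null if the stream was never created. */
    public Integer partitionCount(String streamName) {
        return streams.get(streamName);
    }

    public static void main(String[] args) {
        InMemoryChangelogStreamAdmin admin = new InMemoryChangelogStreamAdmin();
        admin.createChangelogStream("my-store-changelog", 4); // stream name is made up
        admin.createChangelogStream("my-store-changelog", 8); // no-op: already exists
        System.out.println(admin.partitionCount("my-store-changelog"));
    }
}
```

Keeping the signature free of Kafka configuration avoids leaking system details into the generic interface, which is the concern raised about createStream(); the cost is a method that exists for one use case.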

      Martin Kleppmann, in hello-samza, how are we creating log compacted state stores with the appropriate number of partitions? Is this handled as part of bin/grid?


        1. rb28016.patch
          31 kB
          Naveen Somasundaram
        2. rb28016 (1).patch
          33 kB
          Naveen Somasundaram
        3. rb28016 (2).patch
          33 kB
          Naveen Somasundaram
        4. rb29012.patch
          40 kB
          Naveen Somasundaram




              • Assignee:
                naveenatceg Naveen Somasundaram
                criccomini Chris Riccomini


                • Created: