Samza / SAMZA-226

Auto-create changelog streams for kv



    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0
    • Fix Version/s: 0.9.0
    • Component/s: container, kv
    • Labels: None


      Currently, changelog topics are not auto-created. This is a frustrating user experience, and a few useful defaults are not obvious when creating Kafka topics with log compaction enabled.

      We should have Samza auto-create changelog streams for the kv stores that have changelogs enabled.

      In Kafka's case, the changelog topics should be created with compaction enabled. They should also be created with a smaller (100 MB) default segment.bytes setting, which is useful for low-volume changelogs. The problem we've seen in the past is that the default log.segment.bytes is 1 GB, and Kafka's compaction implementation NEVER touches the most recent (active) log segment. This means that if you have a very small state store but execute a lot of deletes/updates (e.g. you've only got maybe 25 MB of active state, but are deleting and updating it frequently), you can end up with close to 1 GB of state to restore, since the most recent segment always contains non-compacted writes. This is silly, since your active (compacted) state is really only ~25 MB. Shrinking the segment size puts a smaller upper bound on the uncompacted data you have to restore. The trade-off is that we'll have more segment files for changelogs, which will increase the number of open file handles.
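      As a rough illustration of the defaults described above, the following sketch builds the topic-level settings a compacted changelog topic might be created with. `cleanup.policy` and `segment.bytes` are Kafka's topic-level configuration names; the class name `ChangelogTopicConfig` and its `build` method are hypothetical and are not part of Samza or Kafka.

```java
import java.util.Properties;

// Hypothetical helper for illustration only; not part of Samza or Kafka.
public class ChangelogTopicConfig {
    // 100 MB, the smaller segment size suggested in the description.
    public static final long DEFAULT_SEGMENT_BYTES = 100L * 1024 * 1024;

    /** Builds topic-level Kafka settings for a log-compacted changelog topic. */
    public static Properties build(long segmentBytes) {
        if (segmentBytes <= 0) {
            throw new IllegalArgumentException("segmentBytes must be positive");
        }
        Properties props = new Properties();
        // Keep only the latest value per key instead of deleting by age or size.
        props.setProperty("cleanup.policy", "compact");
        // Roll segments sooner so less uncompacted data sits in the active segment.
        props.setProperty("segment.bytes", Long.toString(segmentBytes));
        return props;
    }

    public static void main(String[] args) {
        Properties props = build(DEFAULT_SEGMENT_BYTES);
        System.out.println(props.getProperty("cleanup.policy")); // compact
        System.out.println(props.getProperty("segment.bytes"));  // 104857600
    }
}
```

      How these properties would actually be handed to Kafka at topic creation time is left out here, since that depends on the admin mechanism chosen.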

      The trick is doing this in a generic way, since we are supporting changelogs for more than just Kafka systems. I think the interface to do the stream creation belongs in the SystemAdmin interface. It would be nice to have a generic SystemAdmin.createStream() interface, but this would require giving it Kafka-specific configuration. Another option is to have SystemAdmin.createChangelogStream(), but this seems a bit hacky at first glance. We need to think this part through.
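      To make the two options concrete, here is a minimal sketch of what each interface shape might look like. Every name and signature below is an assumption made for illustration; none of it is taken from Samza's actual SystemAdmin, and the "admin" is a toy that records what it would create instead of talking to a broker.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch; these are not Samza's real interfaces.
public class ChangelogAdminSketch {

    /** Option 1: generic, but callers must pass system-specific settings. */
    public interface GenericStreamAdmin {
        void createStream(String streamName, int partitionCount, Properties systemSpecificConfig);
    }

    /** Option 2: changelog-specific, so each system picks its own defaults. */
    public interface ChangelogStreamAdmin {
        void createChangelogStream(String streamName, int partitionCount);
    }

    /** Toy Kafka-like admin that records the settings it would use. */
    public static class RecordingKafkaLikeAdmin implements ChangelogStreamAdmin {
        private final Map<String, Properties> created = new LinkedHashMap<>();

        @Override
        public void createChangelogStream(String streamName, int partitionCount) {
            if (partitionCount <= 0) {
                throw new IllegalArgumentException("partitionCount must be positive");
            }
            if (created.containsKey(streamName)) {
                return; // already created; repeated calls are no-ops
            }
            Properties config = new Properties();
            // Kafka-specific defaults stay inside the Kafka-like implementation.
            config.setProperty("cleanup.policy", "compact");
            config.setProperty("segment.bytes", Long.toString(100L * 1024 * 1024));
            // Not a Kafka topic setting; kept only as bookkeeping for this sketch.
            config.setProperty("partitions", Integer.toString(partitionCount));
            created.put(streamName, config);
        }

        public Properties configFor(String streamName) {
            return created.get(streamName);
        }

        public int streamCount() {
            return created.size();
        }
    }

    public static void main(String[] args) {
        RecordingKafkaLikeAdmin admin = new RecordingKafkaLikeAdmin();
        admin.createChangelogStream("my-store-changelog", 4);
        admin.createChangelogStream("my-store-changelog", 4); // second call is a no-op
        System.out.println(admin.streamCount());
        System.out.println(admin.configFor("my-store-changelog").getProperty("cleanup.policy"));
    }
}
```

      With option 2 the caller never sees compaction or segment settings, which is the appeal; the cost is a method on the admin interface that only makes sense for changelogs.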

      martinkl, in hello-samza, how are we creating log compacted state stores with the appropriate number of partitions? Is this handled as part of bin/grid?


        1. rb28016.patch
          31 kB
          Naveen Somasundaram
        2. rb28016 (1).patch
          33 kB
          Naveen Somasundaram
        3. rb28016 (2).patch
          33 kB
          Naveen Somasundaram
        4. rb29012.patch
          40 kB
          Naveen Somasundaram




              Assignee: naveenatceg Naveen Somasundaram
              Reporter: criccomini Chris Riccomini