Uploaded image for project: 'Kafka'
  1. Kafka
  2. KAFKA-8396

Clean up Transformer API

    XMLWordPrintableJSON

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: streams

      Description

      Currently, KStream operators transformValues and flatTransformValues disable context forwarding, and force operators to just return the new values.

      The reason is that we wanted to prevent the key from changing, since the whole point of a `xValues` transformation is that we do not change the key, and hence don't need to repartition.

      However, the chosen mechanism has some drawbacks: The Transform concept is basically a way to plug in a custom Processor within the Streams DSL, but these restrictions make it more like a MapValues with access to the context. For example, even though you can still schedule punctuations, there's no way to forward values as a result of them. So, as a user, it's hard to build a mental model of how to use a TransformValues (because it's not quite a Transformer and not quite a Mapper).

      Also, logically, a Transformer can call forward as much as it wants, so a Transformer and a FlatTransformer are effectively the same thing. Then, we also have TransformValues and FlatTransformValues that are also two more versions of the same thing, just to implement the key restrictions. Internally, some of these can send downstream by returning OR forwarding, and others can only return. It's a lot for users to keep in mind.

      We can clean up this API significantly by just allowing all transformers to call `forward`. In the `Values` case, we can wrap the ProcessorContext in one that checks the key is `equal` to the one that got passed in (i.e., saves a reference and enforces equality with that reference in any call to `forward`). Then, we can actually deprecate the `ValueTransformer` interfaces and remove the restriction about calling forward.

      We can consider a further cleanup (TBD) to deprecate the existing Transformer interface entirely, and replace it with one with a `void` return type. Then, the Transform and FlatTransform cases collapse together, and we just need Transform and TransformValues.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                Unassigned
                Reporter:
                vvcephei John Roesler
              • Votes:
                0 Vote for this issue
                Watchers:
                6 Start watching this issue

                Dates

                • Created:
                  Updated: