[KAFKA-8396] Clean up Transformer API - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Abandoned
Affects Version/s: None
Fix Version/s: None
Component/s: streams
Labels:
- needs-kip
- user-experience

Description

Currently, KStream operators transformValues and flatTransformValues disable context forwarding, and force operators to just return the new values.

The reason is that we wanted to prevent the key from changing, since the whole point of a `xValues` transformation is that we do not change the key, and hence don't need to repartition.

However, the chosen mechanism has some drawbacks: The Transform concept is basically a way to plug in a custom Processor within the Streams DSL, but these restrictions make it more like a MapValues with access to the context. For example, even though you can still schedule punctuations, there's no way to forward values as a result of them. So, as a user, it's hard to build a mental model of how to use a TransformValues (because it's not quite a Transformer and not quite a Mapper).

Also, logically, a Transformer can call forward as much as it wants, so a Transformer and a FlatTransformer are effectively the same thing. Then, we also have TransformValues and FlatTransformValues that are also two more versions of the same thing, just to implement the key restrictions. Internally, some of these can send downstream by returning OR forwarding, and others can only return. It's a lot for users to keep in mind.

We can clean up this API significantly by just allowing all transformers to call `forward`. In the `Values` case, we can wrap the ProcessorContext in one that checks the key is identical (`==`) to the one that got passed in (i.e., saves a reference and enforces equality with that reference in any call to `forward`). Then, we can actually deprecate the `ValueTransformer` interfaces and remove the restriction about calling forward.

We can consider a further cleanup (TBD) to deprecate the existing Transformer interface entirely, and replace it with one with a `void` return type. Then, the Transform and FlatTransform cases collapse together, and we just need Transform and TransformValues.

Attachments

Issue Links

is related to

KAFKA-8410 Strengthen the types of Processors, at least in the DSL, maybe in the PAPI as well

Resolved

is superceded by

KAFKA-13654 Extend KStream process with new Processor API

Resolved

relates to

KAFKA-10603 Re-design KStream.process() and K*.transform*() operations

Resolved

links to

GitHub Pull Request #11553

Activity

People

Assignee:: Unassigned

Reporter:: John Roesler

Votes:: 0 Vote for this issue

Watchers:: 7 Start watching this issue

Dates

Created:: 20/May/19 15:58

Updated:: 23/Aug/24 02:35

Resolved:: 04/Mar/24 21:50