[KAFKA-3543] Allow a variant of transform() which can emit multiple values - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 0.10.0.0
Fix Version/s: None
Component/s: streams
Labels:
- api

Description

Right now it seems that if you want to apply an arbitrary stateful transformation to a stream, you have to pass either a TransformerSupplier to transform() or a ProcessorSupplier to process(). A custom processor lets you emit multiple new values, but process() currently terminates that branch of the topology, so you can't apply any further data flow after it. transform() lets you continue the data flow, but forces you to emit a single value for every input value.

(It doesn't actually quite force you to do this, since you can hold onto the ProcessorContext and emit multiple values yourself, but that's probably not the ideal way to do it.)

It seems desirable to somehow allow a transformation that emits multiple values per input value. I'm not sure of the best way to factor this inside of the current TransformerSupplier/Transformer architecture in a way that is clean and efficient – currently I'm doing the workaround above of just calling forward() myself on the context and actually emitting dummy values which are filtered out downstream.
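The workaround described above can be sketched roughly as follows. This is a plain-Java model with no Kafka dependency: `Context`, `Transformer`, `SplitWords`, `DUMMY` and `run` are hypothetical stand-ins chosen for illustration, not the Kafka Streams classes, and the driver loop only approximates how the runtime passes the returned value downstream.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the workaround. All types here are hypothetical
// stand-ins, not the Kafka Streams classes with similar names.
class ForwardWorkaroundSketch {

    // Stand-in for the part of ProcessorContext that the workaround uses.
    interface Context {
        void forward(String key, String value);
    }

    // Stand-in for a transformer that must return exactly one value per input.
    interface Transformer {
        void init(Context context);
        String transform(String key, String value);
    }

    // Hypothetical sentinel that a downstream filter will drop.
    static final String DUMMY = "__dummy__";

    // Emits one record per word by calling forward() itself,
    // then returns a dummy value because something must be returned.
    static class SplitWords implements Transformer {
        private Context context;

        public void init(Context context) {
            this.context = context; // hold onto the context for later
        }

        public String transform(String key, String value) {
            for (String word : value.split(" ")) {
                context.forward(key, word);
            }
            return DUMMY;
        }
    }

    // Drives the transformer and applies the downstream filter.
    static List<String> run(List<String> inputs) {
        List<String> downstream = new ArrayList<>();
        Transformer transformer = new SplitWords();
        transformer.init((key, value) -> downstream.add(value));
        for (String input : inputs) {
            // The returned value is sent downstream as well.
            downstream.add(transformer.transform("k", input));
        }
        downstream.removeIf(DUMMY::equals); // filter out the dummy values
        return downstream;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a b", "c"))); // prints [a, b, c]
    }
}
```

The sketch shows why the approach feels awkward: every input produces a throwaway record, and correctness depends on a separate filter stage that knows the sentinel.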

-------------

It is worth considering adding a new flatTransform function as

<K1, V1> KStream<K1, V1> transform(TransformerSupplier<K, V, Iterable<KeyValue<K1, V1>>> transformerSupplier, String... stateStoreNames)

which is essentially the same as

 transform().flatMap()
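To make the claimed equivalence concrete, here is a minimal plain-Java sketch of "transform, then flatten". It does not use the Kafka Streams API: `Map.Entry` stands in for `KeyValue`, a `BiFunction` stands in for the transformer, a `HashMap` stands in for a state store, and `flatTransform`, `CountingSplitter` and `demo` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Plain-Java model of the proposed semantics; not the Kafka Streams API.
class FlatTransformSketch {

    // Stand-in for a stateful transformer returning an Iterable of key-value
    // pairs: emits (word, running count of that word) for each word seen.
    static class CountingSplitter
            implements BiFunction<String, String, Iterable<Map.Entry<String, Integer>>> {

        // Stand-in for a state store attached to the transformer.
        private final Map<String, Integer> seen = new HashMap<>();

        @Override
        public Iterable<Map.Entry<String, Integer>> apply(String key, String value) {
            List<Map.Entry<String, Integer>> out = new ArrayList<>();
            for (String word : value.split(" ")) {
                if (word.isEmpty()) {
                    continue;
                }
                int count = seen.merge(word, 1, Integer::sum);
                out.add(Map.entry(word, count));
            }
            return out;
        }
    }

    // One call per input record (the transform() step), then every element
    // of the returned Iterable becomes its own output record (the flatMap() step).
    static <K, V, K1, V1> List<Map.Entry<K1, V1>> flatTransform(
            List<Map.Entry<K, V>> input,
            BiFunction<K, V, Iterable<Map.Entry<K1, V1>>> transformer) {
        List<Map.Entry<K1, V1>> output = new ArrayList<>();
        for (Map.Entry<K, V> record : input) {
            Iterable<Map.Entry<K1, V1>> batch =
                    transformer.apply(record.getKey(), record.getValue());
            for (Map.Entry<K1, V1> kv : batch) {
                output.add(kv);
            }
        }
        return output;
    }

    // Small usage example with two input records.
    static List<Map.Entry<String, Integer>> demo() {
        List<Map.Entry<String, String>> input =
                List.of(Map.entry("k1", "a b a"), Map.entry("k2", "b"));
        return flatTransform(input, new CountingSplitter());
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Under these assumptions an input record can yield zero, one, or many output records, and no dummy values or downstream filter are needed.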

Issue Links

blocks

FINERACT-692 G

Closed

duplicates

KAFKA-4217 KStream.transform equivalent of flatMap

Resolved

People

Assignee: Unassigned

Reporter: Greg Fodor

Votes: 0

Watchers: 4

Dates

Created: 11/Apr/16 22:38

Updated: 15/Jan/19 23:25

Resolved: 01/Feb/17 00:38