Uploaded image for project: 'Kafka'
  1. Kafka
  2. KAFKA-5632

Message headers not supported by Kafka Streams

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Minor
    • Resolution: Duplicate
    • 0.11.0.0
    • None
    • consumer

    Description

      The new message headers functionality introduced in Kafka 0.11.0.0 (https://cwiki.apache.org/confluence/display/KAFKA/KIP-82+-+Add+Record+Headers) does not appear to be respected by Kafka Streams, specifically message headers set on input topics to a Kafka Streams topology do not get propagated to the corresponding output topics of the topology.

      It appears that it's at least partially due to the SourceNodeRecordDeserializer not properly respecting message headers here:

      https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/SourceNodeRecordDeserializer.java#L60

      where it isn't using the new ConsumerRecord constructor which supports headers:

      https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerRecord.java#L122

      For additional background here is the line before which we noticed that we still have the message headers, and after which we no longer have them:

      https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/RecordQueue.java#L93

      In terms of a potential solution there are a few different scenarios to consider:
      1. A stream processor with one input and one output, i.e. 1-to-1, (A map/transformation for example). This is the simplest case, and one proposal would be to directly propagate any message headers from input to output.
      2. A stream processor with one input and many outputs, i.e. 1-to-many, (A flatmap step for example).
      3. A stream processor with multiple inputs per output, i.e. many-to-1, (A join step for example).
      One proposal for supporting all possible scenarios would be to expose overloads in the Kafka Streams DSL methods to allow the user the ability to specify logic for handling of message headers.

      For additional background the use case is similar to a distributed tracing use case, where the following previous work may be useful for aiding in design discussions:
      Dapper (https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/36356.pdf)
      or
      Zipkin (https://github.com/openzipkin/zipkin)

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              cjwoolard CJ Woolard
              Votes:
              8 Vote for this issue
              Watchers:
              15 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: