Spark / SPARK-28415

Add messageHandler to Kafka 10 direct stream API

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: DStreams
    • Labels:
      None

      Description

      The lack of a messageHandler parameter in KafkaUtils.createDirectStream(...) in the new Kafka API is what prevents us from upgrading our processes to use it, for two reasons:

      1. messageHandler() allowed parsing, filtering, and projecting huge JSON payloads at an early stage (a process requires only a small subset of the JSON fields). Without it, the current cluster configuration can't keep up with the traffic.
      2. Transforming Kafka events right after the stream is created makes it impossible to use the HasOffsetRanges interface later. This means the whole message must be propagated to the end of the pipeline, which is very inefficient.
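
      For context, here is a minimal sketch of one commonly used pattern with the Kafka 0-10 integration: capture the offset ranges inside foreachRDD before any transformation, then project each record immediately. It is not taken from this issue and may not cover the reporter's use case. The broker address, topic name, group id, and the "userId" field are hypothetical, and the regex-based projectUserId helper is only a stand-in for a real JSON parser.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges, KafkaUtils}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object EarlyProjectionSketch {
  // Hypothetical projection: keep only the "userId" string field of a JSON payload.
  private val UserId = "\"userId\"\\s*:\\s*\"([^\"]*)\"".r

  def projectUserId(json: String): Option[String] =
    UserId.findFirstMatchIn(json).map(_.group(1))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("early-projection-sketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Assumed connection settings; replace with real values.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "example-group",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    stream.foreachRDD { rdd =>
      // The cast only works on the RDD produced directly by the stream,
      // so read the offsets first, before any map/filter.
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

      // Project right away so only the small subset travels downstream.
      val userIds = rdd.flatMap(record => projectUserId(record.value).toList)
      println(s"projected ${userIds.count()} records")

      // Commit once the batch's work is done.
      stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

      In this sketch the full message never leaves the first stage, and the offsets are held on the driver in offsetRanges rather than carried with each record.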

       

            People

            • Assignee:
              Unassigned
              Reporter:
              spektom Michael Spector
            • Votes:
              0
              Watchers:
              3
