
SPARK-3146: Improve the flexibility of the Spark Streaming Kafka API to offer users the ability to process messages before storing into BM


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.2, 1.1.0
    • Fix Version/s: 1.3.0
    • Component/s: DStreams
    • Labels: None

    Description

      Currently the Spark Streaming Kafka API stores only the key and value of each message into the BlockManager (BM) for processing, which limits flexibility for several requirements:

      1. Currently the topic/partition/offset information for each message is discarded by KafkaInputDStream. In some scenarios this information is needed to better filter messages, as described in SPARK-2388.
      2. Users may want to attach a timestamp to each message when it is fed into Spark Streaming, to better measure system latency.
      3. Checkpointing the partition/offsets or others...

      So here we add a messageHandler to the interface to give users the flexibility to preprocess each message before it is stored into the BM, while keeping the improvement compatible with the current API.
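
      A minimal sketch of what such a messageHandler could look like, assuming the direct Kafka stream API added in 1.3.0 where KafkaUtils.createDirectStream accepts a handler of type MessageAndMetadata[K, V] => R. The broker address, the topic name "events", the starting offsets, and the TaggedMessage record are illustrative assumptions, not part of this patch:

{code:scala}
import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaMessageHandlerExample {
  // Record produced by the handler: the message value plus the Kafka
  // metadata and an ingest timestamp that would otherwise be discarded.
  case class TaggedMessage(topic: String, partition: Int, offset: Long,
                           ingestTimeMs: Long, value: String)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaMessageHandlerExample")
    val ssc = new StreamingContext(conf, Seconds(5))

    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
    // Start from offset 0 of partition 0 of topic "events" (placeholder values).
    val fromOffsets = Map(TopicAndPartition("events", 0) -> 0L)

    // The handler runs before the record is stored, so topic, partition,
    // and offset can be kept and a timestamp attached.
    val messageHandler = (mmd: MessageAndMetadata[String, String]) =>
      TaggedMessage(mmd.topic, mmd.partition, mmd.offset,
                    System.currentTimeMillis(), mmd.message())

    val stream = KafkaUtils.createDirectStream[
      String, String, StringDecoder, StringDecoder, TaggedMessage](
      ssc, kafkaParams, fromOffsets, messageHandler)

    // Example: filter by partition, the kind of use case raised in SPARK-2388.
    stream.filter(_.partition != 1).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
{code}

      Because the handler is applied as each record is read, the preserved metadata is available to downstream operators without changing the behavior of the existing createStream API.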

People

    Assignee: Unassigned
    Reporter: Saisai Shao (jerryshao)
    Votes: 1
    Watchers: 7

Dates

    Created:
    Updated:
    Resolved: