KYLIN-3679

Fetch Kafka topic with Spark streaming

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Spark Engine
    • Labels: None

    Description

      Currently, Kylin uses a MapReduce job to fetch Kafka messages in parallel and persist them to HDFS for subsequent processing. If the user selects the Spark engine, the Spark Streaming API can do this instead. Spark Streaming can read the Kafka messages in a given offset range as an RDD, which makes them easy to process. See:

      https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html 
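As a rough illustration of the "given offset range" idea, the sketch below plans one offset range per Kafka partition from a start and end offset map, using only the Java standard library. The class `OffsetRangeSketch`, its nested `Range` type, the `plan` method, the topic name `events`, and the rule that a partition with no recorded start offset begins at 0 are all assumptions made for this example; they are not taken from Kylin. In a real Spark job each planned range would presumably be turned into an `OffsetRange.create(topic, partition, fromOffset, untilOffset)` and passed to `KafkaUtils.createRDD` from the spark-streaming-kafka-0-10 module described at the link above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class OffsetRangeSketch {

    /** Hypothetical stand-in for Spark's OffsetRange: one topic partition, [fromOffset, untilOffset). */
    public static final class Range {
        public final String topic;
        public final int partition;
        public final long fromOffset;   // inclusive
        public final long untilOffset;  // exclusive

        public Range(String topic, int partition, long fromOffset, long untilOffset) {
            this.topic = topic;
            this.partition = partition;
            this.fromOffset = fromOffset;
            this.untilOffset = untilOffset;
        }

        /** Number of messages covered by this range. */
        public long count() {
            return untilOffset - fromOffset;
        }
    }

    /**
     * Builds one range per partition listed in endOffsets, skipping partitions
     * that have no new messages. Assumption: a partition missing from
     * startOffsets is read from offset 0.
     */
    public static List<Range> plan(String topic,
                                   Map<Integer, Long> startOffsets,
                                   Map<Integer, Long> endOffsets) {
        List<Range> ranges = new ArrayList<>();
        // Iterate in partition order so the result is deterministic.
        for (Map.Entry<Integer, Long> e : new TreeMap<>(endOffsets).entrySet()) {
            int partition = e.getKey();
            long until = e.getValue();
            long from = startOffsets.getOrDefault(partition, 0L);
            if (until < from) {
                throw new IllegalArgumentException(
                        "End offset precedes start offset for partition " + partition);
            }
            if (until > from) {
                ranges.add(new Range(topic, partition, from, until));
            }
        }
        return ranges;
    }

    public static void main(String[] args) {
        Map<Integer, Long> start = new TreeMap<>();
        start.put(0, 100L);
        start.put(1, 50L);
        Map<Integer, Long> end = new TreeMap<>();
        end.put(0, 250L);
        end.put(1, 50L);   // nothing new on partition 1, so it is skipped
        end.put(2, 30L);   // no start offset recorded, so it starts at 0
        for (Range r : plan("events", start, end)) {
            System.out.println(r.topic + "-" + r.partition
                    + " [" + r.fromOffset + ", " + r.untilOffset + ") "
                    + r.count() + " messages");
        }
    }
}
```

Because the Kafka RDD that Spark builds has one RDD partition per offset range, a plan like this would also determine how the fetch is parallelized.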

      With Spark Streaming, Kylin could also easily connect to other data sources such as Kinesis and Flume.

          People

            Assignee: codingforfun weibin0516
            Reporter: shaofengshi Shao Feng Shi
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated: