[KYLIN-3679] Fetch Kafka topic with Spark streaming - ASF JIRA

Details

Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Spark Engine
Labels: None

Description

Currently, Kylin uses a MapReduce job to fetch Kafka messages in parallel and persist them to HDFS for subsequent processing. If the user selects the Spark engine, we can use the Spark Streaming API for this step instead. Spark Streaming can read the Kafka messages in a given offset range as an RDD, which makes them easy to process:

https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html
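As a rough illustration of the offset-range idea, the sketch below models one topic partition's range as a half-open interval [fromOffset, untilOffset) and splits it into bounded slices that could be read in parallel. The class `KafkaOffsetSlices`, its nested `Range` type, the `split` helper, and the topic name "events" are all hypothetical and use only the Java standard library; they are not Kylin or Spark code. In the Spark integration linked above, the corresponding pieces would be `OffsetRange` and `KafkaUtils.createRDD`, and how Kylin would actually size its ranges is not specified in this issue.

```java
import java.util.ArrayList;
import java.util.List;

final class KafkaOffsetSlices {

    /** Hypothetical stand-in for an offset range on one topic partition. */
    static final class Range {
        final String topic;
        final int partition;
        final long fromOffset;   // inclusive
        final long untilOffset;  // exclusive

        Range(String topic, int partition, long fromOffset, long untilOffset) {
            if (fromOffset < 0 || fromOffset > untilOffset) {
                throw new IllegalArgumentException("invalid offset range");
            }
            this.topic = topic;
            this.partition = partition;
            this.fromOffset = fromOffset;
            this.untilOffset = untilOffset;
        }

        /** Number of messages covered by this range. */
        long count() {
            return untilOffset - fromOffset;
        }
    }

    /** Splits a range into consecutive slices of at most maxPerSlice messages each. */
    static List<Range> split(Range range, long maxPerSlice) {
        if (maxPerSlice <= 0) {
            throw new IllegalArgumentException("maxPerSlice must be positive");
        }
        List<Range> slices = new ArrayList<>();
        long start = range.fromOffset;
        while (start < range.untilOffset) {
            // Take the smaller of the remaining messages and the slice limit.
            long size = Math.min(range.untilOffset - start, maxPerSlice);
            long end = start + size;
            slices.add(new Range(range.topic, range.partition, start, end));
            start = end;
        }
        return slices;
    }

    public static void main(String[] args) {
        // Assumed example: partition 0 of a topic named "events", offsets 100 to 349.
        Range segment = new Range("events", 0, 100L, 350L);
        for (Range r : split(segment, 100L)) {
            System.out.println(r.topic + "-" + r.partition
                    + " [" + r.fromOffset + ", " + r.untilOffset + ")");
        }
    }
}
```

Keeping the upper bound exclusive means adjacent slices share a boundary without overlapping, so the slice counts always add up to the count of the original range.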

With Spark Streaming, Kylin could also easily connect to other data sources such as Kinesis and Flume.

People

Assignee: weibin0516

Reporter: Shao Feng Shi

Votes: 0

Watchers: 4

Dates

Created: 11/Nov/18 11:22

Updated: 28/Feb/20 14:03