Details
Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Description
Challenges in current architecture:
High latency when reading data from Hive
--Several hours to fetch data when joining big tables
--Routing to SQL-on-Hadoop is turned off due to performance issues
Long time-to-market caused by data latency
--Huge I/O and network traffic from MapReduce (MR) jobs
Streaming
--Process streaming data and pre-calculate cubes
Where Spark could bring benefits to Kylin:
Integrating with Spark SQL:
--Option I: Read data from Spark SQL instead of Hive
--Option II: Route unsupported queries to Spark SQL
--Option III: Kylin as an OLAP data source for Spark SQL
Spark Cube Build Engine
--Efficient cube generation engine built on Spark
Spark Streaming
--Leverage Spark Streaming for streaming OLAP
HBase?
--Any ideas?
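To make the cube build engine item concrete, here is a minimal sketch of what pre-calculating a cube means: aggregating a measure for every combination of dimensions (every cuboid) ahead of query time. It is plain Python and uses neither Kylin's nor Spark's API. The function name build_cube, the sample rows, and the column names are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dimensions, measure):
    """Pre-aggregate SUM(measure) for every subset of dimensions (every cuboid).

    Illustrative only: a real engine would distribute this work and
    reuse parent cuboids instead of rescanning the raw rows each time.
    """
    cube = {}
    # Walk from the full dimension set down to the empty (grand total) cuboid.
    for size in range(len(dimensions), -1, -1):
        for dims in combinations(dimensions, size):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in dims)
                totals[key] += row[measure]
            cube[dims] = dict(totals)
    return cube


# Hypothetical fact rows, not taken from any real dataset.
rows = [
    {"country": "US", "year": 2014, "sales": 10},
    {"country": "US", "year": 2015, "sales": 20},
    {"country": "CN", "year": 2015, "sales": 5},
]

cube = build_cube(rows, ("country", "year"), "sales")

# A "GROUP BY country" query becomes a dictionary lookup.
print(cube[("country",)])
```

Once the cuboids exist, a query such as SUM(sales) grouped by country is answered by a lookup instead of a scan and join over the source tables. The aggregation for each cuboid is independent of the others, which is one reason a distributed engine such as Spark looks like a plausible fit for this step; that is an assumption about the proposal, not a description of an existing implementation.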