Details
Type: Sub-task
Status: Closed
Priority: Major
Resolution: Won't Fix
v1.4.0
None
Description
A proof of concept (POC) of Spark cubing shows that, on a dataset of 150 million records, MapReduce (MR) is about 100% faster than Spark. However, we believe Spark should be at least as fast as MR, so optimization is needed here.
We are now asking the Spark community for help.
Cluster info:
VMs: 8 nodes * (128 GB memory + 64 cores)
Hadoop cluster: HDP 2.2.6
Spark running mode: yarn-client
Spark version: 1.5.1
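
For anyone sizing a Spark job against this cluster, the arithmetic behind the figures above can be sketched as follows. This is only a rough illustration, not the configuration used in the POC: the node count, memory, and core figures come from the ticket, while `cores_per_executor`, `reserved_cores`, and `reserved_mem_gb` are hypothetical values picked for the example.

```python
# Cluster figures from the ticket: 8 nodes, each with 128 GB memory and 64 cores.
NODES = 8
MEM_GB_PER_NODE = 128
CORES_PER_NODE = 64


def cluster_totals(nodes, mem_gb_per_node, cores_per_node):
    """Return (total_memory_gb, total_cores) for a homogeneous cluster."""
    return nodes * mem_gb_per_node, nodes * cores_per_node


def executor_layout(nodes, mem_gb_per_node, cores_per_node,
                    cores_per_executor=5, reserved_cores=1, reserved_mem_gb=8):
    """Derive one possible executor layout.

    The three keyword defaults are assumptions for illustration only;
    they are not taken from the ticket.
    """
    # Leave some cores and memory on each node for the OS and Hadoop daemons.
    usable_cores = cores_per_node - reserved_cores
    usable_mem_gb = mem_gb_per_node - reserved_mem_gb

    # Pack as many fixed-size executors as fit on one node.
    executors_per_node = usable_cores // cores_per_executor
    # Split the usable memory evenly; this is a ceiling that still has to
    # cover both the executor heap and any YARN memory overhead.
    mem_per_executor_gb = usable_mem_gb // executors_per_node

    return {
        "num_executors": executors_per_node * nodes,
        "executor_cores": cores_per_executor,
        "executor_memory_gb": mem_per_executor_gb,
    }


if __name__ == "__main__":
    print(cluster_totals(NODES, MEM_GB_PER_NODE, CORES_PER_NODE))
    print(executor_layout(NODES, MEM_GB_PER_NODE, CORES_PER_NODE))
```

Under these assumed reservations the cluster offers 1024 GB of memory and 512 cores in total, and the sketch yields 96 executors with 5 cores each and at most 10 GB per executor. Numbers like these would typically be passed to `spark-submit` through `--num-executors`, `--executor-cores`, and `--executor-memory`; whether such a layout would actually close the gap with MR is not something the ticket establishes.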