Details

      Description

      Spark, an open-source cluster computing framework for data analytics, has gained significant momentum recently. Many Hive users already have Spark installed as their computing backbone. To take advantage of Hive, they still need either MapReduce or Tez on their cluster. This initiative will give those users a new alternative so that they can consolidate their backend.

      Secondly, providing such an alternative further increases Hive's adoption, as it exposes Spark users to a viable, feature-rich, de facto standard SQL tool on Hadoop.

      Finally, allowing Hive to run on Spark also has performance benefits. Hive queries, especially those involving multiple reducer stages, will run faster, improving the user experience much as Tez does.

      This is an umbrella JIRA that will cover many upcoming subtasks. A design doc will be attached here shortly and will be on the wiki as well. Feedback from the community is greatly appreciated!

      1. Hive-on-Spark.pdf
        290 kB
        Xuefu Zhang

        Issue Links

        1. Refactoring: make Hive reduce side data processing reusable [Spark Branch] Sub-task Reopened Xuefu Zhang
         
        2. StarterProject: Move configuration from SparkClient to HiveConf [Spark Branch] Sub-task Open Unassigned
         
        3. Research Hive dependency on MR distributed cache [Spark Branch] Sub-task Open Unassigned
         
        4. UT: add TestSparkMinimrCliDriver to run UTs that use HDFS Sub-task Open Thomas Friedrich
         
        5. UT: fix bucket_num_reducers test Sub-task Open Chinna Rao Lalam
         
        6. UTs: create missing output files for some tests under clientpositive/spark Sub-task Open Thomas Friedrich
         
        7. UT: add test flag in hive-site.xml for spark tests Sub-task Open Thomas Friedrich
         
        8. UT: fix udf_context_aware Sub-task Open Aihua Xu
         
        9. UT: fix hook_context_cs test case Sub-task Open Unassigned
         
        10. Hive/Spark/Yarn integration [Spark Branch] Sub-task Open Chengxiang Li
         
        11. Downgrade guava version to be consistent with Hive and the rest of Hadoop [Spark Branch] Sub-task Open Unassigned
         
        12. Clean up temp files of RSC [Spark Branch] Sub-task Open Unassigned
         
        13. Choosing right preference between map join and bucket map join [Spark Branch] Sub-task Open Unassigned
         
        14. Error when cleaning up in spark.log [Spark Branch] Sub-task Open Unassigned
         
        15. Support backup task for join related optimization [Spark Branch] Sub-task Patch Available Chao Sun
         
        16. Clean up GenSparkProcContext.clonedReduceSinks and related code [Spark Branch] Sub-task Patch Available Chao Sun
         
        17. UT: set hive.support.concurrency to true for spark UTs Sub-task Open Bing Li
         
        18. Cleanup code for getting spark job progress and metrics Sub-task Open Rui Li
         
        19. Improve replication factor of small table file given big table partitions [Spark branch] Sub-task Open Jimmy Xiang
         
        20. thrift.transport.TTransportException [Spark Branch] Sub-task Open Chao Sun
         
        21. Hive reports an exception because Hive's Derby version conflicts with Spark's Derby version [Spark Branch] Sub-task Patch Available Pierre Yin
         
        22. Enable infer_bucket_sort_dyn_part.q for TestMiniSparkOnYarnCliDriver test. [Spark Branch] Sub-task Open Unassigned
         
        23. SparkSessionImpl calculates the wrong number of cores in TestSparkCliDriver [Spark Branch] Sub-task Open Unassigned
         
        24. Print yarn application id to console [Spark Branch] Sub-task Open Chinna Rao Lalam
         
        25. Querying parquet tables fails with IllegalStateException [Spark Branch] Sub-task Open Unassigned
         
        26. HiveInputFormat implementations' getSplits may lead to a memory leak [Spark Branch] Sub-task Open Unassigned
         
        27. Log the information of cached RDD [Spark Branch] Sub-task Patch Available Chinna Rao Lalam
         
        28. Provide more informative stage description in Spark Web UI [Spark Branch] Sub-task Open Unassigned
         
        29. Improve common join performance [Spark Branch] Sub-task Patch Available Unassigned
         
        30. Implement Hybrid Hybrid Grace Hash Join for Spark Branch [Spark Branch] Sub-task Open Unassigned
         
        31. Fix test failures after last merge from trunk [Spark Branch] Sub-task Open Unassigned
         
        32. Followup for HIVE-10550, check performance w.r.t. persistence level [Spark Branch] Sub-task Open GaoLun
         
        33. Make HIVE-10001 work with Spark [Spark Branch] Sub-task Open Unassigned
         
        34. Investigate intermittent failure of join28.q for Spark Sub-task Open Mohit Sabharwal
         
        35. Support hive.explain.user for Spark [Spark Branch] Sub-task Open Unassigned
         
        36. Enable native vectorized map join for spark [Spark Branch] Sub-task Open Rui Li
         
        37. Research on recent failed qtests [Spark Branch] Sub-task Open Chengxiang Li
         
        38. Combine equivalent leaf works in SparkWork [Spark Branch] Sub-task Open Chengxiang Li
         

          Activity

          Xuefu Zhang created issue -
          Xuefu Zhang made changes -
          Field Original Value New Value
          Attachment Hive-on-Spark.pdf [ 12652517 ]
          Xuefu Zhang made changes -
          Remote Link This issue links to "https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark (Web Link)" [ 15580 ]
          Jeff Hammerbacher made changes -
          Description Corrected "umber JIRA" to "umbrella JIRA" in the final paragraph; the description text is otherwise unchanged.
          Xuefu Zhang made changes -
          Link This issue incorporates HIVE-7370 [ HIVE-7370 ]
          Xuefu Zhang made changes -
          Component/s Spark [ 12323200 ]
          Xuefu Zhang made changes -
          Link This issue incorporates HIVE-7371 [ HIVE-7371 ]
          Xuefu Zhang made changes -
          Link This issue incorporates HIVE-7372 [ HIVE-7372 ]
          Chengxiang Li made changes -
          Link This issue incorporates HIVE-7387 [ HIVE-7387 ]
          Xuefu Zhang made changes -
          Link This issue requires HIVE-7391 [ HIVE-7391 ]
          Rui Li made changes -
          Link This issue incorporates HIVE-7431 [ HIVE-7431 ]
          Xuefu Zhang made changes -
          Link This issue contains HIVE-7437 [ HIVE-7437 ]
          Xuefu Zhang made changes -
          Link This issue depends upon SPARK-2421 [ SPARK-2421 ]
          Xuefu Zhang made changes -
          Link This issue depends upon SPARK-2420 [ SPARK-2420 ]
          Rui Li made changes -
          Link This issue incorporates HIVE-7467 [ HIVE-7467 ]
          Chengxiang Li made changes -
          Link This issue depends upon SPARK-2633 [ SPARK-2633 ]
          Chengxiang Li made changes -
          Link This issue depends upon SPARK-2636 [ SPARK-2636 ]
          Xuefu Zhang made changes -
          Link This issue depends upon HIVE-7489 [ HIVE-7489 ]
          Xuefu Zhang made changes -
          Link This issue incorporates HIVE-7489 [ HIVE-7489 ]
          Xuefu Zhang made changes -
          Link This issue depends upon HIVE-7489 [ HIVE-7489 ]
          Xuefu Zhang made changes -
          Link This issue requires SPARK-2688 [ SPARK-2688 ]
          Szehon Ho made changes -
          Remote Link This issue links to "Wiki Page (cwiki)" [ 16327 ]
          Szehon Ho made changes -
          Remote Link This issue links to "Wiki Page (cwiki)" [ 16327 ]
          Xuefu Zhang made changes -
          Link This issue incorporates HIVE-7516 [ HIVE-7516 ]
          Xuefu Zhang made changes -
          Link This issue incorporates HIVE-7525 [ HIVE-7525 ]
          Xuefu Zhang made changes -
          Link This issue incorporates HIVE-7526 [ HIVE-7526 ]
          Xuefu Zhang made changes -
          Link This issue incorporates HIVE-7530 [ HIVE-7530 ]
          Rui Li made changes -
          Link This issue incorporates HIVE-7540 [ HIVE-7540 ]
          Brock Noland made changes -
          Link This issue is related to SPARK-2741 [ SPARK-2741 ]
          Chengxiang Li made changes -
          Link This issue incorporates HIVE-7551 [ HIVE-7551 ]
          Chengxiang Li made changes -
          Link This issue incorporates HIVE-7552 [ HIVE-7552 ]
          Xuefu Zhang made changes -
          Link This issue incorporates HIVE-7556 [ HIVE-7556 ]
          Xuefu Zhang made changes -
          Link This issue incorporates HIVE-7564 [ HIVE-7564 ]
          Chengxiang Li made changes -
          Link This issue incorporates HIVE-7567 [ HIVE-7567 ]
          Brock Noland made changes -
          Link This issue is blocked by SPARK-2243 [ SPARK-2243 ]
          Na Yang made changes -
          Attachment HIVE-7584.1.patch [ 12659694 ]
          Na Yang made changes -
          Attachment HIVE-7584.1.patch [ 12659694 ]
          Chengxiang Li made changes -
          Link This issue incorporates HIVE-7613 [ HIVE-7613 ]
          Chengxiang Li made changes -
          Link This issue contains HIVE-7613 [ HIVE-7613 ]
          Chengxiang Li made changes -
          Link This issue incorporates HIVE-7626 [ HIVE-7626 ]
          Chengxiang Li made changes -
          Link This issue incorporates HIVE-7627 [ HIVE-7627 ]
          Chengxiang Li made changes -
          Link This issue depends upon SPARK-2895 [ SPARK-2895 ]
          Chengxiang Li made changes -
          Link This issue incorporates HIVE-7642 [ HIVE-7642 ]
          Chengxiang Li made changes -
          Link This issue incorporates HIVE-7643 [ HIVE-7643 ]
          Rui Li made changes -
          Link This issue incorporates HIVE-7659 [ HIVE-7659 ]
          Brock Noland made changes -
          Link This issue relates to HIVE-7607 [ HIVE-7607 ]
          Chengxiang Li made changes -
          Link This issue incorporates HIVE-7707 [ HIVE-7707 ]
          Na Yang made changes -
          Link This issue contains HIVE-7717 [ HIVE-7717 ]
          Chengxiang Li made changes -
          Link This issue incorporates HIVE-7726 [ HIVE-7726 ]
          Chengxiang Li made changes -
          Link This issue incorporates HIVE-7727 [ HIVE-7727 ]
          Chengxiang Li made changes -
          Link This issue incorporates HIVE-7728 [ HIVE-7728 ]
          Chengxiang Li made changes -
          Link This issue incorporates HIVE-7729 [ HIVE-7729 ]
          Rui Li made changes -
          Link This issue incorporates HIVE-7731 [ HIVE-7731 ]
          Na Yang made changes -
          Link This issue contains HIVE-7745 [ HIVE-7745 ]
          Venki Korukanti made changes -
          Link This issue incorporates HIVE-7746 [ HIVE-7746 ]
          Venki Korukanti made changes -
          Link This issue incorporates HIVE-7747 [ HIVE-7747 ]
          Rui Li made changes -
          Link This issue incorporates HIVE-7624 [ HIVE-7624 ]
          Chengxiang Li made changes -
          Link This issue incorporates HIVE-7761 [ HIVE-7761 ]
          Chengxiang Li made changes -
          Link This issue incorporates HIVE-7763 [ HIVE-7763 ]
          Na Yang made changes -
          Link This issue contains HIVE-7767 [ HIVE-7767 ]
          Brock Noland made changes -
          Link This issue contains HIVE-7613 [ HIVE-7613 ]
          Rui Li made changes -
          Link This issue incorporates HIVE-7772 [ HIVE-7772 ]
          Rui Li made changes -
          Link This issue relates to HIVE-7773 [ HIVE-7773 ]
          Rui Li made changes -
          Link This issue relates to HIVE-7773 [ HIVE-7773 ]
          Rui Li made changes -
          Link This issue incorporates HIVE-7773 [ HIVE-7773 ]
          Chengxiang Li made changes -
          Link This issue incorporates HIVE-7775 [ HIVE-7775 ]
          Chengxiang Li made changes -
          Link This issue incorporates HIVE-7776 [ HIVE-7776 ]
          Chengxiang Li made changes -
          Link This issue incorporates HIVE-7780 [ HIVE-7780 ]
          Brock Noland made changes -
          Labels Spark-M1 Spark-M2 Spark-M3 Spark-M4 Spark-M5
          Chengxiang Li made changes -
          Link This issue incorporates HIVE-7795 [ HIVE-7795 ]
          Chengxiang Li made changes -
          Link This issue incorporates HIVE-7799 [ HIVE-7799 ]
          Na Yang made changes -
          Link This issue incorporates HIVE-7810 [ HIVE-7810 ]
          Na Yang made changes -
          Link This issue incorporates HIVE-7870 [ HIVE-7870 ]
          Rui Li made changes -
          Link This issue incorporates HIVE-7893 [ HIVE-7893 ]
          Chengxiang Li made changes -
          Link This issue incorporates HIVE-7909 [ HIVE-7909 ]
          Xuefu Zhang made changes -
          Link This issue incorporates HIVE-7916 [ HIVE-7916 ]
          Chao Sun made changes -
          Link This issue contains HIVE-7939 [ HIVE-7939 ]
          Rui Li made changes -
          Link This issue incorporates HIVE-7956 [ HIVE-7956 ]
          Xuefu Zhang made changes -
          Link This issue requires HIVE-7958 [ HIVE-7958 ]
          Chao Sun made changes -
          Link This issue incorporates HIVE-8024 [ HIVE-8024 ]
          Chengxiang Li made changes -
          Link This issue incorporates HIVE-8029 [ HIVE-8029 ]
          Xuefu Zhang made changes -
          Link This issue incorporates HIVE-8043 [ HIVE-8043 ]
          Xuefu Zhang made changes -
          Link This issue incorporates HIVE-8055 [ HIVE-8055 ]
          Xuefu Zhang made changes -
          Link This issue incorporates HIVE-8054 [ HIVE-8054 ]
          Xuefu Zhang made changes -
          Link This issue incorporates HIVE-8073 [ HIVE-8073 ]
          Rui Li made changes -
          Link This issue incorporates HIVE-8098 [ HIVE-8098 ]
          Xuefu Zhang made changes -
          Link This issue incorporates HIVE-8118 [ HIVE-8118 ]
          Chao Sun made changes -
          Link This issue incorporates HIVE-8207 [ HIVE-8207 ]
          Chao Sun made changes -
          Link This issue incorporates HIVE-8207 [ HIVE-8207 ]
          Chao Sun made changes -
          Link This issue contains HIVE-8207 [ HIVE-8207 ]
          Chao Sun made changes -
          Link This issue contains HIVE-8208 [ HIVE-8208 ]
          Chao Sun made changes -
          Link This issue contains HIVE-8209 [ HIVE-8209 ]
          Chao Sun made changes -
          Link This issue contains HIVE-8215 [ HIVE-8215 ]
          Chao Sun made changes -
          Link This issue contains HIVE-8216 [ HIVE-8216 ]
          Xuefu Zhang made changes -
          Link This issue incorporates HIVE-8219 [ HIVE-8219 ]
          Xuefu Zhang made changes -
          Link This issue incorporates HIVE-8220 [ HIVE-8220 ]
          Chao Sun made changes -
          Link This issue contains HIVE-8233 [ HIVE-8233 ]
          Chao Sun made changes -
          Link This issue contains HIVE-8242 [ HIVE-8242 ]
          Chao Sun made changes -
          Link This issue contains HIVE-8249 [ HIVE-8249 ]
          Xuefu Zhang made changes -
          Link This issue incorporates HIVE-8274 [ HIVE-8274 ]
          Rui Li made changes -
          Link This issue incorporates HIVE-8300 [ HIVE-8300 ]
          Chao Sun made changes -
          Link This issue contains HIVE-8430 [ HIVE-8430 ]
          Chao Sun made changes -
          Link This issue contains HIVE-8431 [ HIVE-8431 ]
          Chengxiang Li made changes -
          Link This issue incorporates HIVE-8426 [ HIVE-8426 ]
          Chao Sun made changes -
          Link This issue contains HIVE-8463 [ HIVE-8463 ]
          Chao Sun made changes -
          Link This issue contains HIVE-8496 [ HIVE-8496 ]
          Chao Sun made changes -
          Link This issue contains HIVE-8533 [ HIVE-8533 ]
          Rui Li made changes -
          Link This issue incorporates HIVE-8537 [ HIVE-8537 ]
          Chao Sun made changes -
          Link This issue contains HIVE-8542 [ HIVE-8542 ]
          Chao Sun made changes -
          Link This issue contains HIVE-8545 [ HIVE-8545 ]
          Xuefu Zhang made changes -
          Link This issue incorporates HIVE-8699 [ HIVE-8699 ]
          Xuefu Zhang made changes -
          Link This issue depends upon SPARK-4290 [ SPARK-4290 ]
          Chengxiang Li made changes -
          Link This issue incorporates HIVE-8548 [ HIVE-8548 ]
          Bing Li made changes -
          Assignee Xuefu Zhang [ xuefuz ] Bing Li [ libing ]
          Xuefu Zhang made changes -
          Assignee Bing Li [ libing ] Xuefu Zhang [ xuefuz ]
          Brock Noland made changes -
          Link This issue is related to HIVE-9134 [ HIVE-9134 ]
          Brock Noland made changes -
          Link This issue is related to HIVE-9367 [ HIVE-9367 ]
          Lefty Leverenz made changes -
          Link This issue is related to HIVE-9611 [ HIVE-9611 ]
          Xuefu Zhang made changes -
          Comment [ Yes. It's available in both 1.1 and 1.2. ]
          Xuefu Zhang made changes -
          Comment [ Yes. It's available in both 1.1 and 1.2. ]
          Xuefu Zhang made changes -
          Comment [ Yes. It's available in both 1.1 and 1.2. ]
          dutianmin made changes -
          Assignee Xuefu Zhang [ xuefuz ] dutianmin [ dutianmin ]
          Carl Steinbach made changes -
          Assignee dutianmin [ dutianmin ] Xuefu Zhang [ xuefuz ]
          Xuefu Zhang made changes -
          Comment [ Is the branch already usable in production? ]
          Xuefu Zhang made changes -
          Comment [ Is the branch already usable in production? ]
          Xuefu Zhang made changes -
          Comment [ Thanks Xuefu for the quick reply. I will give it a try next week. ]
          Li Mingyang made changes -
          Assignee Xuefu Zhang [ xuefuz ] Li Mingyang [ limyiter ]

            People

            • Assignee:
              Li Mingyang
              Reporter:
              Xuefu Zhang
            • Votes:
              34
              Watchers:
              183
