  1. Hive
  2. HIVE-7292 Hive on Spark
  3. HIVE-7503

Support Hive's multi-table insert query with Spark [Spark Branch]

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: Spark
    Description

      For Hive's multi-insert query (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there may be one MapReduce (MR) job for each insert. When we implement this with Spark, it would be nice if all the inserts could happen concurrently.

      It seems that this functionality isn't available in Spark. To make things worse, the source of the insert may be recomputed for every insert unless it is staged. Even with staging, the inserts happen sequentially, which hurts performance.

      This task is to find out what it takes in Spark to enable this without staging the source and without inserting sequentially. If this has to be solved in Hive, find an optimal way to do it.
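
      The two costs described above (recomputing the source and inserting sequentially) can be illustrated with a toy model. This is a minimal Python sketch, not Hive or Spark code: `Source`, `insert`, and both `*_multi_insert` functions are hypothetical names invented for illustration, and an in-memory list stands in for the table read by the `FROM` clause. The second function only shows the desired outcome (one scan, concurrent branches); it gets there by holding the rows in memory, which is itself a form of staging and not the mechanism this task is looking for.

```python
from concurrent.futures import ThreadPoolExecutor


class Source:
    """Hypothetical stand-in for the table scanned by the FROM clause."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.scans = 0  # number of times the source has been computed

    def scan(self):
        self.scans += 1
        return list(self._rows)


def insert(rows, predicate):
    """Hypothetical stand-in for one INSERT branch: keep the matching rows."""
    return [r for r in rows if predicate(r)]


def naive_multi_insert(source, predicates):
    # One job per insert: each branch rescans the source, one after another.
    return [insert(source.scan(), p) for p in predicates]


def single_scan_concurrent_multi_insert(source, predicates):
    # Scan the source once, then run all insert branches concurrently.
    rows = source.scan()
    workers = max(1, len(predicates))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: insert(rows, p), predicates))


# Two insert branches with different WHERE-style filters (assumed for the demo).
predicates = [lambda r: r < 100, lambda r: r >= 100]

naive_src = Source(range(200))
naive_targets = naive_multi_insert(naive_src, predicates)

shared_src = Source(range(200))
shared_targets = single_scan_concurrent_multi_insert(shared_src, predicates)
```

      Both functions produce the same targets; the difference is that `naive_src.scans` grows with the number of inserts while `shared_src.scans` stays at one.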

      Attachments

        1. HIVE-7503.9-spark.patch (61 kB, Chao Sun)
        2. HIVE-7503.8-spark.patch (61 kB, Chao Sun)
        3. HIVE-7503.7-spark.patch (33 kB, Chao Sun)
        4. HIVE-7503.6-spark.patch (34 kB, Chao Sun)
        5. HIVE-7503.5-spark.patch (27 kB, Chao Sun)
        6. HIVE-7503.4-spark.patch (27 kB, Chao Sun)
        7. HIVE-7503.3-spark.patch (25 kB, Chao Sun)
        8. HIVE-7503.2-spark.patch (26 kB, Chao Sun)
        9. HIVE-7503.1-spark.patch (24 kB, Chao Sun)

            People

              Assignee: csun Chao Sun
              Reporter: xuefuz Xuefu Zhang
              Votes: 0
              Watchers: 5
