  1. Hive
  2. HIVE-8215

Multi-table insertion optimization #3: use 1+1 tasks instead of 1+N tasks [Spark Branch]

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Spark
    • Labels: None

      Description

      Currently, a multi-table insertion generates 1+N tasks: "1" is the task that generates the input, and "N" are the insert tasks that read from that input and write to separate output tables.

      To make these N tasks run in parallel, we rely on hive.exec.parallel being set to true. In this patch, we propose an alternative approach: combine the N tasks into a single task that contains N separate operator trees, which at execution time yield N result RDDs. We may then be able to execute these N RDDs in parallel inside Spark, without needing hive.exec.parallel.
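
      The proposed 1+1 layout can be pictured with a small sketch. This is only an analogy in plain Python, not Hive or Spark code: the row data, the table names (t_even, t_odd, t_keys), and the helper run_combined_task are all hypothetical, and a thread pool stands in for whatever mechanism Spark would use to run the N result RDDs concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_input():
    # The "1" task: produce the shared input exactly once.
    # The rows here are made-up illustration data.
    return [("a", 1), ("b", 2), ("c", 3), ("d", 4)]


# N hypothetical "operator trees", one per output table.
def tree_even(rows):
    return [r for r in rows if r[1] % 2 == 0]


def tree_odd(rows):
    return [r for r in rows if r[1] % 2 == 1]


def tree_keys(rows):
    return [key for key, _ in rows]


OPERATOR_TREES = {"t_even": tree_even, "t_odd": tree_odd, "t_keys": tree_keys}


def run_combined_task(trees, rows):
    """A single task holding N operator trees; each yields its own result.

    The trees are submitted together, so no external switch (the analogue
    of hive.exec.parallel) is needed to let them run concurrently.
    """
    with ThreadPoolExecutor(max_workers=len(trees)) as pool:
        futures = {name: pool.submit(fn, rows) for name, fn in trees.items()}
        return {name: future.result() for name, future in futures.items()}


shared_input = generate_input()
results = run_combined_task(OPERATOR_TREES, shared_input)
```

      In this sketch the input is generated once and all three trees read from it inside one task, which mirrors the "1+1" shape described above; in the 1+N layout each tree would instead be a separate task scheduled by the driver.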


              People

              • Assignee: Unassigned
              • Reporter: Chao Sun (csun)
              • Votes: 0
              • Watchers: 2
