Hive / HIVE-8215

Multi-table insertion optimization #3: use 1+1 tasks instead of 1+N tasks [Spark Branch]


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Spark
    • Labels: None

    Description

      Currently, a multi-table insertion generates 1+N tasks: the "1" task generates the input, and the "N" tasks are the insert queries, each of which reads that input and writes to its own output table.
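
      The 1+N plan described above can be modeled roughly as follows. This is a toy model, not Hive's actual planner or scheduler: it builds the 1+N task DAG and groups tasks into launch waves, where the parallel argument stands in for hive.exec.parallel. Task, plan_1_plus_n and launch_waves are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Task:
    # Hypothetical stand-in for a node in Hive's task DAG.
    name: str
    parents: List["Task"] = field(default_factory=list)

def plan_1_plus_n(target_tables: List[str]) -> List[Task]:
    # "1": the task that generates the shared input.
    source = Task("generate-input")
    # "N": one insert task per target table, each depending only on the source.
    inserts = [Task(f"insert-{t}", parents=[source]) for t in target_tables]
    return [source] + inserts

def launch_waves(tasks: List[Task], parallel: bool) -> List[List[str]]:
    """Group tasks into launch waves. Tasks in one wave may run concurrently;
    a wave starts only after the previous one finishes. In this toy model,
    `parallel` plays the role of hive.exec.parallel."""
    done = set()
    pending = list(tasks)
    waves = []
    while pending:
        # A task is ready once all of its parents have finished.
        ready = [t for t in pending if all(p.name in done for p in t.parents)]
        if not parallel:
            ready = ready[:1]  # sequential mode: launch one task at a time
        waves.append([t.name for t in ready])
        done.update(t.name for t in ready)
        pending = [t for t in pending if t.name not in done]
    return waves

plan = plan_1_plus_n(["t1", "t2", "t3"])
print(launch_waves(plan, parallel=False))
# -> [['generate-input'], ['insert-t1'], ['insert-t2'], ['insert-t3']]
print(launch_waves(plan, parallel=True))
# -> [['generate-input'], ['insert-t1', 'insert-t2', 'insert-t3']]
```

      With parallel=False each of the N inserts gets its own wave and they run one after another; with parallel=True they share a single wave, which is the behavior hive.exec.parallel is meant to enable.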

      To make these N tasks run in parallel, we currently rely on hive.exec.parallel being set to true. This patch proposes an alternative: combine the N tasks into a single task that contains N separate operator trees, which at execution time produce N result RDDs. Spark may then be able to execute these N RDDs in parallel, without requiring hive.exec.parallel.
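
      The proposed 1+1 plan can be sketched in a similar toy style, assuming a single combined task that holds N operator trees over one shared input. Here a thread pool stands in for Spark running the N result RDDs concurrently (in Spark, jobs submitted from separate threads of one application can run at the same time). generate_input, run_combined_task and the example trees are hypothetical; this is neither Hive's nor Spark's API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional

Row = Dict[str, int]
# Hypothetical: one insert branch (operator tree) maps the shared input to its output rows.
OperatorTree = Callable[[List[Row]], List[Row]]

def generate_input() -> List[Row]:
    # Stand-in for the "1" task: the shared input is produced only once.
    return [{"key": k, "value": k * 10} for k in range(6)]

def run_combined_task(shared_input: List[Row],
                      trees: Dict[str, OperatorTree],
                      max_workers: Optional[int] = None) -> Dict[str, List[Row]]:
    """The single combined task: N operator trees over the same input,
    giving N results. The trees are evaluated concurrently inside this one
    task, so no task-level flag like hive.exec.parallel is involved."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {table: pool.submit(tree, shared_input)
                   for table, tree in trees.items()}
        return {table: fut.result() for table, fut in futures.items()}

# Example: two target tables fed from one input.
trees = {
    "t_even": lambda rows: [r for r in rows if r["key"] % 2 == 0],
    "t_large": lambda rows: [r for r in rows if r["value"] >= 30],
}
outputs = run_combined_task(generate_input(), trees)
print({t: [r["key"] for r in rows] for t, rows in outputs.items()})
# -> {'t_even': [0, 2, 4], 't_large': [3, 4, 5]}
```

      The design point the sketch illustrates is that the plan has only two tasks regardless of N, and any parallelism among the N branches is left to the execution engine rather than to Hive's task scheduler.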


    People

      Assignee: Unassigned
      Reporter: Chao Sun (csun)
      Votes: 0
      Watchers: 2
