  1. Hive
  2. HIVE-8220

Refactor multi-insert code such that plan splitting and task generation are modular and reusable [Spark Branch]

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Spark
    • Labels:

      Description

      This is a follow-up to HIVE-7053. Currently, the code that splits the operator tree and the code that generates tasks are intermingled, which makes both hard to understand and maintain. Logically, the two steps seem independent, so the code can be improved by modularizing each of them. The following outline might be helpful:

      @Override
      protected void generateTaskTree(List<Task<? extends Serializable>> rootTasks, ParseContext pCtx,
          List<Task<MoveWork>> mvTask, Set<ReadEntity> inputs, Set<WriteEntity> outputs)
          throws SemanticException {
        // 1. Identify if the plan is for multi-insert and split the plan if necessary.
        List<Set<Operator>> operatorSets = multiInsertSplit(...);
        // 2. For each operator set, generate a task.
        for (Set<Operator> topOps : operatorSets) {
          SparkTask task = generateTask(topOps);
          ...
        }
        // 3. Wire up the tasks.
        ...
      }
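
      The three steps outlined above could be separated roughly as in the following self-contained sketch. It does not use Hive's classes: Op and Task are hypothetical stand-ins for Operator and SparkTask, and the split rule (cut the tree at the first operator that has more than one child) and the wiring rule (the first task becomes the parent of all the others) are assumptions made only for illustration, not the behavior introduced by HIVE-7053.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class MultiInsertPlanSketch {
    /** Hypothetical stand-in for a Hive Operator: a name plus child operators. */
    static final class Op {
        final String name;
        final List<Op> children = new ArrayList<>();
        Op(String name) { this.name = name; }
        Op child(Op c) { children.add(c); return this; }
    }

    /** Hypothetical stand-in for SparkTask: the top operators it covers plus dependent tasks. */
    static final class Task {
        final Set<Op> topOps;
        final List<Task> children = new ArrayList<>();
        Task(Set<Op> topOps) { this.topOps = topOps; }
    }

    // Step 1: plan splitting only. Assumed rule: follow the single-child chain from
    // the root; if an operator with several children is reached, detach each branch
    // into its own operator set. A plan without such an operator yields one set.
    static List<Set<Op>> multiInsertSplit(Op root) {
        List<Set<Op>> sets = new ArrayList<>();
        Set<Op> upstream = new LinkedHashSet<>();
        upstream.add(root);
        sets.add(upstream);

        Op cur = root;
        while (cur.children.size() == 1) {
            cur = cur.children.get(0);
        }
        List<Op> branches = new ArrayList<>(cur.children);
        cur.children.clear(); // the upstream tree now ends at the split point
        for (Op branch : branches) {
            Set<Op> set = new LinkedHashSet<>();
            set.add(branch);
            sets.add(set);
        }
        return sets;
    }

    // Step 2: task generation only. It knows nothing about how the plan was split.
    static Task generateTask(Set<Op> topOps) {
        return new Task(topOps);
    }

    // Driver: split, generate one task per operator set, then wire the tasks up.
    static List<Task> generateTaskTree(Op root) {
        List<Task> tasks = new ArrayList<>();
        for (Set<Op> topOps : multiInsertSplit(root)) {
            tasks.add(generateTask(topOps));
        }
        // Step 3: wiring. Assumed rule: every branch task depends on the upstream task.
        Task parent = tasks.get(0);
        for (int i = 1; i < tasks.size(); i++) {
            parent.children.add(tasks.get(i));
        }
        List<Task> rootTasks = new ArrayList<>();
        rootTasks.add(parent);
        return rootTasks;
    }

    public static void main(String[] args) {
        // TS -> FIL -> {FS1, FS2}: one scan feeding two inserts.
        Op fil = new Op("FIL").child(new Op("FS1")).child(new Op("FS2"));
        Op ts = new Op("TS").child(fil);
        List<Task> roots = generateTaskTree(ts);
        System.out.println(roots.get(0).children.size()); // prints 2
    }
}
```

      Because multiInsertSplit and generateTask share only the List<Set<Op>> value, either step can be changed or reused without touching the other, which is the separation this issue asks for.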
      

              People

              • Assignee: Unassigned
              • Reporter: Xuefu Zhang (xuefuz)
              • Votes: 0
              • Watchers: 1