Hive / HIVE-8220

Refactor multi-insert code such that plan splitting and task generation are modular and reusable [Spark Branch]


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Spark

    Description

      This is a follow-up to HIVE-7053. Currently, the code that splits the operator tree and the code that generates tasks are intermingled, which makes both hard to understand and maintain. Logically, the two steps seem independent, so the code can be improved by making each a separate module. The following outline might be helpful:

      @Override
      protected void generateTaskTree(List<Task<? extends Serializable>> rootTasks, ParseContext pCtx,
          List<Task<MoveWork>> mvTask, Set<ReadEntity> inputs, Set<WriteEntity> outputs)
          throws SemanticException {
        // 1. Identify whether the plan is a multi-insert and split it if necessary.
        List<Set<Operator>> operatorSets = multiInsertSplit(...);
        // 2. For each operator set, generate a task.
        for (Set<Operator> topOps : operatorSets) {
          SparkTask task = generateTask(topOps);
          ...
        }
        // 3. Wire up the tasks.
        ...
      }
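
      To make the proposed separation concrete, here is a minimal, self-contained sketch of the three steps on a toy operator tree. It does not use Hive's classes: Op and Task are hypothetical stand-ins for Operator and SparkTask, the signatures of multiInsertSplit and generateTask are assumptions, and splitting at the first operator with more than one child is only an illustrative rule, not necessarily what the Spark branch does.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class MultiInsertSketch {
    /** Toy stand-in for a Hive operator (hypothetical, not the real Operator class). */
    static final class Op {
        final String name;
        final List<Op> children = new ArrayList<>();
        Op(String name, Op... kids) {
            this.name = name;
            for (Op k : kids) {
                children.add(k);
            }
        }
    }

    /** Toy stand-in for SparkTask (hypothetical). */
    static final class Task {
        final Set<Op> topOps;
        final List<Task> dependents = new ArrayList<>();
        Task(Set<Op> topOps) {
            this.topOps = topOps;
        }
    }

    // Step 1: plan splitting only. Walks down from the root to the first operator
    // with more than one child, detaches its branches, and returns one operator
    // set for the shared prefix followed by one set per branch.
    // Assumption: a single-root plan with at most one branching point.
    static List<Set<Op>> multiInsertSplit(Op root) {
        List<Set<Op>> sets = new ArrayList<>();
        sets.add(setOf(root));
        Op cur = root;
        while (cur.children.size() == 1) {
            cur = cur.children.get(0);
        }
        if (cur.children.size() > 1) {
            List<Op> branches = new ArrayList<>(cur.children);
            cur.children.clear(); // cut the tree at the branching point
            for (Op branch : branches) {
                sets.add(setOf(branch));
            }
        }
        return sets;
    }

    // Step 2: task generation only. Knows nothing about how the plan was split.
    static Task generateTask(Set<Op> topOps) {
        return new Task(topOps);
    }

    // Step 3: wiring. The first task is the root; every other task depends on it.
    static List<Task> generateTaskTree(Op root) {
        List<Task> tasks = new ArrayList<>();
        for (Set<Op> topOps : multiInsertSplit(root)) {
            tasks.add(generateTask(topOps));
        }
        Task first = tasks.get(0);
        for (int i = 1; i < tasks.size(); i++) {
            first.dependents.add(tasks.get(i));
        }
        List<Task> rootTasks = new ArrayList<>();
        rootTasks.add(first);
        return rootTasks;
    }

    private static Set<Op> setOf(Op op) {
        Set<Op> s = new LinkedHashSet<>();
        s.add(op);
        return s;
    }

    public static void main(String[] args) {
        // TS -> FIL -> FORK -> {FS1, FS2}: one scan feeding two inserts.
        Op plan = new Op("TS", new Op("FIL", new Op("FORK", new Op("FS1"), new Op("FS2"))));
        List<Task> rootTasks = generateTaskTree(plan);
        System.out.println(rootTasks.size() + " root task(s), "
            + rootTasks.get(0).dependents.size() + " dependent task(s)");
    }
}
```

      Because multiInsertSplit only returns operator sets and generateTask only consumes one set, either step could be reused or tested without the other, which is the point of the refactoring.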
      

            People

              Assignee: Unassigned
              Reporter: Xuefu Zhang (xuefuz)
              Votes: 0
              Watchers: 1
