HIVE-3733: Improve Hive's logic for conditional merge

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved

    Description

      If hive.merge.mapfiles is set to true and hive.merge.mapredfiles is set to false, then when Hive encounters a FileSinkOperator while generating map-reduce tasks, it looks at the entire job to see whether it has a reducer; if it does, Hive does not merge. Instead, it should check whether the FileSinkOperator is a child of the reducer. That way, outputs generated in the mapper are merged and outputs generated in the reducer are not, which is the intended effect of those settings.
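
      The difference between the two checks can be sketched in a few lines of Java. This is only an illustration of the idea described above, not Hive's planner code: the Op class, the method names, and the way the plan is wired together are all hypothetical, and the job-level rule is modeled on the assumption that hive.merge.mapfiles governs map-only jobs while hive.merge.mapredfiles governs jobs with a reducer.

```java
import java.util.ArrayList;
import java.util.List;

public class ConditionalMergeSketch {

    /** Hypothetical stand-in for a plan operator; this is NOT Hive's Operator class. */
    static final class Op {
        final String name;
        final List<Op> parents = new ArrayList<>();

        Op(String name, Op... parents) {
            this.name = name;
            for (Op p : parents) {
                this.parents.add(p);
            }
        }

        /** True if the given operator is among this operator's ancestors. */
        boolean isDescendantOf(Op ancestor) {
            for (Op p : parents) {
                if (p == ancestor || p.isDescendantOf(ancestor)) {
                    return true;
                }
            }
            return false;
        }
    }

    /** Behavior described in the issue: one decision for the whole job. */
    static boolean shouldMergeJobLevel(boolean mergeMapFiles, boolean mergeMapRedFiles, Op reducer) {
        // Assumption: a job with any reducer is governed by hive.merge.mapredfiles.
        return reducer == null ? mergeMapFiles : mergeMapRedFiles;
    }

    /** Proposed behavior: decide per file sink, based on where the sink sits in the plan. */
    static boolean shouldMergePerSink(boolean mergeMapFiles, boolean mergeMapRedFiles,
                                      Op reducer, Op fileSink) {
        boolean writtenByReducer = reducer != null && fileSink.isDescendantOf(reducer);
        return writtenByReducer ? mergeMapRedFiles : mergeMapFiles;
    }

    public static void main(String[] args) {
        // A plan shaped like a two-insert query: one map-side sink, one reduce-side sink.
        Op scan = new Op("TS");
        Op mapSink = new Op("FS-map", new Op("SEL", scan));
        Op reducer = new Op("GBY", new Op("RS", scan));
        Op reduceSink = new Op("FS-reduce", reducer);

        boolean mapFiles = true;      // hive.merge.mapfiles
        boolean mapRedFiles = false;  // hive.merge.mapredfiles

        System.out.println("job-level: " + shouldMergeJobLevel(mapFiles, mapRedFiles, reducer));
        System.out.println(mapSink.name + ": " + shouldMergePerSink(mapFiles, mapRedFiles, reducer, mapSink));
        System.out.println(reduceSink.name + ": " + shouldMergePerSink(mapFiles, mapRedFiles, reducer, reduceSink));
    }
}
```

      With the job-level check, the presence of a reducer anywhere in the job suppresses merging for every sink. With the per-sink check, only the sink fed by the reducer is excluded.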

      Simple repro:

      set hive.merge.mapfiles=true;
      set hive.merge.mapredfiles=false;
      EXPLAIN
      FROM <input_table>
      INSERT OVERWRITE TABLE <output_table1> SELECT key, COUNT(1) GROUP BY key
      INSERT OVERWRITE TABLE <output_table2> SELECT *;

      The EXPLAIN output should contain a Conditional Operator, Mapred Stages, and Move tasks.

      Attachments

        1. HIVE-3733.1.patch.txt
          11 kB
          Pradeep Kamath
        2. HIVE-3733.3.patch.txt
          16 kB
          Pradeep Kamath
        3. HIVE-3733.4.patch.txt
          14 kB
          Pradeep Kamath
        4. HIVE-3733.5.patch.txt
          30 kB
          Pradeep Kamath
        5. HIVE-3733.optimizer.patch.txt
          30 kB
          Pradeep Kamath

          People

            zhenxiao Zhenxiao Luo
            pkamath Pradeep Kamath