Hive / HIVE-16414

[Hive on Tez] Hive union queries are less resource-efficient on Tez than on MapReduce

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.1.0
    • Fix Version/s: None
    • Component/s: Tez
    • Labels: None

    Description

      When a Hive union query whose subqueries all read the same table is run on MapReduce and on Tez, MapReduce reads the table only once, no matter how many subqueries reference it,
      but Tez reads the same table once per subquery, with each read planned as a separate vertex.

      If a single read of the table takes X mappers,
      Tez runs kX map tasks, where k is the number of subqueries reading from that table, while
      MapReduce runs X mappers no matter how many subqueries are present.
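The task counts described above can be illustrated with a small sketch. It only restates the report's claim (X mappers on MapReduce versus kX map tasks on Tez) and is not a measurement. The function name `map_task_counts` and the sample values X = 100 and k = 4 are hypothetical, chosen purely for illustration.

```python
def map_task_counts(mappers_per_scan: int, subqueries: int) -> dict:
    """Map-task counts per engine, as described in this report.

    mappers_per_scan -- X, the mappers needed to read the table once
    subqueries       -- k, the union branches reading that same table
    """
    if mappers_per_scan < 0 or subqueries < 1:
        raise ValueError("need mappers_per_scan >= 0 and subqueries >= 1")
    return {
        "mr": mappers_per_scan,                # X: the table is read once
        "tez": subqueries * mappers_per_scan,  # kX: one read per subquery
    }


# Hypothetical values: X = 100 mappers per scan, k = 4 subqueries.
counts = map_task_counts(mappers_per_scan=100, subqueries=4)
print(counts)
```

Under this description, the extra work on Tez grows linearly with the number of subqueries that read the shared table.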

      For such union queries, Hive should fall back to MR instead of Tez.

      Query:
      http://pastebin.com/t6n91u6a

      Tez explain plan:
      http://pastebin.com/aWwVxhii

      MR explain plan:
      http://pastebin.com/iDbWwtKR

            People

              Assignee: Unassigned
              Reporter: Ravi Teja Chilukuri (raviorteja)
              Votes: 0
              Watchers: 2