Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-2677

Joins involving where statements with a partition do not prune correctly in left outer join

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 0.7.0
    • None
    • Query Processor

    Description

      The following query:

      select t1.id as m_id, count(t2.id) as totals 
      from table1 t1 left outer join table2 t2 on (t1.id = t2.id) 
      where (datepartition ==  '2011-11-08-00') 
      group by t1.id;

      should prune to just a single partition (datepartition == '2011-11-08-00') . However, the filter is being applied in the reducer, so a full table scan is being done as part of the map.

      One can get the correct behavior by pushing the filter into a select statement within the join itself:

      select t1.id as m_id, count(p.id) as totals 
      from table1 t1 left outer join (select * from table2 where datepartition ==  '2011-11-08-00') t2 on (t1.id = t2.id)  
      group by t1.id;

      Attachments

        Activity

          People

            Unassigned Unassigned
            jghoman Jakob Homan
            Votes:
            1 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated: