  1. Hive
  2. HIVE-5710

Merging of Join Trees assumes all filters are on the merging table


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      The following query fails with a SemanticException:

      select p1.name, p2.name, p3.name
      from part p1 join part p2 on p1.name = p2.name
      join part p3 on p1.name = p3.name and p2.key > 10
      

      The Merge Join logic associates the p2.key > 10 filter with the merging table, i.e., 'p1'. When the Join Plan is constructed, an attempt is made to resolve this predicate against p1's RowResolver, which causes the SemanticException.

      The underlying issue is that, at runtime, filters are applied to the input rows of the Join Operator; there is no way to apply a filter to intermediate data. In the above query, the p2.key > 10 predicate should be applied not directly on p2 but on the output of p1 join p2.

      The following is also a valid query; here the predicate refers to multiple left tables:

      select p1.name, p2.name, p3.name
      from part p1 join part p2 on p1.name = p2.name
      join part p3 on p1.name = p3.name and p2.key > p1.key
      

      As a start, I propose preventing the merge when there is a filter that refers to a non-merging table.
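
      A minimal sketch of the kind of check proposed here, not Hive's actual code: the class JoinMergeCheck, the Filter type, and the canMerge method are hypothetical names introduced only for illustration. It assumes each filter already knows which table aliases its expression references, and it allows the merge only when every filter refers solely to the merging table.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; none of these names come from Hive.
class JoinMergeCheck {

    // Stand-in for a join filter: its text plus the table aliases it references.
    static final class Filter {
        final String expression;
        final Set<String> referencedAliases;

        Filter(String expression, Set<String> referencedAliases) {
            this.expression = expression;
            this.referencedAliases = referencedAliases;
        }
    }

    // Returns true only if every filter refers exclusively to the merging table.
    static boolean canMerge(String mergingAlias, List<Filter> filters) {
        for (Filter filter : filters) {
            for (String alias : filter.referencedAliases) {
                if (!alias.equals(mergingAlias)) {
                    return false; // filter touches a non-merging table
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Filter onP2 = new Filter("p2.key > 10",
                new HashSet<>(Arrays.asList("p2")));
        Filter onBoth = new Filter("p2.key > p1.key",
                new HashSet<>(Arrays.asList("p1", "p2")));
        Filter onP1 = new Filter("p1.key > 10",
                new HashSet<>(Arrays.asList("p1")));

        System.out.println(canMerge("p1", Arrays.asList(onP2)));   // false
        System.out.println(canMerge("p1", Arrays.asList(onBoth))); // false
        System.out.println(canMerge("p1", Arrays.asList(onP1)));   // true
    }
}
```

      Under this sketch, both queries in the description would be left unmerged, since their filters reference p2, while a filter on p1 alone would still permit the merge.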

          People

            rhbutani Harish Butani