Hive: HIVE-1682

Wrong results with MAPJOIN when cols from non-MAPJOINed table are selected


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 0.7.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Environment: Hive trunk (rev 1003407)
      Hadoop 20.2

    Description

      The results of this query are wrong:

      set hive.mapjoin.cache.numrows=100;
      select /*+ MAPJOIN(invites) */ pokes.bar from pokes join invites on (pokes.bar = invites.bar);

      The results of all the queries below match one another:

      /* This is the same as the problematic query, but without setting numrows, which defaults to 25k, much greater than the number of rows in the pokes table */
      select /*+ MAPJOIN(invites) */ pokes.bar from pokes join invites on (pokes.bar = invites.bar);

      set hive.mapjoin.cache.numrows=100;
      select /*+ MAPJOIN(invites) */ invites.bar from pokes join invites on (pokes.bar = invites.bar);

      select invites.bar from pokes join invites on (pokes.bar = invites.bar);

      select pokes.bar from pokes join invites on (pokes.bar = invites.bar);
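The expectation behind this report is that the output of a map join should not depend on hive.mapjoin.cache.numrows. The Python sketch below models that invariant under simplifying assumptions: map_join and nested_loop_join are hypothetical helpers, not Hive code, and the "overflow" table only stands in for wherever rows beyond the cache limit would be kept. The MAPJOINed table (invites) is the one loaded into the lookup table, and the streamed table (pokes) supplies the selected column.

```python
from collections import defaultdict


def nested_loop_join(big, small):
    # Reference semantics: emit big's key once for every equal key in small.
    return sorted(b for b in big for s in small if b == s)


def map_join(big, small, cache_rows):
    # Hypothetical model of a map-side join: the MAPJOINed ("small") side is
    # loaded into lookup tables. Only the first `cache_rows` rows go into the
    # in-memory cache; the rest go into an overflow table (a stand-in for a
    # spill store, NOT Hive's actual implementation).
    cache = defaultdict(int)
    overflow = defaultdict(int)
    for i, key in enumerate(small):
        (cache if i < cache_rows else overflow)[key] += 1
    out = []
    for key in big:
        # Probe both tables, so the result cannot depend on cache_rows.
        matches = cache.get(key, 0) + overflow.get(key, 0)
        out.extend([key] * matches)
    return sorted(out)


pokes = ["a", "b", "b", "c"]
invites = ["b", "c", "c", "d"]
for numrows in (0, 1, 100):
    print(numrows, map_join(pokes, invites, numrows))
```

Every cache size yields the same multiset as the plain join; in the terms of this report, a result that changed with numrows would indicate a bug rather than a tuning effect.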


          People

            Assignee: Unassigned
            Reporter: Thiruvel Thirumoolan (thiruvel)