Hive / HIVE-1682

Wrong results with MAPJOIN when cols from non-MAPJOINed table are selected

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 0.7.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Environment:

      Hive trunk (rev 1003407)
      Hadoop 0.20.2

    Description

      The results of this query are wrong:

      set hive.mapjoin.cache.numrows=100;
      select /*+ MAPJOIN(invites) */ pokes.bar from pokes join invites on (pokes.bar = invites.bar);

      The results of all the queries below match one another:

      -- Same as the problematic query, but without setting hive.mapjoin.cache.numrows,
      -- which defaults to 25000, far more than the number of rows in the pokes table.
      select /*+ MAPJOIN(invites) */ pokes.bar from pokes join invites on (pokes.bar = invites.bar);

      set hive.mapjoin.cache.numrows=100;
      select /*+ MAPJOIN(invites) */ invites.bar from pokes join invites on (pokes.bar = invites.bar);

      select invites.bar from pokes join invites on (pokes.bar = invites.bar);

      select pokes.bar from pokes join invites on (pokes.bar = invites.bar);
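The report assumes a simple invariant. An inner equi-join returns the same rows no matter how large the in-memory cache is. Because the join condition is pokes.bar = invites.bar, selecting pokes.bar or invites.bar also gives the same multiset of values. The Python sketch below is a toy model of that invariant, not Hive's MapJoin operator. The two-tier hash table is hypothetical: an in-memory tier capped at `cache_numrows` distinct keys, plus an `overflow` tier standing in for spilled rows. The function names and sample data are hypothetical too. A correct map join has to probe both tiers, so the assertions hold for every cache size.

```python
from collections import defaultdict


def nested_loop_join(big, small, key):
    """Reference equi-join: every (big, small) pair whose join keys match."""
    return [(b, s) for b in big for s in small if b[key] == s[key]]


def map_join(big, small, key, cache_numrows):
    """Toy map-side hash join; `small` plays the MAPJOINed (hashed) table.

    Hypothetical two-tier table: the first `cache_numrows` distinct keys are
    kept in `cache`; rows for any later key go to `overflow`, standing in for
    rows that no longer fit in memory. Probes must consult both tiers.
    """
    cache = {}
    overflow = defaultdict(list)
    for s in small:
        k = s[key]
        if k in cache:
            cache[k].append(s)
        elif len(cache) < cache_numrows:
            cache[k] = [s]
        else:
            overflow[k].append(s)
    out = []
    for b in big:  # stream the non-MAPJOINed table row by row
        k = b[key]
        for s in cache.get(k, []) + overflow.get(k, []):
            out.append((b, s))
    return out


def project(pairs, side, col):
    """Select one column from the streamed (side=0) or hashed (side=1) table."""
    return sorted(pair[side][col] for pair in pairs)


# Hypothetical sample data with (foo INT, bar STRING)-shaped rows.
pokes = [{"foo": i, "bar": f"val_{i % 7}"} for i in range(50)]
invites = [{"foo": i, "bar": f"val_{i % 11}"} for i in range(300)]

expected = nested_loop_join(pokes, invites, "bar")
for numrows in (1, 3, 100, 25000):
    joined = map_join(pokes, invites, "bar", numrows)
    # Neither projection may depend on the cache size...
    assert project(joined, 0, "bar") == project(expected, 0, "bar")
    assert project(joined, 1, "bar") == project(expected, 1, "bar")
    # ...and pokes.bar and invites.bar must agree, since they are the join key.
    assert project(joined, 0, "bar") == project(joined, 1, "bar")
print("join rows:", len(expected))
```

A small `cache_numrows` (1 or 3 here) forces most keys into the overflow tier. This is roughly analogous to what `set hive.mapjoin.cache.numrows=100;` is meant to provoke in the report.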

    People

    • Assignee: Unassigned
    • Reporter: Thiruvel Thirumoolan (thiruvel)