Hive / HIVE-1723

The result of left semi join is not correct

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Release Note:
      This bug is resolved by HIVE-1641.

      Description

      The test case semijoin.q contains this query:
      select /*+ mapjoin(b) */ a.key from t3 a left semi join t1 b on a.key = b.key sort by a.key;
      I think this query returns a wrong result if table t1 contains more than 25000 distinct keys.

      To keep it simple, I tried a very similar query that joins a table with itself:
      select /*+ mapjoin(b) */ a.key from test_semijoin a left semi join test_semijoin b on a.key = b.key sort by a.key;
      The table test_semijoin looks like this:
      0 0
      1 1
      2 2
      3 3
      4 4
      5 5
      ... ...
      25000 25000
      25001 25001
      ... ...
      25999 25999
      26000 26000

      Because the table is joined with itself, every key finds a match, so the correct result should be every key in test_semijoin itself (0 through 26000).
      However, the query returns only part of that: keys 0 through 24544.

      0
      1
      2
      ...
      24543
      24544
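      To make the expected output concrete, here is a small Python sketch (not part of Hive or semijoin.q) that rebuilds rows shaped like test_semijoin and computes what a LEFT SEMI JOIN of the table with itself should return. The helper names, the tab delimiter in write_rows, and treating the value column as equal to the key are assumptions for illustration, based on the listing in this report.

```python
# Hypothetical helpers (not part of Hive): model the test_semijoin data
# from this report and the expected LEFT SEMI JOIN result in plain Python.


def make_rows(max_key=26000):
    """Rows shaped like test_semijoin: (key, value) with value == key."""
    return [(i, i) for i in range(max_key + 1)]


def left_semi_join_keys(left, right):
    """Return the left keys that have at least one match in right.

    LEFT SEMI JOIN semantics: only left-side columns are produced, and a
    left row is emitted once no matter how many right rows match it.
    """
    right_keys = {key for key, _ in right}
    return sorted(key for key, _ in left if key in right_keys)


def write_rows(path, rows, sep="\t"):
    """Write rows as a delimited text file for loading into Hive.

    The separator is an assumption; it must match the table's ROW FORMAT.
    """
    with open(path, "w") as f:
        for key, value in rows:
            f.write(f"{key}{sep}{value}\n")


rows = make_rows()
expected = left_semi_join_keys(rows, rows)
print(len(expected), expected[0], expected[-1])  # 26001 0 26000

# Keys 0..24544, as observed in this report.
reported = list(range(24545))
print(len(expected) - len(reported))  # 1456
```

      Since each key matches itself, the expected result is all 26001 keys. The reported output (24545 keys) is therefore missing 1456 of them.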


              People

              • Assignee:
                Liyin Tang (liyin)
                Reporter:
                Liyin Tang (liyin)
              • Votes:
                0
                Watchers:
                1
