Hive / HIVE-8518

Compile time skew join optimization returns duplicated results


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.14.0
    • Fix Version/s: 1.0.2, 1.1.0
    • Component/s: Logical Optimizer
    • Labels: None

    Description

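      Compile-time skew join optimization clones the join operator tree, restricts each copy with a predicate on the join key, and unions the results.
      The problem is that the predicate for the cloned join is not properly inserted (the code relies on an assert statement to do it). Without that predicate, the cloned branch is not restricted to its share of the keys, so rows that both branches produce appear twice in the union.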
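Conceptually, the rewrite resembles the hand-written HiveQL below. It is a sketch, not the operator plan Hive actually builds, and it assumes that tbl1 is skewed on `key` with the single skewed value '1'. Each branch carries its own predicate, so every joining key lands in exactly one branch. If either predicate is missing, the branches overlap and the overlapping rows are returned twice.

```sql
-- Hypothetical sketch: assumes tbl1 is declared SKEWED BY (key) ON ('1').
-- Branch for the skewed key value only.
SELECT * FROM tbl1 JOIN tbl2 ON tbl1.key = tbl2.key
WHERE tbl1.key = '1'
UNION ALL
-- Branch for all remaining keys. Dropping either WHERE clause makes the
-- two branches overlap, which duplicates rows in the final result.
SELECT * FROM tbl1 JOIN tbl2 ON tbl1.key = tbl2.key
WHERE tbl1.key <> '1';
```

The `<>` branch does not need to handle NULL keys, because an inner join on `tbl1.key = tbl2.key` never matches NULLs.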

      To reproduce the issue, run the simple query:

      select * from tbl1 join tbl2 on tbl1.key=tbl2.key;

      where tbl1 is declared as skewed (using the SKEWED BY clause in a CREATE TABLE or ALTER TABLE statement).
      Duplicated results are returned when hive.optimize.skewjoin.compiletime=true.
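The steps above can be sketched as a self-contained HiveQL script. The table schemas, the skewed value '1', and the sample rows are hypothetical. Only the SKEWED BY clause, the join query, and the `hive.optimize.skewjoin.compiletime` property come from the report. `INSERT ... VALUES` requires Hive 0.14.0 or later.

```sql
-- Hypothetical schema and data for illustration.
CREATE TABLE tbl1 (key STRING, value STRING)
  SKEWED BY (key) ON ('1');
-- Alternatively, on an existing table:
-- ALTER TABLE tbl1 SKEWED BY (key) ON ('1');
CREATE TABLE tbl2 (key STRING, value STRING);

INSERT INTO TABLE tbl1 VALUES ('1', 'a'), ('1', 'b'), ('2', 'c');
INSERT INTO TABLE tbl2 VALUES ('1', 'x'), ('2', 'y');

-- Baseline: the correct answer for this data is 3 rows.
SET hive.optimize.skewjoin.compiletime=false;
SELECT * FROM tbl1 JOIN tbl2 ON tbl1.key = tbl2.key;

-- With compile-time skew join enabled, an affected build may return
-- duplicated rows, as described in this issue.
SET hive.optimize.skewjoin.compiletime=true;
SELECT * FROM tbl1 JOIN tbl2 ON tbl1.key = tbl2.key;
```

Comparing the row counts of the two queries is a quick way to check whether a given build is affected.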

      Attachments

        1. HIVE-8518.1.patch (1 kB, Rui Li)


            People

            Assignee: Rui Li (lirui)
            Reporter: Rui Li (lirui)
            Votes: 0
            Watchers: 4
