Hive / HIVE-1721

use bloom filters to improve the performance of joins

Details

    Description

      In map-joins, it is likely that many rows of the big table have no matching row in the small table.
      Currently, we perform a hash-map lookup for every row in the big table, which can be expensive.

      It might be useful to try a bloom filter containing all the keys of the small table.
      Each row of the big table is first checked against the bloom filter, and the small-table hash table
      is probed only on a positive match.
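
The idea described above can be sketched as follows. This is only an illustration in Python, not Hive code and not taken from the attached patch: `BloomFilter`, `map_join`, the filter size, the number of hash functions, and the SHA-256 double-hashing scheme are all assumptions chosen to keep the example short.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; bit positions come from double hashing a SHA-256 digest."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a non-zero step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def map_join(big_rows, small_rows):
    """Join (key, value) rows, probing the hash table only on a filter hit.

    Returns the joined rows and the number of hash-table probes performed.
    """
    small_table = {}
    bloom = BloomFilter()
    for key, value in small_rows:
        small_table.setdefault(key, []).append(value)
        bloom.add(key)

    joined = []
    probes = 0
    for key, big_value in big_rows:
        if not bloom.might_contain(key):
            continue  # key cannot be in the small table, skip the lookup
        probes += 1
        for small_value in small_table.get(key, ()):
            joined.append((key, big_value, small_value))
    return joined, probes
```

A Bloom filter never reports a false negative, so the join result is the same as with an unconditional hash-table lookup. False positives only cost an extra probe. Whether the filter check is actually cheaper than the lookup it replaces depends on the hash-table implementation and the match rate, which is what the issue proposes to try out.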

      Attachments

        1. hive-1721.patch.txt
          13 kB
          Siddhartha Gunda

          People

            Assignee: Unassigned
            Reporter: Namit Jain

            Dates

              Created:
              Updated:
