Hive / HIVE-24252

Improve decision model for using semijoin reducers

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      After a few experiments with the TPC-DS 10TB dataset, we observed that in some cases semijoin reducers were not effective: they either did not reduce the number of records at all or reduced the relation only marginally.

      In some cases we can make the semijoin reducer more effective by adding more columns, but this also requires a bigger Bloom filter, so deciding how many columns to include in the Bloom filter becomes more delicate.

      The current decision model always chooses multi-column semijoin reducers when they are available, but this may not always be beneficial if a single column can already reduce the target relation significantly.
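
      To make the trade-off concrete, here is a minimal sketch of one possible cost-based decision. It is not Hive's actual model. The names `Candidate`, `benefit`, and `choose`, the cost weights, and the 5% false-positive rate are all assumptions chosen for illustration; only the Bloom filter sizing formula is the standard one.

```python
import math
from dataclasses import dataclass


def bloom_bits(expected_entries: int, fpp: float = 0.05) -> int:
    """Standard Bloom filter sizing: bits needed for n entries at false-positive rate p."""
    return math.ceil(-expected_entries * math.log(fpp) / (math.log(2) ** 2))


@dataclass
class Candidate:
    """Hypothetical description of one semijoin reducer option."""
    columns: tuple       # join columns included in the reducer
    distinct_keys: int   # estimated distinct key combinations on the source side
    selectivity: float   # estimated fraction of target rows that truly match (0..1)


def benefit(c, target_rows, cost_per_row=1.0, cost_per_bit=0.001, fpp=0.05):
    """Assumed cost model: value of filtered rows minus cost of building the filter."""
    # Rows that pass the filter: true matches plus false positives among the rest.
    passing = c.selectivity + (1.0 - c.selectivity) * fpp
    rows_saved = target_rows * (1.0 - passing)
    return rows_saved * cost_per_row - bloom_bits(c.distinct_keys, fpp) * cost_per_bit


def choose(candidates, target_rows, **weights):
    """Return the candidate with the highest net benefit, or None if none pays off."""
    best = max(candidates, key=lambda c: benefit(c, target_rows, **weights))
    return best if benefit(best, target_rows, **weights) > 0 else None


# Illustrative numbers only: the second column barely improves selectivity
# but multiplies the number of distinct keys the filter must hold.
single = Candidate(columns=("a",), distinct_keys=1_000, selectivity=0.10)
multi = Candidate(columns=("a", "b"), distinct_keys=50_000_000, selectivity=0.09)
chosen = choose([single, multi], target_rows=1_000_000)
```

      With these made-up estimates the single-column reducer is chosen, because the extra column adds little filtering power while making the Bloom filter far larger. A candidate that filters nothing (selectivity 1.0) yields no benefit, so `choose` returns `None`, which corresponds to not creating a semijoin reducer at all.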

          People

            zabetak Stamatis Zampetakis