HIVE-24252

Improve decision model for using semijoin reducers

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      After a few experiments with the TPC-DS 10TB dataset, we observed that in some cases semijoin reducers were not effective: they either did not reduce the number of records at all or reduced the target relation only marginally.

      In some cases we can make the semijoin reducer more effective by adding more columns, but this also requires a bigger Bloom filter, so the decision on how many columns to include in the Bloom filter becomes more delicate.

      The current decision model always chooses multi-column semijoin reducers when they are available, but this may not always be beneficial if a single column can already reduce the target relation significantly.
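
      To make the trade-off concrete, here is a minimal sketch of one possible cost-based choice between single-column and multi-column reducers. It is not Hive's implementation. The class SemijoinReducerChoice, the Candidate fields, the scoring function, and the costPerBit weight are all hypothetical names and assumptions introduced for illustration; only the Bloom filter sizing formula (bits = -n ln p / (ln 2)^2) is standard.

```java
import java.util.List;

// Hypothetical sketch, not Hive code: pick the semijoin reducer candidate
// whose estimated row reduction best outweighs the size of its Bloom filter.
final class SemijoinReducerChoice {

    /** Hypothetical candidate: a set of join columns plus optimizer estimates. */
    static final class Candidate {
        final String columns;       // e.g. "a" or "a,b"
        final long distinctKeys;    // estimated entries inserted into the Bloom filter
        final double keptFraction;  // estimated fraction of target rows that truly match

        Candidate(String columns, long distinctKeys, double keptFraction) {
            this.columns = columns;
            this.distinctKeys = distinctKeys;
            this.keptFraction = keptFraction;
        }
    }

    /** Standard Bloom filter sizing: bits = -n * ln(p) / (ln 2)^2. */
    static long bloomBits(long entries, double fpp) {
        double ln2 = Math.log(2.0);
        return (long) Math.ceil(-entries * Math.log(fpp) / (ln2 * ln2));
    }

    /** Assumed score: target rows filtered out, minus a charge per filter bit. */
    static double netBenefit(Candidate c, long targetRows, double fpp, double costPerBit) {
        // A row passes the filter if it truly matches or is a false positive.
        double passFraction = c.keptFraction + (1.0 - c.keptFraction) * fpp;
        double rowsRemoved = targetRows * (1.0 - passFraction);
        return rowsRemoved - costPerBit * bloomBits(c.distinctKeys, fpp);
    }

    /** Returns the best candidate, or null when no reducer has a positive net benefit. */
    static Candidate choose(List<Candidate> candidates, long targetRows,
                            double fpp, double costPerBit) {
        Candidate best = null;
        double bestScore = 0.0;
        for (Candidate c : candidates) {
            double score = netBenefit(c, targetRows, fpp, costPerBit);
            if (score > bestScore) {
                best = c;
                bestScore = score;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up estimates: the extra column filters slightly more rows
        // but needs a far larger Bloom filter.
        Candidate single = new Candidate("a", 1_000L, 0.10);
        Candidate multi = new Candidate("a,b", 500_000L, 0.08);
        List<Candidate> candidates = List.of(single, multi);

        Candidate withCost = choose(candidates, 1_000_000L, 0.05, 0.05);
        Candidate ignoringCost = choose(candidates, 1_000_000L, 0.05, 0.0);
        System.out.println("with filter cost: " + withCost.columns);
        System.out.println("ignoring filter cost: " + ignoringCost.columns);
    }
}
```

      With these made-up estimates, the single-column reducer wins once the filter size is charged, while the multi-column reducer wins only when the filter cost is ignored. A candidate that removes no rows gets a non-positive score, so no reducer is chosen for it. How to weigh filter bits against removed rows is the open question; the linear charge used here is only one assumption.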

            People

            • Assignee: Stamatis Zampetakis (zabetak)
            • Reporter: Stamatis Zampetakis (zabetak)