  1. Hive
  2. HIVE-24384 SharedWorkOptimizer improvements
  3. HIVE-24376

SharedWorkOptimizer may retain the SJ filter condition during RemoveSemijoin mode

Details

    • Sub-task
    • Status: Open
    • Major
    • Resolution: Unresolved

    Description

      The mode name is a bit confusing, but here is what happens:

      TS[A1] -> ...
      TS[A2] -> JOIN
      TS[B] -> JOIN
      

      We have an SJ (semijoin) edge TS[B] -> TS[A2] that communicates information about the join keys; let's assume its reduction ratio is r.

      RemoveSemijoin right now does the following:

      • removes the semijoin edge (so TS[A2] will become a full scan)
      • merges TS[A1] and TS[A2]

      With respect to reading data from disk, this is great: we used to access A twice, one of those being a full scan, and now we read it only once.

      But from a row-traffic perspective, TS[A2] emits more rows from now on, because we no longer have the semijoin reduction of ratio r.
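
      To make the trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It is not Hive code: the function name `row_traffic` and the example numbers are made up for illustration, and it assumes that r is the fraction of A's rows that pass the semijoin filter.

```python
def row_traffic(rows_in_a: int, r: float) -> dict:
    """Compare scans of A and rows sent to the JOIN, before and after RemoveSemijoin.

    Assumption (not stated in the issue): r is the fraction of A's rows
    that survive the semijoin filter, so 0 <= r <= 1.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must be between 0 and 1")

    # Before: TS[A1] and TS[A2] both read A; TS[A2] is reduced by the SJ filter.
    before = {"scans_of_a": 2, "rows_to_join": round(rows_in_a * r)}

    # After: the SJ edge is removed and the scans are merged, so A is read
    # once, but every row of A now flows towards the JOIN.
    after = {"scans_of_a": 1, "rows_to_join": rows_in_a}

    return {"before": before, "after": after}


# Hypothetical numbers: 1M rows in A, 10% of them pass the semijoin filter.
result = row_traffic(1_000_000, 0.1)
print(result)
```

      With these made-up numbers the merged plan saves one scan of A but sends ten times as many rows to the JOIN; the smaller r is, the more that extra row traffic costs.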

            People

              kgyrtkirk Zoltan Haindrich