Hive / HIVE-8975

Possible performance regression on bucket_map_join_tez2.q

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • 0.15.0
    • 1.1.0
    • None

    Description

      After the identity project removal optimization was introduced in HIVE-8435, the plan for bucket_map_join_tez2.q on Tez became sub-optimal. In particular, the query used a map-join before HIVE-8435 and uses a reduce-join after it.

      The query is the following one:

      select a.key, b.key from (select distinct key from tab) a join tab b on b.key = a.key
      

      The plan before removing the projections is:

      TS[0]-FIL[16]-SEL[1]-GBY[2]-RS[3]-GBY[4]-SEL[5]-RS[8]-JOIN[11]-SEL[12]-FS[13]
      TS[6]-FIL[17]-RS[10]-JOIN[11]
      

      And after removing identity projections:

      TS[0]-FIL[16]-GBY[2]-RS[3]-GBY[4]-RS[8]-JOIN[11]-SEL[12]-FS[13]
      TS[6]-FIL[17]-RS[10]-JOIN[11]
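
The difference between the two plans can be reproduced with a small sketch. This is only an illustration on the plan strings quoted above, not Hive's optimizer code: the class `IdentityProjectSketch` and the method `removeIdentitySelects` are hypothetical names, and the set of identity projections (SEL[1] and SEL[5]) is read off the before/after plans rather than computed.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

// Illustration only: hypothetical helper, not part of Hive.
final class IdentityProjectSketch {

    // Drops every operator listed in identityOps from a "-"-separated plan string.
    static String removeIdentitySelects(String plan, Set<String> identityOps) {
        return Arrays.stream(plan.split("-"))
                .filter(op -> !identityOps.contains(op))
                .collect(Collectors.joining("-"));
    }

    public static void main(String[] args) {
        String before =
            "TS[0]-FIL[16]-SEL[1]-GBY[2]-RS[3]-GBY[4]-SEL[5]-RS[8]-JOIN[11]-SEL[12]-FS[13]";
        // SEL[1] and SEL[5] are the projections that disappear in the second plan;
        // SEL[12] is kept, so it is assumed not to be an identity projection.
        String after = removeIdentitySelects(before, Set.of("SEL[1]", "SEL[5]"));
        System.out.println(after);
    }
}
```

Note that after the removal GBY[4] is directly followed by RS[8], which is the situation the rest of this description is about.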
      

      After some digging, I found that the reduce-join is no longer converted into a map-join because the stats for GBY[4] change when SEL[5] is removed, so the map-join conversion does not kick in.
      The reason for the stats change in the GroupBy operator is in this line, which checks whether the GBY is immediately followed by an RS operator and calculates stats differently depending on the result.
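
To make the mechanism concrete, here is a toy model of how a child-dependent stats rule can flip the join choice. Everything in it is an assumption for illustration: the class `GbyStatsSketch`, the row-count formulas, the average row size, and the threshold are hypothetical and are not taken from Hive's stats code. The only thing it borrows from the description is the shape of the check, namely that the estimate differs depending on whether the GBY is immediately followed by an RS.

```java
// Toy model only: hypothetical names and numbers, not Hive's implementation.
final class GbyStatsSketch {

    // Hypothetical size limit under which the small side is eligible for a map-join.
    static final long MAP_JOIN_THRESHOLD_BYTES = 10_000L;
    // Hypothetical average row size used to turn row counts into bytes.
    static final long AVG_ROW_BYTES = 100L;

    // Estimates GBY output rows. Assumed rule: a GBY directly followed by RS is
    // treated as not reducing its input, otherwise the input is assumed to be halved.
    static long estimateGbyRows(long inputRows, String childOp) {
        boolean followedByRs = "RS".equals(childOp);
        return followedByRs ? inputRows : Math.max(1L, inputRows / 2);
    }

    // Picks the join strategy from the estimated size of the small side.
    static String chooseJoin(long smallSideRows) {
        long bytes = smallSideRows * AVG_ROW_BYTES;
        return bytes <= MAP_JOIN_THRESHOLD_BYTES ? "map-join" : "reduce-join";
    }

    public static void main(String[] args) {
        long inputRows = 150L;
        long withSel = estimateGbyRows(inputRows, "SEL");   // GBY[4]-SEL[5]-RS[8]
        long withoutSel = estimateGbyRows(inputRows, "RS"); // GBY[4]-RS[8]
        System.out.println("GBY[4] -> SEL[5]: " + withSel + " rows, " + chooseJoin(withSel));
        System.out.println("GBY[4] -> RS[8]: " + withoutSel + " rows, " + chooseJoin(withoutSel));
    }
}
```

In this sketch the same GBY gets a larger estimate once the SEL between it and the RS is gone, which pushes the small side over the threshold and yields a reduce-join instead of a map-join.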

      Attachments

        1. HIVE-8975.3.patch
          308 kB
          Prasanth Jayachandran
        2. HIVE-8975.2.patch
          5 kB
          Prasanth Jayachandran
        3. HIVE-8975.1.patch
          1 kB
          Prasanth Jayachandran

            People

              Assignee: prasanth_j Prasanth Jayachandran
              Reporter: jcamacho Jesús Camacho Rodríguez
