  Hive / HIVE-20252

Semijoin Reduction: Cycles due to the semijoin branch may remain undetected if the small-table side has a map join upstream.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.2.0
    • Component/s: None
    • Labels: None

    Description

      For example:

       

       # 2018-07-26T17:22:14,664 DEBUG [51377701-dc98-424f-82e0-bbb5d6c84316 main] optimizer.SharedWorkOptimizer: Before SharedWorkOptimizer:
       # TS[0]-FIL[96]-SEL[2]-MAPJOIN[156]-MAPJOIN[157]-MAPJOIN[161]-MAPJOIN[162]-FIL[47]-SEL[48]-MAPJOIN[163]-FIL[66]-SEL[67]-TNK[105]-GBY[68]-RS[69]-GBY[70]-SEL[71]-RS[72]-SEL[73]-LIM[74]-FS[75]
       #                                                           -SEL[142]-GBY[143]-RS[144]-GBY[145]-RS[155]
       # TS[3]-FIL[97]-SEL[5]-RS[34]-MAPJOIN[156]
       # TS[6]-FIL[98]-SEL[8]-RS[37]-MAPJOIN[157]
       # TS[9]-FIL[99]-SEL[11]-MAPJOIN[158]-GBY[40]-RS[42]-MAPJOIN[161]
       # TS[12]-FIL[100]-SEL[14]-RS[16]-MAPJOIN[158]
       #                       -SEL[131]-GBY[132]-EVENT[133]
       # TS[19]-FIL[101]-SEL[21]-MAPJOIN[159]-GBY[29]-RS[30]-GBY[31]-SEL[32]-RS[45]-MAPJOIN[162]
       # TS[22]-FIL[102]-SEL[24]-RS[26]-MAPJOIN[159]
       #                       -SEL[139]-GBY[140]-EVENT[141]
       # TS[49]-FIL[103]-SEL[51]-MAPJOIN[160]-GBY[59]-RS[60]-GBY[61]-SEL[62]-RS[64]-MAPJOIN[163]
       # TS[52]-FIL[104]-SEL[54]-RS[56]-MAPJOIN[160]
       #                       -SEL[147]-GBY[148]-EVENT[149]
       # 
       # 
       # DPP information stored in the cache: {TS[19]=[EVENT[141]], TS[9]=[EVENT[133]], TS[49]=[RS[155], EVENT[149]]}
      

       

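      The semijoin branch on line 3 (SEL[142]-GBY[143]-RS[144]-GBY[145]-RS[155]) feeds into TS[49] on line 12. TS[49] in turn feeds MAPJOIN[163] (through MAPJOIN[160]-GBY[59]-RS[60]-GBY[61]-SEL[62]-RS[64]), and MAPJOIN[163] lies downstream of the semijoin branch's parent on line 2, so the plan contains a cycle.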
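      To see why this is a cycle at the task level, the following Java sketch models each task as a node in a directed graph and runs a depth-first search for a back edge. It is not Hive's implementation: the class name TaskCycleSketch and vertex labels such as "Map(TS[49])" are hypothetical, and the task boundaries are inferred from the operator tree above (a map join runs in the same task as its big-table input, and each RS marks an edge to another task). Edges not involved in the cycle are omitted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical task-level model of the plan above; the vertex labels are illustrative, not Hive's.
public class TaskCycleSketch {
    private final Map<String, List<String>> edges = new HashMap<>();

    public void addEdge(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        edges.computeIfAbsent(to, k -> new ArrayList<>());
    }

    // Depth-first search: reaching a node still on the DFS stack means a back edge, i.e. a cycle.
    public boolean hasCycle() {
        Set<String> done = new HashSet<>();
        Set<String> onStack = new HashSet<>();
        for (String v : edges.keySet()) {
            if (dfs(v, done, onStack)) {
                return true;
            }
        }
        return false;
    }

    private boolean dfs(String v, Set<String> done, Set<String> onStack) {
        if (onStack.contains(v)) {
            return true;
        }
        if (done.contains(v)) {
            return false;
        }
        onStack.add(v);
        for (String w : edges.get(v)) {
            if (dfs(w, done, onStack)) {
                return true;
            }
        }
        onStack.remove(v);
        done.add(v);
        return false;
    }

    // Only the tasks involved in the cycle are modeled.
    public static TaskCycleSketch examplePlan(boolean withSemijoinEdge) {
        TaskCycleSketch g = new TaskCycleSketch();
        // TS[49] (joined with TS[52] by MAPJOIN[160]) shuffles through RS[60] to GBY[61].
        g.addEdge("Map(TS[49])", "Reducer(GBY[61])");
        // RS[64] broadcasts GBY[61]'s output to MAPJOIN[163], which runs in the TS[0] map task.
        g.addEdge("Reducer(GBY[61])", "Map(TS[0])");
        // The semijoin branch SEL[142]-GBY[143]-RS[144] shuffles to GBY[145].
        g.addEdge("Map(TS[0])", "Reducer(GBY[145])");
        if (withSemijoinEdge) {
            // RS[155] sends the runtime filter to TS[49], closing the loop.
            g.addEdge("Reducer(GBY[145])", "Map(TS[49])");
        }
        return g;
    }

    public static void main(String[] args) {
        System.out.println(examplePlan(false).hasCycle()); // false
        System.out.println(examplePlan(true).hasCycle());  // true
    }
}
```

      Without the semijoin edge the tasks form a simple chain; adding the RS[155] to TS[49] dependency is what turns it into a loop that the scheduler cannot order.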

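      The cycle-detection logic may miss this because MAPJOIN[160] on line 12 sits upstream of the small-table side of MAPJOIN[163]. MAPJOIN[160] has two parents (SEL[51] from TS[49] and RS[56] from TS[52]), so a search that follows a single upstream path can stop at the wrong TableScan. For complete coverage, the logic that finds upstream TableScan operators must use findOperatorsUpstream() and examine every TS operator it returns.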
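      The difference between following one upstream path and collecting every upstream TableScan can be illustrated with a simplified operator model. In this Java sketch, Op, firstParentScan and allUpstreamScans are hypothetical stand-ins, not Hive's Operator API or the actual findOperatorsUpstream() implementation. It rebuilds the small-table side of MAPJOIN[163] from the plan and assumes, purely for illustration, that MAPJOIN[160] lists RS[56] as its first parent; under that assumption the single-path walk stops at TS[52], while the full search also finds TS[49], the only semijoin target in the DPP cache line.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, minimal operator model; Hive's real Operator class is far richer.
public class UpstreamScanSketch {
    public static final class Op {
        public final String name;
        public final List<Op> parents;

        public Op(String name, Op... parents) {
            this.name = name;
            this.parents = Arrays.asList(parents);
        }

        boolean isTableScan() {
            return name.startsWith("TS[");
        }
    }

    // From the DPP cache line: TS[49] is the only scan with an RS (semijoin) entry.
    static final Set<String> SEMIJOIN_TARGETS = Collections.singleton("TS[49]");

    // Fragile approach: follow only the first parent until a TableScan is reached.
    public static Op firstParentScan(Op op) {
        Op cur = op;
        while (!cur.isTableScan() && !cur.parents.isEmpty()) {
            cur = cur.parents.get(0);
        }
        return cur.isTableScan() ? cur : null;
    }

    // Complete approach, in the spirit of findOperatorsUpstream(): visit every parent, collect every TableScan.
    public static Set<String> allUpstreamScans(Op start) {
        Set<String> scans = new LinkedHashSet<>();
        Set<Op> seen = new HashSet<>();
        Deque<Op> work = new ArrayDeque<>();
        work.push(start);
        while (!work.isEmpty()) {
            Op cur = work.pop();
            if (!seen.add(cur)) {
                continue; // already visited through another path
            }
            if (cur.isTableScan()) {
                scans.add(cur.name);
            }
            for (Op p : cur.parents) {
                work.push(p);
            }
        }
        return scans;
    }

    // Small-table side of MAPJOIN[163] (RS[64] and its ancestors) from the plan above.
    public static Op smallTableSide() {
        Op ts49 = new Op("TS[49]");
        Op sel51 = new Op("SEL[51]", new Op("FIL[103]", ts49));
        Op ts52 = new Op("TS[52]");
        Op rs56 = new Op("RS[56]", new Op("SEL[54]", new Op("FIL[104]", ts52)));
        // Assumption for illustration only: MAPJOIN[160] lists the broadcast input RS[56] first.
        Op mj160 = new Op("MAPJOIN[160]", rs56, sel51);
        Op gby61 = new Op("GBY[61]", new Op("RS[60]", new Op("GBY[59]", mj160)));
        return new Op("RS[64]", new Op("SEL[62]", gby61));
    }

    public static void main(String[] args) {
        Op rs64 = smallTableSide();
        Op single = firstParentScan(rs64);
        Set<String> all = allUpstreamScans(rs64);
        System.out.println(single.name); // TS[52]
        System.out.println(all);         // [TS[49], TS[52]]
        // The cycle check can only fire if it sees a scan that is a semijoin target.
        System.out.println(SEMIJOIN_TARGETS.contains(single.name));             // false
        System.out.println(all.stream().anyMatch(SEMIJOIN_TARGETS::contains)); // true
    }
}
```

      Visiting every parent costs little extra, since each operator is visited once, and it removes any dependence on the parent order of intermediate map joins.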

      Simplified image of the task cycle, without operator cycles: http://people.apache.org/~gopalv/HIVE_20252_cycle1.svg

      The artificial edge introduced to trigger cycle detection (shown in red): http://people.apache.org/~gopalv/HIVE_20252_cycle_fix.svg

      cc jcamachorodriguez

      Attachments

        1. HIVE-20252.01-branch-3.patch (20 kB, Deepak Jaiswal)
        2. HIVE-20252.1.patch (8 kB, Deepak Jaiswal)
        3. HIVE-20252.2.patch (20 kB, Deepak Jaiswal)
        4. HIVE-20252.3.patch (19 kB, Deepak Jaiswal)
        5. HIVE-20252.4.patch (20 kB, Deepak Jaiswal)
        6. HIVE-20252.5.patch (20 kB, Deepak Jaiswal)
        7. HIVE-20252.6.patch (20 kB, Deepak Jaiswal)

            People

              Assignee: Deepak Jaiswal (djaiswal)
              Reporter: Deepak Jaiswal (djaiswal)
              Votes: 0
              Watchers: 2
