  Hive / HIVE-14246

Tez: disable auto-reducer parallelism when CUSTOM_EDGE is in place


Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.2.0
    • Fix Version/s: None
    • Component/s: Tez
    • Labels: None

    Description

      The CUSTOM_SIMPLE_EDGE implementation imposes different size constraints on each of its two incoming edges, and the ShuffleVertexManager currently has no way to represent that difference.
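      To make this concrete, here is a simplified, hypothetical model of auto-reducer parallelism. It is not Tez's ShuffleVertexManager source; it only assumes, loosely following that manager, that a total output size is extrapolated from the source tasks completed so far and divided by a desired per-reducer input size. The class, method and parameter names are invented for illustration. Because the output of both join inputs is folded into one byte count, nothing in this calculation can weigh the hashtable side differently from the streaming side.

```java
/** Hypothetical, simplified model of auto-reducer parallelism; not Tez's actual code. */
public final class AutoReduceSketch {

    /**
     * Extrapolates the total source output from the tasks completed so far,
     * then sizes the reducer vertex so each reducer gets about desiredTaskInputBytes.
     */
    static int decideParallelism(int currentParallelism,
                                 long completedOutputBytes,
                                 int completedSourceTasks,
                                 int totalSourceTasks,
                                 long desiredTaskInputBytes) {
        if (completedSourceTasks == 0) {
            return currentParallelism; // nothing observed yet: keep the planned width
        }
        // Linear extrapolation: assume unfinished tasks produce output at the same rate.
        // Bytes from BOTH join inputs arrive here as a single number.
        long expectedTotal = completedOutputBytes * totalSourceTasks / completedSourceTasks;
        // Ceiling division, with at least one reducer.
        long desired = Math.max(1L, (expectedTotal + desiredTaskInputBytes - 1) / desiredTaskInputBytes);
        // Auto-reduce only ever shrinks the vertex, never widens it.
        return (int) Math.min(currentParallelism, desired);
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        // 600 MB observed from 30 of 100 source tasks, 256 MB target per reducer.
        System.out.println(decideParallelism(1009, 600 * mb, 30, 100, 256 * mb));
    }
}
```

      With these inputs the sketch extrapolates 2000 MB and shrinks a 1009-wide vertex to 8 reducers. The real manager also coalesces contiguous partitions, so its exact numbers may differ.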

      Reducing the vertex width based on the hashtable (build) side has different consequences from reducing it based on the streaming (probe) side, and because there is no runtime ordering between the two sides, either one may end up driving the decision.

      Until the two parent vertices of the shuffle hash join are related to each other, auto-reducer parallelism causes massive performance inconsistency across runs.
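      The ordering problem can be sketched with the same kind of hypothetical model. Here the width is decided once a fixed number of source tasks from either input has finished; that trigger is an assumption standing in for the real manager's thresholds such as tez.shuffle-vertex-manager.min-src-fraction. All names below are invented for illustration. Feeding the sketch the same four tasks in two different completion orders yields different widths.

```java
import java.util.List;

/** Hypothetical model of order-dependent reducer sizing; not Tez's actual code. */
public final class ArrivalOrderSketch {

    /** One finished upstream task: which join input it belongs to and its output size. */
    record Completion(String side, long outputBytes) {}

    /**
     * Decides the width as soon as triggerCount tasks (from either input) have finished,
     * extrapolating their combined output across all totalSourceTasks.
     * Assumes triggerCount <= arrivals.size().
     */
    static int decideOnArrival(List<Completion> arrivals, int triggerCount,
                               int totalSourceTasks, int currentParallelism,
                               long desiredTaskInputBytes) {
        long seen = 0;
        for (int i = 0; i < triggerCount; i++) {
            seen += arrivals.get(i).outputBytes(); // side is ignored: bytes are just summed
        }
        long expectedTotal = seen * totalSourceTasks / triggerCount;
        long desired = Math.max(1L, (expectedTotal + desiredTaskInputBytes - 1) / desiredTaskInputBytes);
        return (int) Math.min(currentParallelism, desired);
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        Completion build = new Completion("build", 10 * mb);  // small hashtable-side task
        Completion probe = new Completion("probe", 500 * mb); // large streaming-side task
        // Same tasks, two completion orders.
        List<Completion> buildFirst = List.of(build, build, probe, probe);
        List<Completion> probeFirst = List.of(probe, probe, build, build);
        System.out.println(decideOnArrival(buildFirst, 2, 4, 100, 256 * mb));
        System.out.println(decideOnArrival(probeFirst, 2, 4, 100, 256 * mb));
    }
}
```

      When the small hashtable-side tasks finish first, the sketch picks 1 reducer; when the large streaming-side tasks finish first, it picks 8. This is a toy version of the run-to-run variation described above.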

      For inner and semi joins, the hashtable side should take priority over the streaming side; for left outer joins, the streaming side can overtake the hashtable side, since it is the more dominant factor in the final row counts.
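      The priority rule above could be expressed as a small lookup that selects which input's size estimate drives the width. This is a hypothetical sketch: the enums and methods do not correspond to Hive's or Tez's join classes, and it only encodes the rule as stated in this issue.

```java
/** Hypothetical encoding of the join-side priority rule; not Hive or Tez code. */
public final class JoinSidePrioritySketch {

    enum JoinType { INNER, SEMI, LEFT_OUTER }

    enum Side { HASHTABLE, STREAMING }

    /** Returns the input whose size should dominate the reducer-width decision. */
    static Side dominantSide(JoinType type) {
        return switch (type) {
            // Inner and semi joins: the hashtable side takes priority.
            case INNER, SEMI -> Side.HASHTABLE;
            // Left outer joins: the streaming side dominates the final row counts.
            case LEFT_OUTER -> Side.STREAMING;
        };
    }

    /** Picks the byte estimate to size reducers from, given both observed sizes. */
    static long sizeEstimate(JoinType type, long hashtableBytes, long streamingBytes) {
        return dominantSide(type) == Side.HASHTABLE ? hashtableBytes : streamingBytes;
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        for (JoinType t : JoinType.values()) {
            System.out.println(t + " -> " + dominantSide(t) + ", "
                    + sizeEstimate(t, 40 * mb, 2000 * mb) / mb + " MB");
        }
    }
}
```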

      Until such priorities can be propagated up into ShuffleVertexManager, auto-reducer parallelism can be disabled for these joins.
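      One way to picture the proposed workaround is a planner pass that switches auto-reduce off for any reduce vertex fed by a custom edge. The vertex and edge types below are hypothetical stand-ins, not Hive's planner classes, and this is not the content of HIVE-14246.1.patch. Since the title names CUSTOM_EDGE while the description discusses CUSTOM_SIMPLE_EDGE, the sketch treats both as unsafe. Separately, Hive's hive.tez.auto.reducer.parallelism setting turns the feature off for a whole session.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Hypothetical planner pass; not Hive's actual code or the attached patch. */
public final class DisableAutoReduceSketch {

    /** Hypothetical edge kinds, named after the edge types mentioned in the issue. */
    enum EdgeType { SIMPLE_EDGE, CUSTOM_EDGE, CUSTOM_SIMPLE_EDGE }

    /** Hypothetical planner-side view of a reduce vertex. */
    static final class ReduceVertex {
        final String name;
        final List<EdgeType> incoming;
        boolean autoReduce = true; // auto-reducer parallelism requested by default

        ReduceVertex(String name, EdgeType... incoming) {
            this.name = name;
            this.incoming = List.of(incoming);
        }
    }

    // Edge types this sketch treats as unsafe for auto-reduce
    // (title: CUSTOM_EDGE; description: CUSTOM_SIMPLE_EDGE).
    private static final Set<EdgeType> CUSTOM_EDGES =
            EnumSet.of(EdgeType.CUSTOM_EDGE, EdgeType.CUSTOM_SIMPLE_EDGE);

    /** True if any incoming edge is one of the custom edge types. */
    static boolean hasCustomInput(ReduceVertex v) {
        return v.incoming.stream().anyMatch(CUSTOM_EDGES::contains);
    }

    /** Clears the auto-reduce flag on vertices fed by a custom edge; others are untouched. */
    static void disableForCustomEdges(List<ReduceVertex> vertices) {
        for (ReduceVertex v : vertices) {
            if (hasCustomInput(v)) {
                v.autoReduce = false;
            }
        }
    }

    public static void main(String[] args) {
        ReduceVertex groupBy = new ReduceVertex("Reducer 2", EdgeType.SIMPLE_EDGE);
        ReduceVertex shuffleJoin = new ReduceVertex("Reducer 3",
                EdgeType.CUSTOM_SIMPLE_EDGE, EdgeType.CUSTOM_SIMPLE_EDGE);
        List<ReduceVertex> plan = List.of(groupBy, shuffleJoin);
        disableForCustomEdges(plan);
        for (ReduceVertex v : plan) {
            System.out.println(v.name + " autoReduce=" + v.autoReduce);
        }
    }
}
```

      Running it prints autoReduce=true for the group-by reducer and autoReduce=false for the shuffle-join reducer, leaving plans without custom edges unaffected.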

      Attachments

        1. HIVE-14246.1.patch (0.9 kB, Gopal Vijayaraghavan)


          People

            Assignee: Gopal Vijayaraghavan (gopalv)
            Reporter: Gopal Vijayaraghavan (gopalv)
            Votes: 0
            Watchers: 7
