HIVE-13750

Avoid additional shuffle stage created by Sorted Dynamic Partition Optimizer when possible

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.1.0
    • Component/s: Physical Optimizer
    • Labels: None

    Description

      Extend ReduceDedup to remove the additional shuffle stage created by the sorted dynamic partition optimizer when possible, thus avoiding unnecessary work.

      By ashutoshc:

      Currently, when its config is on, the Sorted Dynamic Partition Optimizer (SDPO) unconditionally adds an extra shuffle stage. If the sort columns of the previous shuffle and the partitioning columns of the table match, the reduce sink deduplication optimizer removes the extra shuffle stage, bringing the overhead down to zero. However, if they don’t match, we end up doing an extra shuffle. This can be improved: we can add the table partition columns as sort columns on the earlier shuffle and avoid the extra shuffle. This ensures that when a query already has a shuffle stage, we do not shuffle the data again.
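
The difference between the current and the proposed behavior can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Hive's optimizer code: `Shuffle`, `plan_before`, and `plan_after` are hypothetical names, a shuffle stage is reduced to its tuple of sort columns, "match" is modeled as exact equality, and the missing partition columns are simply appended to the earlier shuffle's sort columns (the real placement and the conditions under which merging is possible are not specified in this issue).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Shuffle:
    """Toy stand-in for a shuffle stage, described only by its sort columns."""
    sort_cols: Tuple[str, ...]


def plan_before(existing: Optional[Shuffle],
                partition_cols: Tuple[str, ...]) -> List[Shuffle]:
    """Current behavior as described: SDPO always adds a shuffle,
    and deduplication removes it only when the columns match."""
    sdpo = Shuffle(partition_cols)
    if existing is None:
        return [sdpo]                # the query had no shuffle of its own
    if existing.sort_cols == partition_cols:
        return [existing]            # match: the extra shuffle is removed
    return [existing, sdpo]          # mismatch: the data is shuffled twice


def plan_after(existing: Optional[Shuffle],
               partition_cols: Tuple[str, ...]) -> List[Shuffle]:
    """Proposed behavior: fold the partition columns into the earlier shuffle.
    Assumption: missing columns are appended; Hive may place them differently."""
    if existing is None:
        return [Shuffle(partition_cols)]
    missing = tuple(c for c in partition_cols if c not in existing.sort_cols)
    return [Shuffle(existing.sort_cols + missing)]


if __name__ == "__main__":
    group_by = Shuffle(("k",))       # e.g. a shuffle introduced by GROUP BY k
    print(len(plan_before(group_by, ("p",))), "shuffle stages before")
    print(len(plan_after(group_by, ("p",))), "shuffle stage after")
```

In this model a query that already shuffles on `k` and writes to a table partitioned by `p` goes from two shuffle stages to one, which is the saving the description is after. The sketch ignores the cases where the merge is not possible.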

      Attachments

        1. HIVE-13750.02.patch
          47 kB
          jcamachorodriguez
        2. HIVE-13750.01.patch
          37 kB
          jcamachorodriguez
        3. HIVE-13750.patch
          37 kB
          jcamachorodriguez
        4. HIVE-13750.patch
          37 kB
          jcamachorodriguez

            People

              Assignee: Jesús Camacho Rodríguez (jcamacho)
              Reporter: Jesús Camacho Rodríguez (jcamacho)
              Votes: 0
              Watchers: 0
