Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-7659

Unnecessary sort in query plan [Spark Branch]

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 1.1.0
    • Spark
    • None

    Description

      For hive on spark.
      Currently we rely on the sort order in RS to decide whether we need a sortByKey transformation. However a simple group by query will also have the sort order set to '+'.
      Consider the query: select key from table group by key. The RS in the map work will have sort order set to '+', thus requiring a sortByKey shuffle.

      To avoid the unnecessary sort, we should either use another way to decide if there has to be a sort shuffle, or we should set the sort order only when sort is really needed.

      Attachments

        1. HIVE-7659.2-spark.patch
          3 kB
          Rui Li
        2. HIVE-7659-spark.patch
          2 kB
          Rui Li

        Issue Links

          Activity

            People

              lirui Rui Li
              lirui Rui Li
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: