HIVE-7659

Unnecessary sort in query plan [Spark Branch]

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: Spark
    • Labels:
      None

      Description

      This applies to Hive on Spark.
      Currently we rely on the sort order in the ReduceSink (RS) operator to decide whether a sortByKey transformation is needed. However, a simple group-by query also has the sort order set to '+'.
      Consider the query: select key from table group by key. The RS in the map work has its sort order set to '+', so the plan uses a sortByKey shuffle even though the query does not ask for sorted output.

      To avoid the unnecessary sort, we should either decide in some other way whether a sort shuffle is required, or set the sort order only when a sort is really needed.
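
      The two decision rules can be contrasted with a minimal sketch. Everything in it is hypothetical and for illustration only: the class `ShuffleChoice`, the enum `Shuffle`, and both methods are invented names, not Hive or Spark APIs, and the `needsSortedOutput` flag is an assumption about how "sort is really needed" might be represented. It is not the change made by the attached patches.

```java
// Hypothetical sketch; none of these names come from the Hive code base.
class ShuffleChoice {
    // NO_SORT stands for any shuffle that does not order keys (an assumption).
    enum Shuffle { NO_SORT, SORT_BY_KEY }

    // Rule described in the issue: any sort order set on the RS
    // (for example "+") leads to a sortByKey shuffle.
    static Shuffle chooseBySortOrder(String rsSortOrder) {
        boolean hasOrder = rsSortOrder != null && !rsSortOrder.isEmpty();
        return hasOrder ? Shuffle.SORT_BY_KEY : Shuffle.NO_SORT;
    }

    // One possible alternative: decide from an explicit flag that is set
    // only when the query really needs sorted output.
    static Shuffle chooseByRequirement(boolean needsSortedOutput) {
        return needsSortedOutput ? Shuffle.SORT_BY_KEY : Shuffle.NO_SORT;
    }

    public static void main(String[] args) {
        // select key from table group by key:
        // the RS sort order is "+", but no sorted output is requested.
        System.out.println(chooseBySortOrder("+"));     // prints SORT_BY_KEY
        System.out.println(chooseByRequirement(false)); // prints NO_SORT
    }
}
```

      Under the first rule the group-by query pays for a sort it does not need; under the second it does not, provided the flag is set correctly elsewhere in the planner.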

        Attachments

        1. HIVE-7659-spark.patch
          2 kB
          Rui Li
        2. HIVE-7659.2-spark.patch
          3 kB
          Rui Li

              People

              • Assignee: Rui Li (lirui)
              • Reporter: Rui Li (lirui)