Hive / HIVE-6883

Dynamic partitioning optimization does not honor sort order or order by

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.13.0
    • Fix Version/s: 0.13.1, 0.14.0
    • Component/s: None
    • Labels: None

    Description

      The HIVE-6455 patch does not honor the sort order of the output table or the ORDER BY of the select statement. The former happens because numDistributionKey in ReduceSinkDesc is set incorrectly: it does not take the sort columns into account, so the ReduceSink operator (RSOp) sets the sort columns to null in the Key. Because nulls take the place of the sort columns in the Key, the rows are not sorted on the sort columns carried in the Value.
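
The effect of nulling the sort columns in the key can be illustrated with a small toy model. The sketch below is not Hive code: `emit_key` and `shuffle` are hypothetical helpers, the rows are made-up (sort column, partition column) pairs, and the shuffle is approximated by a sort that looks at the key only.

```python
# Toy model only, not Hive code. Rows are (sort_col, part_col) pairs with
# made-up values; all helper names are hypothetical.

def emit_key(row, include_sort_cols):
    """Build the shuffle key for one row.

    With include_sort_cols=False the sort column is replaced by None,
    which mirrors the behaviour described in the report.
    """
    sort_col, part_col = row
    return (part_col, sort_col if include_sort_cols else None)

def shuffle(rows, include_sort_cols):
    """Order rows by key alone; the value (the full row) is never compared."""
    keyed = [(emit_key(row, include_sort_cols), row) for row in rows]
    keyed.sort(key=lambda pair: pair[0])
    return [row for _, row in keyed]

rows = [(30, 27), (10, 28), (20, 27), (5, 28), (15, 27)]

nulled = shuffle(rows, include_sort_cols=False)
full_key = shuffle(rows, include_sort_cols=True)

# Sort-column values that land in partition 27 under each key layout.
nulled_27 = [sort_col for sort_col, part_col in nulled if part_col == 27]
full_key_27 = [sort_col for sort_col, part_col in full_key if part_col == 27]

print("sort column nulled in key:", nulled_27)
print("sort column kept in key:  ", full_key_27)
```

In this model the rows are grouped by partition in both cases, but only the run with the sort column present in the key comes out ordered on that column.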

      The other issue is that ORDER BY columns are not honored during insertion. For example:

      insert overwrite table over1k_part_orc partition(ds="foo", t) select si,i,b,f,t from over1k_orc where t is null or t=27 order by si;
      

      The select query performs the ORDER BY on column 'si' in the first MR job. The following MR job (inserted by HIVE-6455) sorts the input data on the dynamic partition column 't' without taking the already sorted 'si' column into account. As a result, rows are inserted out of order with respect to 'si'.
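
The two-job behaviour described above can be sketched as a toy pipeline. This is an illustration under stated assumptions, not Hive code: the function names are hypothetical, the (si, t) rows are made up, and reversing the input before a stable sort stands in for one possible outcome of a sort that gives no guarantee for rows sharing the same 't'. The `also_sort_by_si` option shows one way the order could be kept in this model; it is not a description of the committed patch.

```python
# Toy model only, not Hive code. Rows are (si, t) pairs with made-up values.
from itertools import groupby

def job1_order_by_si(rows):
    """First job: global ORDER BY si."""
    return sorted(rows, key=lambda r: r[0])

def job2_sort_by_t(rows, also_sort_by_si):
    """Second job: sort on the dynamic partition column t.

    A sort on t alone promises nothing about rows that share a t value.
    Reversing the input before a stable sort models one outcome in which
    those ties come out in a different order than they went in.
    """
    if also_sort_by_si:
        return sorted(rows, key=lambda r: (r[1], r[0]))
    return sorted(reversed(rows), key=lambda r: r[1])

def sorted_within_partitions(rows):
    """True if si is non-decreasing inside every run of equal t."""
    for _, group in groupby(rows, key=lambda r: r[1]):
        si_values = [r[0] for r in group]
        if si_values != sorted(si_values):
            return False
    return True

rows = [(40, "27"), (10, "null"), (30, "27"), (20, "null"), (50, "27")]

ordered = job1_order_by_si(rows)
t_only = job2_sort_by_t(ordered, also_sort_by_si=False)
t_then_si = job2_sort_by_t(ordered, also_sort_by_si=True)

print("sorted on t only:    ", sorted_within_partitions(t_only))
print("sorted on t, then si:", sorted_within_partitions(t_then_si))
```

A check like `sorted_within_partitions` is also a convenient way to state the expected result of the insert: within each partition, 'si' should never decrease.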

      Attachments

        1. HIVE-6883.1.patch
          1.46 MB
          Prasanth Jayachandran
        2. HIVE-6883.2.patch
          1.48 MB
          Prasanth Jayachandran
        3. HIVE-6883.3.patch
          1.48 MB
          Prasanth Jayachandran
        4. HIVE-6883-branch-0.13.3.patch
          1.79 MB
          Prasanth Jayachandran

            People

              Assignee: Prasanth Jayachandran (prasanth_j)
              Reporter: Prasanth Jayachandran (prasanth_j)
              Votes: 0
              Watchers: 9
