Hive / HIVE-9618

Deduplicate RS keys for ptf/windowing

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2.0
    • Component/s: PTF-Windowing
    • Labels: None

    Description

      Currently, a partition spec that uses the same column in both partition-by and order-by produces a duplicated key column in the ReduceSink (RS) operator. For example,

      explain
      select p_mfgr, p_name, p_size, 
      rank() over (partition by p_mfgr order by p_name) as r, 
      dense_rank() over (partition by p_mfgr order by p_name) as dr, 
      sum(p_retailprice) over (partition by p_mfgr order by p_name rows between unbounded preceding and current row)  as s1
      from noop(on noopwithmap(on noop(on part 
      partition by p_mfgr 
      order by p_mfgr, p_name
      )))
      

      "partition by p_mfgr order by p_mfgr, p_name" produces duplicated key columns (p_mfgr appears twice), as shown below

      Reduce Output Operator
          key expressions: p_mfgr (type: string), p_mfgr (type: string), p_name (type: string)
          sort order: +++
          Map-reduce partition columns: p_mfgr (type: string)
          value expressions: p_size (type: int), p_retailprice (type: double)
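
The intended effect on the key list can be sketched as an order-preserving deduplication. This is only an illustration of the idea, not the code from the attached patches: the class `RsKeyDedup`, its nested `Keys` holder, and the `dedup` method are hypothetical names, and columns are represented as plain strings rather than Hive expression nodes. It assumes partition columns are emitted first with ascending order, followed by any order-by columns not already present.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hive: illustrates dropping repeated RS key columns.
public class RsKeyDedup {

    /** Deduplicated key expressions plus the matching sort-order string. */
    public static final class Keys {
        public final List<String> exprs;
        public final String sortOrder;

        Keys(List<String> exprs, String sortOrder) {
            this.exprs = exprs;
            this.sortOrder = sortOrder;
        }
    }

    /**
     * Builds the key list from partition-by and order-by columns,
     * keeping only the first occurrence of each column.
     *
     * @param partitionBy partition columns (assumed ascending, '+')
     * @param orderBy     order-by columns
     * @param orderDirs   one '+' or '-' per order-by column
     */
    public static Keys dedup(List<String> partitionBy, List<String> orderBy, String orderDirs) {
        // LinkedHashMap keeps first-insertion order, so key positions stay stable.
        Map<String, Character> seen = new LinkedHashMap<>();
        for (String col : partitionBy) {
            seen.putIfAbsent(col, '+');
        }
        for (int i = 0; i < orderBy.size(); i++) {
            // A column already used as a partition key is skipped here.
            seen.putIfAbsent(orderBy.get(i), orderDirs.charAt(i));
        }
        StringBuilder order = new StringBuilder();
        for (char dir : seen.values()) {
            order.append(dir);
        }
        return new Keys(new ArrayList<>(seen.keySet()), order.toString());
    }
}
```

For the query above, calling `dedup` with partition columns `[p_mfgr]`, order-by columns `[p_mfgr, p_name]`, and directions `"++"` yields the keys `[p_mfgr, p_name]` with sort order `++`, instead of the three-column key shown in the plan.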
      

      Attachments

        1. HIVE-9618.3.patch.txt
          164 kB
          Navis Ryu
        2. HIVE-9618.2.patch.txt
          163 kB
          Navis Ryu
        3. HIVE-9618.1.patch.txt
          58 kB
          Navis Ryu

          People

            Assignee: Navis Ryu
            Reporter: Navis Ryu