Hive / HIVE-10586

Plans for Queries with Select distinct and Windowing are incorrect


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      Thanks to yhuai for pointing this out.

      The generated plan places the GBy operator (for the Select Distinct) below the PTFOp. One would expect the Select Distinct to happen last; yhuai confirmed that postgres behaves this way. I think the following paragraph in the SQL spec specifies this order (though I am not an expert in deciphering the language of the spec; if an expert on the spec wants to chime in, please do):

      Point h) on page 222 of the 2011 SQL spec seems to state this:
      
      h)  Case:
      
      i)  If OF is simply contained in a <query specification> QSX, then QSX is equivalent to:
      
      SELECT SQ SLNEW TENEW
      

      Here is an example from windowing.q:

      35. testDistinctWithWindowing
      select DISTINCT p_mfgr, p_name, p_size,
      sum(p_size) over w1 as s
      from part
      window w1 as (distribute by p_mfgr sort by p_name rows between 2 preceding and 2 following)
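
To illustrate why the order matters, here is a minimal sketch in plain Python (not Hive code). It assumes that placing the GBy below the PTFOp amounts to deduplicating rows before the window function runs. The three-row `part` table and the helper names `windowed_sum` and `distinct` are made up for illustration; the frame mirrors the `rows between 2 preceding and 2 following` clause of w1.

```python
from itertools import groupby


def windowed_sum(rows):
    """Append sum(p_size) over a frame of 2 preceding / 2 following rows,
    partitioned by p_mfgr and ordered by p_name (a model of window w1)."""
    out = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for _, group in groupby(ordered, key=lambda r: r[0]):
        partition = list(group)
        for i, (mfgr, name, size) in enumerate(partition):
            frame = partition[max(0, i - 2): i + 3]
            out.append((mfgr, name, size, sum(r[2] for r in frame)))
    return out


def distinct(rows):
    """Drop duplicate rows, keeping first-seen order."""
    return list(dict.fromkeys(rows))


# Hypothetical data: (p_mfgr, p_name, p_size), with one duplicated row.
part = [("m1", "a", 1), ("m1", "a", 1), ("m1", "b", 2)]

# Expected order: evaluate the window function, then apply DISTINCT.
window_then_distinct = distinct(windowed_sum(part))

# Assumed effect of the reported plan: DISTINCT first, then the window function.
distinct_then_window = windowed_sum(distinct(part))

print(window_then_distinct)
print(distinct_then_window)
```

With the duplicate row present, the two orderings produce different values for s, because removing the duplicate first changes which rows fall inside each window frame.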
      

          People

            Assignee: Unassigned
            Reporter: Harish Butani (rhbutani)