Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-4741

ORDER BY behavior with UNION is incorrect

    XMLWordPrintableJSON

Details

    Description

      When a query uses the UNION, EXCEPT, or INTERSECT operators, the ORDER BY clause must be specified at the end of the statement and the results of the combined queries are sorted. ORDER BY clauses are not allowed in individual branches unless the branch is enclosed by parentheses.

      There are two bugs currently:

      1. An ORDER BY is allowed in a branch of a UNION that is not enclosed in parentheses
      2. The final ORDER BY of a UNION is attached to the nearest branch when it should be sorting the combined results of the UNION(s)

      For example, this is not valid syntax but is allowed in Impala

      select * from t1 order by 1
      union all
      select * from t2
      

      And for queries like this, the ORDER BY should order the unioned result, not just the nearest branch which is the current behavior.

      select * from t1
      union all
      select * from t2
      order by 1
      

      If one wants ordering within a branch, the query block must be enclosed by parentheses like such:

      (select * from t1 order by 1)
      union all
      (select * from t2 order by 2)
      

      Here is an example where incorrect results are returned.
      Impala

      [impalad:21000] > select r_regionkey, r_name from region union all select r_regionkey, r_name from region order by 1 limit 2;
      +-------------+-------------+
      | r_regionkey | r_name      |
      +-------------+-------------+
      | 0           | AFRICA      |
      | 1           | AMERICA     |
      | 2           | ASIA        |
      | 3           | EUROPE      |
      | 4           | MIDDLE EAST |
      | 0           | AFRICA      |
      | 1           | AMERICA     |
      +-------------+-------------+
      Fetched 7 row(s) in 0.12s
      

      PostgreSQL

      tpch=# select r_regionkey, r_name from region union all select r_regionkey, r_name from region order by 1 limit 2;
       r_regionkey |          r_name
      -------------+---------------------------
                 0 | AFRICA
                 0 | AFRICA
      (2 rows) 
      

      see also https://cloud.google.com/spanner/docs/query-syntax#syntax_5

      Attachments

        1. query36a.sql
          2 kB
          Tim Wood
        2. query49.sql
          4 kB
          Tim Wood

        Issue Links

          Activity

            People

              Unassigned Unassigned
              grahn Greg Rahn
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated: