Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-25856

Intermittent null ordering in plans of queries with GROUP BY and LIMIT

Log workAgile BoardRank to TopRank to BottomBulk Copy AttachmentsBulk Move AttachmentsVotersWatch issueWatchersCreate sub-taskConvert to sub-taskMoveLinkCloneLabelsUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    Description

      CREATE TABLE person (id INTEGER, country STRING);
      EXPLAIN CBO SELECT country, count(1) FROM person GROUP BY country LIMIT 5;
      

      The EXPLAIN query produces a slightly different plan (ordering of nulls) from one execution to another.

      CBO PLAN:
      HiveSortLimit(sort0=[$1], dir0=[ASC-nulls-first], fetch=[5])
        HiveProject(country=[$0], $f1=[$1])
          HiveAggregate(group=[{1}], agg#0=[count()])
            HiveTableScan(table=[[default, person]], table:alias=[person])
      
      CBO PLAN:
      HiveSortLimit(sort0=[$1], dir0=[ASC], fetch=[5])
        HiveProject(country=[$0], $f1=[$1])
          HiveAggregate(group=[{1}], agg#0=[count()])
            HiveTableScan(table=[[default, person]], table:alias=[person])
      

      This is unlikely to cause wrong results cause most aggregate functions (not all) do not return nulls thus null ordering doesn't matter much but it can lead to other problems such as:

      • intermittent CI failures
      • query/plan caching

      I bumped into this problem after investigating test failures in CI. The following query in offset_limit_ppd_optimizer.q returns different plan when it runs individually and when it runs along with some other qtest files.

      explain
      select * from
      (select key, count(1) from src group by key order by key limit 10,20) subq
      join
      (select key, count(1) from src group by key limit 20,20) subq2
      on subq.key=subq2.key limit 3,5;
      

      Attachments

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            zabetak Stamatis Zampetakis Assign to me
            zabetak Stamatis Zampetakis
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Original Estimate - Not Specified
              Not Specified
              Remaining:
              Remaining Estimate - 0h
              0h
              Logged:
              Time Spent - 1h
              1h

              Slack

                Issue deployment