Hive / HIVE-24207

LimitOperator can leverage ObjectCache to bail out quickly

Details

    Description

      select ss_sold_date_sk from store_sales, date_dim where date_dim.d_year in (1998,1998+1,1998+2) and store_sales.ss_sold_date_sk = date_dim.d_date_sk limit 100;

      select distinct ss_sold_date_sk from store_sales, date_dim where date_dim.d_year in (1998,1998+1,1998+2) and store_sales.ss_sold_date_sk = date_dim.d_date_sk limit 100;

      Queries like the ones above generate a large number of map tasks. Currently, these tasks do not bail out once enough rows have been produced to satisfy the limit.

      It would be good to use ObjectCache to retain the number of records generated so far. LimitOperator/VectorLimitOperator could then bail out in later tasks during the operator's init phase itself.

      https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorLimitOperator.java#L57

      https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java#L58
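
The idea described above can be illustrated with a minimal, self-contained sketch. It does not use Hive's real ObjectCache or operator APIs: `SharedCounterCache` and `SketchLimitOperator` are hypothetical stand-ins, and the cache key is an assumed per-query, per-operator string. The sketch only shows the shape of the proposal: tasks running in the same JVM share one row counter, and a task that initializes after the limit has been reached marks itself done before processing any row.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a cache shared by all tasks of a query in one JVM. */
final class SharedCounterCache {
    private static final ConcurrentHashMap<String, AtomicLong> COUNTERS =
            new ConcurrentHashMap<>();

    /** Returns the shared counter for the key, creating it on first use. */
    static AtomicLong counterFor(String key) {
        return COUNTERS.computeIfAbsent(key, k -> new AtomicLong());
    }
}

/** Simplified limit operator for illustration; not the real LimitOperator API. */
final class SketchLimitOperator {
    private final long limit;
    private final AtomicLong emitted; // shared across tasks through the cache
    private boolean done;

    SketchLimitOperator(String cacheKey, long limit) {
        this.limit = limit;
        this.emitted = SharedCounterCache.counterFor(cacheKey);
    }

    /** Init phase: a task that starts after the limit was reached is done at once. */
    void initialize() {
        done = emitted.get() >= limit;
    }

    /** Returns true if the row was forwarded, false once the limit is reached. */
    boolean process(Object row) {
        if (done) {
            return false;
        }
        if (emitted.incrementAndGet() > limit) {
            done = true;
            return false;
        }
        return true; // a real operator would forward the row downstream here
    }

    boolean isDone() {
        return done;
    }
}
```

In this sketch the counter may overshoot the limit slightly under concurrency, which is harmless because it is only compared against the limit. How the key is scoped and when the cached counter is released are left open here and would depend on the actual ObjectCache implementation.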

            People

              abstractdog László Bodor
              rajesh.balamohan Rajesh Balamohan
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1.5h