Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-10360

Allow a simple limit to be treated as a sampling hint where applicable

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • Impala 3.4.0
    • Impala 4.0.0
    • Frontend
    • None
    • ghx-label-12

    Description

      As a follow-up to IMPALA-10314, it is sometimes useful to consider a simple limit as a way to sample from a table if a relevant hint has been provided. This is especially useful if the query is against a view because a TABLESAMPLE clause is only supported for base tables, not views. Here's an example that illustrates the motivation:

      set optimize_simple_limit = true;
      with v1 as 
      (select * from fact_table /* +some_hint_for_table */
        where col in (select col from dim_table where ...)) 
      select * from v1 limit 10;
      

      In this case, the outer query just wants any 10 rows that satisfy the WHERE predicate in v1 and if we can specify the hint for the large fact_table to treat the simple limit as a hint for sampling, it would substantially reduce the query planning time without significantly compromising on the correctness. Without such optimization, during planning the scan ranges will be computed for the entire fact_table which is expensive.

      Also, note that doing the naive push down of limit to the fact table is not advisable because then the planner may decide (under the optimize_simple_limit=true setting) to only look at first few partitions or files within a partition and those rows may not satisfy the join condition. The sampling will be spread out more uniformly across the partitions, so the chances of producing sufficient qualifying rows is much higher.

      Attachments

        Issue Links

          Activity

            People

              amansinha Aman Sinha
              amansinha Aman Sinha
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: