As a follow-up to
IMPALA-10314, it is sometimes useful to consider a simple limit as a way to sample from a table if a relevant hint has been provided. This is especially useful if the query is against a view because a TABLESAMPLE clause is only supported for base tables, not views. Here's an example that illustrates the motivation:
In this case, the outer query just wants any 10 rows that satisfy the WHERE predicate in v1 and if we can specify the hint for the large fact_table to treat the simple limit as a hint for sampling, it would substantially reduce the query planning time without significantly compromising on the correctness. Without such optimization, during planning the scan ranges will be computed for the entire fact_table which is expensive.
Also, note that doing the naive push down of limit to the fact table is not advisable because then the planner may decide (under the optimize_simple_limit=true setting) to only look at first few partitions or files within a partition and those rows may not satisfy the join condition. The sampling will be spread out more uniformly across the partitions, so the chances of producing sufficient qualifying rows is much higher.