[PHOENIX-5148] Improve OrderPreservingTracker to optimize OrderBy/GroupBy for ClientScanPlan and ClientAggregatePlan - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 4.14.1
Fix Version/s: 4.15.0, 5.1.0
Component/s: None
Labels:
None

Description

Given a table

  create table test ( 
       pk1 varchar not null , 
       pk2 varchar not null, 
       pk3 varchar not null,
        v1 varchar, 
        v2 varchar, 
        CONSTRAINT TEST_PK PRIMARY KEY ( 
              pk1,
              pk2,
              pk3 ))

Consider following three cases:

1. OrderBy of ClientScanPlan

for sql:

select v1 from (select v1,v2,pk3 from test t where pk1 = '6' order by t.v2,t.pk3,t.v1 limit 10) a order by v2,pk3

Obviously, the outer OrderBy "order by v2,pk3" should be compiled out because it matchs the inner query OrderBy "order by t.v2,t.pk3,t.v1" , but unfortunately it is not compiled out.

2. GroupBy of ClientAggregatePlan

for sql :

select v1 from (select v1,pk2,pk1 from test t where pk1 = '6' order by t.pk2,t.v1,t.pk1 limit 10) a group by pk2,v1

Obviously, the outer GroupBy "group by pk2,v1" should be orderPreserving because it matchs the inner query OrderBy "order by t.pk2,t.v1,t.pk1" , but unfortunately the isOrderPreserving() of outer GroupBy return false.

3. OrderBy of SortMergeJoinPlan(from ~~PHOENIX-4618~~)

for sql:

SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b

The result of the sort-merge-join is sorted on (T1.a, T1.b) and (T2.a, T2.b) at the same time.
Thus, both 1)

SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b ORDER BY T1.a, T1.b

and 2)

SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b ORDER BY T2.a, T2.b

should avoid doing an extra order-by after the sort-merge-join operation.

All the above three cases are caused by the same problem that the OrderPreservingTracker relies solely on row keys for inferring alignment between the target Expression and the source sortedness.

For following QueryCompiler.compileSingleQuery, because the inner query of above cases has OrderBy, so in line 534, local variable isInRowKeyOrder is false:

518   protected QueryPlan compileSingleQuery(StatementContext context, SelectStatement select, List<Object> binds, boolean asSubquery, boolean allowPageFilter) throws SQLException{
519           SelectStatement innerSelect = select.getInnerSelectStatement();
520           if (innerSelect == null) {
521                return compileSingleFlatQuery(context, select, binds, asSubquery, allowPageFilter, null, null, true);
522           }
523  
524           QueryPlan innerPlan = compileSubquery(innerSelect, false);
525           TupleProjector tupleProjector = new TupleProjector(innerPlan.getProjector());
526           innerPlan = new TupleProjectionPlan(innerPlan, tupleProjector, null);
527
528           // Replace the original resolver and table with those having compiled type info.
529           TableRef tableRef = context.getResolver().getTables().get(0);
530            ColumnResolver resolver = FromCompiler.getResolverForCompiledDerivedTable(statement.getConnection(), tableRef, innerPlan.getProjector());
531            context.setResolver(resolver);
532           tableRef = resolver.getTables().get(0);
533           context.setCurrentTable(tableRef);
534           boolean isInRowKeyOrder = innerPlan.getGroupBy() == GroupBy.EMPTY_GROUP_BY && innerPlan.getOrderBy() == OrderBy.EMPTY_ORDER_BY;
535
536           return compileSingleFlatQuery(context, select, binds, asSubquery, allowPageFilter, innerPlan, tupleProjector, isInRowKeyOrder);
537       }

Because the OrderPreservingTracker relies solely on row keys, so in following line 170 of OrderByCompiler.compile , when isInRowKeyOrder is false, the OrderPreservingTracker.isOrderPreserving() checking is skipped:

169     // If we're ordering by the order returned by the scan, we don't need an order by
170      if (isInRowKeyOrder && tracker.isOrderPreserving()) {
171             if (tracker.isReverse()) {
172                  // Don't use reverse scan if:
173                  // 1) we're using a skip scan, as our skip scan doesn't support this yet.
174                  // 2) we have the FORWARD_SCAN hint set to choose to keep loading of column
175                  //    families on demand versus doing a reverse scan
176                  // REV_ROW_KEY_ORDER_BY scan would not take effect for a projected table, so don't return it for such table types.
177                if (context.getConnection().getQueryServices().getProps().getBoolean(QueryServices.USE_REVERSE_SCAN_ATTRIB, QueryServicesOptions.DEFAULT_USE_REVERSE_SCAN)
178                        && !context.getScanRanges().useSkipScanFilter()
179                        && context.getCurrentTable().getTable().getType() != PTableType.PROJECTED
180                        && context.getCurrentTable().getTable().getType() != PTableType.SUBQUERY
181                        && !statement.getHint().hasHint(Hint.FORWARD_SCAN)) {
182                    return OrderBy.REV_ROW_KEY_ORDER_BY;
183               }
184            } else {
185                 return OrderBy.FWD_ROW_KEY_ORDER_BY;
186           }
187        }

What we can do to fix this problem are :

refactor the OrderPreservingTracker to make it inferring alignment between the target Expression and the source sortedness based on OrderByExpression , not the row keys, so we can remove the variable isInRowKeyOrder.
add a new method getActualOutputOrderBys to the QueryPlan to get the output OrderBys of the query result , eg. for the above third SortMergeJoinPlan example ,getActualOutputOrderBys returns (T1.a, T1.b) and (T2.a, T2.b) .

Besides above three cases, a implicit semantic optimization could be implemented also:
for following sql , even the literal OrderBy of inner query is "order by t.pk1,t.pk2" . By getActualOutputOrderBys of inner QueryPlan, we can infer that the actual OrderBy of inner query is "order by t.pk1,t.pk2,t.pk3" , so the outer OrderBy "order by pk1,pk2,pk3" could be compiled out:

select pk1 from (select pk3,pk2,pk1 from test t where v1 = '6' order by t.pk1,t.pk2 limit 10) a where pk3 > '8' order by pk1,pk2,pk3

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

PHOENIX-5148_v3-4.x-HBase-1.2.patch
18/Mar/19 12:57
188 kB
chenglei
PHOENIX-5148_v3-master.patch
18/Mar/19 03:51
188 kB
chenglei
PHOENIX-5148_v3-4.x-HBase-1.4.patch
18/Mar/19 03:51
188 kB
chenglei
PHOENIX-5148_v2-4.x-HBase-1.4.patch
17/Mar/19 06:52
185 kB
chenglei
PHOENIX-5148-4.x-HBase-1.4.patch
21/Feb/19 12:55
178 kB
chenglei

Issue Links

contains

PHOENIX-4618 Optimize out order-by on sort-merge-join when possible

Closed

is related to

PHOENIX-7352 Improve OrderPreservingTracker to support extracting partial ordering columns for TupleProjectionPlan

Resolved

links to

GitHub Pull Request #444

GitHub Pull Request #468

Activity

People

Assignee:: chenglei

Reporter:: chenglei

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 20/Feb/19 14:51

Updated:: 03/Sep/24 06:57

Resolved:: 18/Mar/19 15:01

Time Tracking

Estimated:

Not Specified

Remaining:

Logged:

9h 40m